Magic Tricks of Testing

Tests are supposed to save us money. How is it, then, that many times they become millstones around our necks, gradually morphing into fragile, breakable things that raise the cost of change?

We write too many tests and we test the wrong kinds of things. This talk strips away the veil and offers simple, practical guidelines for choosing what to test and how to test it. Finding the right testing balance isn't magic, it's a magic trick; come and learn the secret of writing stable tests that protect your application at the lowest possible cost.

Ancient City Ruby 2013

00:00:00.320 So we are the people who embrace testing. We act a little bit like the chosen ones, right? We're the clan that drank the Kool-Aid. We say that, and yet the truth is you don't have to look very far to see that it's not as nice as we let on. As a matter of fact, there are many perfectly legitimate, completely respectable, card-carrying members of our community for whom this is a true statement: "I hate my tests." I'll bet some of you are here. Why is it that you hate your tests? You hate them because they are slow.

00:00:22.880 They kill your productivity. Off you go to Twitter, and who knows when you'll be back? Slow tests kill us; we hate them because they get in the way of getting things done. They're fragile; they break every time you make a change. The app still works, but ah, here I promised to unplug that. The app still works, and yet all the tests are broken, and you have to go away and fix them all before you can get back to getting real work done. It seems like such a waste of time. If you're having this experience, it's rational to begin to wonder if tests are worth it. Everyone says you should have them, and maybe they save you just often enough so that you're afraid to delete them all. But the cold hard truth is they make you miserable. If this is where you are right now, you're not alone. For many people, the promise of testing has not been fulfilled.

00:01:14.479 How many talks have you seen so far in a one-track, two-day conference that talk in some way about testing? We're spending a lot of time and energy still talking about this issue. For most of us, this promise has failed. Now, fortunately, it doesn't have to be this way. You can have cost-effective tests. Good tests aren't magic; they're magic tricks, and tricks you can learn. They are simple, and there are just a few of them, and I can show you in the next 28 minutes.

00:01:50.320 For most of you, they'll involve some form of Ben's favorite thing: deleting code. It's even more fun when you get to delete tests. I had a conversation with Avdi Grimm, who read the book 'Practical Object-Oriented Design in Ruby' before it was published to help me remove all my bone-headed Ruby mistakes. He was incredibly kind about that. I cannot tell you how much I appreciated it. When I made a statement in the book that said most people have too many tests, he wrote me back an email that said, 'When I read that, I laughed out loud.' He said they don't have too many tests; they have too few. What I realized is that that's probably true, but you have too few because at first you had too many, and they were so bad and got in your way so much that you quit and walked away from testing altogether.

00:02:18.800 So today we're going to talk about unit tests, not integration tests, and I'm going to give you some strategies for doing unit testing. These strategies require that you have integration tests, so they're going to depend on global end-to-end testing to prove the correctness of your app. Here's what we want out of our unit tests: we want them to be thorough; we want them to be a logical and complete proof of the correctness of the object under test. We want them to be stable; we don't want them to break every time we change something about the code. Even if the code is still correct, we want the tests to pass. We want them to be fast; we don't want to do that zombies and molasses thing where we get sluggish while our tests are running. Finally, we want them to be few; we want them to be the most parsimonious expression of these proofs. We don't want extra code to maintain.

00:03:28.480 Achieving this takes a kind of clarity of vision about your app. So do a thought experiment for me. Think about an app that you're working on right now. Imagine it. Close your eyes if you have to. Think about the running app in memory, the objects, and the messages they pass between them. So get a mental image of that. For many of us, it looks like this: our applications grow up, and they turn into these incredibly tangled thickets of code, and we don't understand them. If we don't understand our apps, it's no wonder we can't write tests. The way out of this mess is to follow the message.

00:04:16.160 These testing magic tricks focus on messages, and fortunately, the objects that we're going to test have a very simple understanding of messages. Objects are like black boxes; they're like space capsules. Space capsules have an inside and an outside, and they are very motivated to keep these two things apart. The cold, dark, lonely outside is incredibly dangerous, and it's a matter of life and death to keep the outside separate from the inside. So, from the point of view of the object under test, there are only a few things it knows about messages: it knows their origins, and it knows what I'm going to call their types. I'll explain both of those things.

00:05:05.919 As far as origins go, some messages are received from others; from the point of view of the object under test, these are incoming. Incoming messages blast holes in the containment wall of the space capsule, and you have to be very careful about letting the outside see in. Objects under test also send messages to others; these outgoing messages also make holes in the containment wall, allowing you, the inside, to see out. They also send messages to themselves. This is the third place from which messages originate. So, from the point of view of the spacecraft, from the point of view of the object under test, we have these incoming, self-generated, and outgoing messages.

00:06:00.000 Messages also have types; for lack of a better term, I call it type. Messages are either query messages or command messages. Command messages have side effects, whereas queries do not. That's the long and short of it. Now, think about that for a minute. A query message is like asking for a calculation; I'll get an answer back, and it doesn't change anything—there's no visible side effect. A command message is like storing something in the database. I'm not necessarily depending on the result, but it does cause a side effect that is visible to other objects in the system. It would be a perfect world if we always had command or query messages.

00:06:59.599 Everybody here has probably had the experience where you're going to work on a code base you don't know very well, and you find a message that gives you an answer you want, and then you send that message over and over again. Later, someone comes around and says, 'Wow, there are a thousand duplicate rows in the database.' It's a command message, but it was secret—you didn't know it had a side effect. It’s not evil to conflate command and query; we do it all the time. Popping something off a queue is a perfect example; I send a pop, I get the thing back, and it changes the queue. There’s a side effect. It's nice to avoid conflating them if you can, but a lot of times, it makes sense, and that's okay. It's just that we're going to test them in different ways, and it's important to know whether you have a command or a query or both and which part you're testing.
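The distinction can be sketched in a few lines of Ruby. This is a minimal illustration only; the `TicketQueue` class and its method names are invented here and do not come from the talk.

```ruby
# Hypothetical class, invented to illustrate the two message types.
class TicketQueue
  def initialize(items = [])
    @items = items.dup
  end

  # Query: returns an answer and changes nothing.
  def size
    @items.size
  end

  # Command: changes state that other objects can later observe.
  def push(item)
    @items.push(item)
    nil
  end

  # Both at once: returns the oldest item (query) and removes it (command).
  def pop
    @items.shift
  end
end

queue = TicketQueue.new(%w[a b])
queue.size           # sending a query twice is harmless
queue.size
first = queue.pop    # sending this twice would remove two items
puts first           # prints "a"
puts queue.size      # prints 1
```

Sending `size` repeatedly is safe; sending `pop` repeatedly is not, which is why it matters to know which kind of message you are holding.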

00:08:28.600 So we have these three origins and two types, which gives us a grid to work with. We're filling out this grid. I'm going to show you a bunch of rules and give you made-up code to illustrate those rules. Katrina Owen, whom you may know from the Ruby Rogues, has taken repositories off of GitHub with massively complicated test suites and applied the rules I'll talk to you about today to those real-life examples. So look for her talk later this year; it's actually kind of part two of this one.

00:09:07.760 Part one: we're going to look at incoming query messages, the top left cell of the grid. I have this wheel class that implements diameter. It's really simple; it's a bicycle wheel. Here’s the implementation, although I don't care; it's kind of crappy, and it doesn't really matter. Here's the test. I'm using MiniTest; it doesn't matter if you don't use MiniTest. I'm using it in its least magical form, where I just write test methods. The only thing you might not be aware of is that I'm using Jim Weirich's 'about' method; diameter returns a float, so the test says the result is going to be 29, within that tolerance.
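The slide code is not part of this transcript. As a rough sketch of what the wheel class and its test might look like, assume the diameter is the rim plus the tire on both sides (an assumption, since the implementation is not shown). MiniTest's built-in `assert_in_delta` stands in here for the 'about' helper mentioned in the talk.

```ruby
require 'minitest/autorun'

# Assumed implementation: rim diameter plus the tire on both sides.
class Wheel
  attr_reader :rim, :tire

  def initialize(rim, tire)
    @rim  = rim
    @tire = tire
  end

  def diameter
    rim + (tire * 2)
  end
end

class WheelTest < Minitest::Test
  def test_calculates_diameter
    wheel = Wheel.new(26, 1.5)
    # diameter returns a float, so compare within a small tolerance
    assert_in_delta(29, wheel.diameter, 0.01)
  end
end
```

The test sends the message and asserts only on what comes back.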

00:10:40.799 This brings us to the first rule: test incoming query messages by making assertions about what they send back. You might call that state. I find the word state very confusing, and it has come to my attention that many other people do too. So this is about what you get back. Asserting on what comes back is a complete and total proof that an incoming query message is correct. Before we go on, I'm going to show you one more incoming query message. Gear has a gear inches method; if you're a cyclist, you know what that is; if you're not, you don’t care. Here's the implementation of that method.

00:11:21.680 Notice it sends a private method, sending a private message to itself, and it sends some other message to some other object. The implementation here is a good deal more complicated, but it's still a query message. It returns a result, and it's the result that matters. How can we prove that this method is correct? The way to think about this is to imagine sighting along the edges of the space capsule. If you stand right there and look, all you can see is the messages and parameters as they go in and the answer as it comes back out.

00:11:58.960 Because we're sighting along the edge of the space capsule, this looks a lot like the test we already saw. Its internal implementation is considerably more complicated, but we don't care because we can't see it. We want to test the interface, not the implementation. When I test only the interface, it means that I can change the implementation without breaking the test. I can refactor the internals of the method as much as I want, and the test will always be safe. The test will always prove the correctness of this method. This is it for incoming query messages. We want to make assertions about what they return, and we want to test the interface, not the implementation.
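A sketch of the same idea for gear, under assumptions: the transcript does not show the code, so the constructor arguments, the `ratio` formula, and the sample numbers below are illustrative guesses rather than the speaker's slide.

```ruby
require 'minitest/autorun'

# Assumed collaborator: diameter is rim plus tire on both sides.
Wheel = Struct.new(:rim, :tire) do
  def diameter
    rim + (tire * 2)
  end
end

class Gear
  def initialize(chainring:, cog:, wheel:)
    @chainring = chainring
    @cog       = cog
    @wheel     = wheel
  end

  # Incoming query: sends a private message to self and
  # an outgoing query to wheel, then returns a result.
  def gear_inches
    ratio * @wheel.diameter
  end

  private

  def ratio
    @chainring / @cog.to_f
  end
end

class GearTest < Minitest::Test
  def test_calculates_gear_inches
    gear = Gear.new(chainring: 52, cog: 11, wheel: Wheel.new(26, 1.5))
    # Only the returned value is asserted; ratio and diameter stay invisible.
    assert_in_delta(137.1, gear.gear_inches, 0.01)
  end
end
```

Because the test sees only the arguments going in and the answer coming out, `ratio` could be renamed or inlined without touching the test.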

00:12:48.400 Now let's move on to incoming command messages, the second box in that row. Here's the simplest command message I could construct: I have this set_cog method that takes an argument and sets cog to that value. I have a cog reader. This message is both a command and a query. It has a side effect that other people can see by sending the cog message, and it returns the value that it got passed so that you can chain things together. It's really common in Ruby for us to do this; I would never tell you not to do it. I like that method; you can't do what Andy said unless you return something and also have a side effect. It's okay to do that; you just have to know which part you're testing.

00:13:49.200 So I'm going to test the command part of this, and it's going to be a dirt-simple test. I'm going to create a new object, then send the message, and finally make an assertion about the side effect. The simple rule here is: test incoming command messages by making assertions about direct public side effects. Now, I'm a little worried about side effects. Public means other people can see it. Direct is easy to understand if it's a state change in you, like I changed the value of cog; that's direct. If I call out and write to the file system, well, that's kind of sort of not really direct. So this rule makes a lot of sense to me; I want to be making assertions here, but the definition of direct is a little bit situational. Judgment will be called for here.
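A minimal sketch of that three-step test, assuming a gear reduced to just the parts the command touches (the keyword constructor is an illustrative choice, not taken from the talk):

```ruby
require 'minitest/autorun'

class Gear
  attr_reader :cog

  def initialize(cog:)
    @cog = cog
  end

  # Command and query at once: changes cog and returns the new value.
  def set_cog(new_cog)
    @cog = new_cog
  end
end

class GearTest < Minitest::Test
  def test_set_cog
    gear = Gear.new(cog: 11)   # create the object
    gear.set_cog(27)           # send the message
    assert_equal 27, gear.cog  # assert the direct public side effect
  end
end
```

The test ignores the return value on purpose; it is testing the command half of the message.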

00:14:58.480 Notice that the top row, incoming, is all about assertions. I'm going to make assertions about what comes back for incoming query messages, and I'm going to make assertions about the side effect for command messages. The bottom line is that the receiver of the incoming message is responsible for all the assertions. This is the kind of testing that you already do right now; you probably do this kind of testing for everything. But I’m going to tell you that this is the only place where you should make assertions. There's no other place in your tests where it makes sense to make assertions. Now let's go look at messages sent to self: ratio, the private method we saw before that gear inches sends.

00:16:03.840 How should I test this method? Well, let’s look. Here’s an example: I could write a test for that method and make an assertion about what it gives back. Think of this from the perspective of sighting along the edges of the space capsule. No other object knows this message exists. The fact that the gear inches test runs and returns the correct result is sufficient proof; you already have a test for the method in the public interface. This new test does not add any safety; if that other test passes, ratio must be okay. This test is redundant. There might be some cases where you want to keep redundancy, but your tests aren't one of them. Do not test private messages by making assertions about what they return.
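To make the redundancy concrete, here is a hedged sketch with both tests side by side. The class bodies and numbers are assumptions for illustration; the second test is the anti-pattern and is shown only so it can be recognized.

```ruby
require 'minitest/autorun'

Wheel = Struct.new(:rim, :tire) do
  def diameter
    rim + (tire * 2)
  end
end

class Gear
  def initialize(chainring:, cog:, wheel:)
    @chainring = chainring
    @cog       = cog
    @wheel     = wheel
  end

  def gear_inches
    ratio * @wheel.diameter
  end

  private

  def ratio
    @chainring / @cog.to_f
  end
end

class GearTest < Minitest::Test
  def setup
    @gear = Gear.new(chainring: 52, cog: 11, wheel: Wheel.new(26, 1.5))
  end

  # Sufficient: proves gear_inches, and therefore ratio, through the public interface.
  def test_calculates_gear_inches
    assert_in_delta(137.1, @gear.gear_inches, 0.01)
  end

  # Anti-pattern: reaches past the public interface with send.
  # It passes today, adds no safety, and breaks if ratio is renamed or inlined.
  def test_calculates_ratio
    assert_in_delta(4.73, @gear.send(:ratio), 0.01)
  end
end
```

Deleting `test_calculates_ratio` loses no proof of correctness, because the first test already fails if `ratio` is wrong.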

00:17:14.240 Here’s another example. People say, 'Okay, I understand that I’m not supposed to test the private method, but gear inches sends ratio, and gear inches surely will not be correct unless that message gets sent.' So they take this perfectly good test, this test that completely works, and they change it to assert that ratio gets sent. What does this prove? From outside the space capsule, you cannot see that this is happening. This test is an over-specification; it adds no safety and incurs costs without any benefit. When you set this expectation, it binds you to the current implementation, because a test that insists that you continue to send a basically invisible message creates a world where you can't improve the code without breaking the test. This expectation does not make it safe to refactor; it makes it impossible to refactor.

00:18:25.760 The rules for private methods are simple: don't test them; don’t make assertions about what they give back; don’t set expectations that you will send them. I break this rule all the time. I just want to save money—that’s all I want to do. If I'm writing a complicated internal private algorithm, I often want to use TDD to get it working and so I have tests. If I'm working on a complicated private algorithm, it can be very handy to get an error message that's close to the offending line of code when I break it. Those things are handy early on in development. If you leave them, they can start getting in the way of refactorings. Sometimes I write that code and then I compound my sins by putting a comment in the test that says, 'Delete these if they fail.'

00:19:24.560 There’s a tipping point: while they're adding value by letting you get the code running, I keep them; I don’t care. But as soon as you start wanting to refactor and they get in the way, you have the API tests that prove the correctness of the methods, so delete those tests. Don’t try to maintain them; they cost a lot of money. Someone was telling me today they couldn't talk someone out of testing private methods, and I'm like, 'It is okay to get them running by testing them, but they will get in your way.' They cost you money later, and you should not fix them later; you should delete them. So there we go. The general rule is: ignore them. The more nuanced rule is: do what saves you money.

00:20:20.960 Part three is going to be about outgoing messages. The first half of that is outgoing query messages, and you're about to have a sense of déjà vu, because the outgoing query message rules are going to be just like the rules for messages sent to self. We already looked at wheel's diameter and gear's gear inches. These are both incoming query messages to the objects where we tested them. But notice what's going on: the gear inches method sends diameter to wheel. This is always true: what is incoming to wheel is outgoing from gear. They come in pairs, so how should we test this? Well again, the better question is how not to test it.

00:21:10.560 Here’s the first anti-pattern. I have my perfectly good gear test, which completely proves that this works. But I know that gear inches won't be right unless wheel's diameter returns the right thing, so people are tempted to go in here and add an assertion about diameter on this side. This is a duplication, an exact duplication of the test that already exists over in wheel. If you do this here, then when the diameter implementation in wheel changes, you have to change the test over there, and you also have to find all the places where you send diameter as an outgoing message and change those tests too. As I said before, there are times redundancy can be handy, but your tests are not the right place for it.

00:22:12.560 So instead of making assertions about the result that comes back, sometimes people do this. This is the other example that you've already seen from messages sent to self. They say, 'Well, I know I’m sending it; I guess I should assert that it gets sent.' This is over-specification again; it adds costs and gives you no benefit. It binds you to the current implementation and gives you no proof of correctness in return. The rules for outgoing query messages are just like the rules for messages sent to self: don't test them, don't make assertions about them, and don't set expectations on them. Just like messages sent to self, outgoing query messages don't have visible side effects. These are messages that will not harm your app if they don’t get sent. If it doesn't matter that a message gets sent, the sender shouldn't test that it gets sent.

00:23:12.280 So that leaves us with outgoing command messages. I have to set up a new example to do this. I'm going to create a little object called 'obs,' because 'observer' is now a reserved word. It turns out we have a game where people ride bikes, and if they change gears, I have to notify the app. The new requirement here is: if gear receives set_cog, it has to send change to the observer. That’s the new requirement, so I'm going to go and fix the code. I'm going to roll my own observer just to keep it straightforward, and I'm going to change set_cog to send the change message.

00:23:50.080 Here's a picture of what just happened: the new observer class that implements change has some side effect. I don't care what it is for today; it could be writing to a database or sending email. It does something. The side effect could be something it directly does or something far removed from it. Arrival at change causes something to happen that other objects can see. Change is an incoming command message, and it’s going to get tested like you'd test incoming command messages by making assertions about direct public side effects. For gear, this is an outgoing command message. I just violated all the rules of TDD because I changed that code before I fixed the existing test.

00:27:03.680 Now if you run the test, it says nil doesn’t understand change, right? Because I didn't inject an observer. I'll just do that, and that makes my existing test run. I haven't tested the new functionality; this is my old test for the side effect where the cog got changed. Notice that set_cog now runs slower than it used to because it sends change, and that goes to a real observer. The side effect of whatever that message does could be as costly as anything you want, and this picture is what it looks like. We could instead create a fake and put the fake in there.

00:27:49.440 Now, we have good reason to fear the fragility of mocks and stubs, and I’m going to talk about that in a minute. For right now, let's just agree on this one thing: if I put the real thing in there, it's going to break or fail correctly. It's going to mirror production, but it will run all the side effects. If I put a fake in there, I can sidestep those side effects, and my test will run more quickly, but I have to ensure the fake stays in sync with the real thing. That’s the bargain we make when we do this. What we haven’t tested yet is the outgoing command message. We want to test it, because the app will not be correct unless it gets sent. Very often, what happens here is that people look at the side effect to do this test.
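A sketch of the existing side-effect test with a fake injected, under stated assumptions: the transcript does not show the code, so the `FakeObs` class, the arguments passed to `change`, and the constructor shape are all hypothetical.

```ruby
require 'minitest/autorun'

class Gear
  attr_reader :chainring, :cog

  def initialize(chainring:, cog:, observer:)
    @chainring = chainring
    @cog       = cog
    @observer  = observer
  end

  def set_cog(new_cog)
    @cog = new_cog
    # Outgoing command; the arguments are an assumption for this sketch.
    @observer.change(chainring, cog)
    @cog
  end
end

# Hypothetical fake: plays the observer role but performs no side effects.
class FakeObs
  def change(chainring, cog)
    # intentionally does nothing
  end
end

class GearTest < Minitest::Test
  def test_set_cog
    gear = Gear.new(chainring: 52, cog: 11, observer: FakeObs.new)
    gear.set_cog(27)
    # Still the old test: it asserts the direct side effect only.
    assert_equal 27, gear.cog
  end
end
```

The fake keeps the test fast, but nothing here proves that `change` was actually sent, and nothing keeps `FakeObs` in sync with the real observer.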

00:28:37.440 Now I'm just going to take some words off that and put these lines on here, these edges. Okay, I feel so strongly about this, I'm going to come over here and touch it. These lines are real; they represent APIs. These edges between objects are real things, and you can and should test them. As far as gear is concerned, the change message is on this edge. Thinking about these things as having edges lets you test at the appropriate place. The other thing is that this side effect is represented in the code as being close by, but let’s imagine that this is a subclass of ActiveRecord and that the side effect is an insert into the database.

00:29:22.640 We get used to thinking that our subclasses of ActiveRecord write to the database. Look at a stack trace; it's a long way. This is where this picture goes: it's a long way to the side effect. Many objects and messages are involved between you and that distant side effect. Here’s the anti-pattern: when you test a command message by making assertions about a distant side effect, you create a dependency between yourself and every object and message between you and that distant side effect. Changing the database isn't gear's responsibility; it shouldn't even know that's happening.

00:29:57.440 Reaching across a bunch of intermediate objects and testing a distant side effect is an integration test, and this is an integration test just hiding in your unit test directory. Gear's job is to, sorry, I skipped a slide. Gear's job is to send the change message. To test that the message gets sent, you create a mock. You inject the mock, set an expectation on the mock, and send the message. In many frameworks, like RSpec, you get automatic verification. In MiniTest, you have to explicitly send verify. That’s the then step, Jim's then step; you have to explicitly do it.

00:30:45.040 This test doesn't depend on all the objects and messages between gear and that distant side effect. This test depends on the interface; it depends on the message. It tests the thing for which gear is responsible at that edge, and when you test this way, you get fast and stable for free. It doesn’t run all the side effects and doesn’t break every time you change what the distant side effect is. So here’s the rule for outgoing command messages: expect to send them. There are caveats here too; of course, every rule has caveats.

00:31:37.040 If the side effect is created by a really cheap object, it’s very close to you, and it’s really inexpensive—just use it. I don't care, but the further it gets from you and the more complicated the path between it and you becomes, and the more costly that side effect is—the more money you're going to save by understanding this edge and mocking at it. Now we’re done filling out this chart, and what you’ve seen is that the minimalist unit tests are the minimum set necessary to prove the correctness of your apps. You may have noticed that it uses mocks.

00:32:25.600 I can hear you thinking this: we know that mocks and stubs are fragile. This is the best challenge you can make right now: what happens if the name of this message changes and I'm mocking it? At that point, my app is broken, and my tests run green, and I don't know it. So I'm going to explain how to avoid that problem in a minute, but first, I want to talk just a tiny bit about mocks and stubs. Ben talked about this a little in his talk. Stubs define context. What does that mean? You have an object under test, and it has a set of collaborators.

00:33:26.479 This is the perfect world for an object under test: a set of collaborating objects, each of which you deal with through a known API. You're trying to test a certain case in the object under test, and what you need to do is set up a set of collaborators that are in a certain state so that a single test will run. That's what stubs are for. You can stub as needed, but if you have ten zillion stubs, you know that something is wrong. This is a smell; it might mean you have too many collaborators. If you have stubs that go to stubs that go to stubs, that’s a really bad sign. So if you smell a smell, even if you don't like it, go ahead and stub, and then listen to that smell and fix it as a separate issue.

00:34:23.360 You should avoid stubbing in the object under test. Again, this is a rule you sometimes break, especially when you're dealing with other people's code. But there’s a really good reason for this: don’t make assertions about stubs. If you assert on a stub, you're asserting that your stub has the value you gave it—that's what you're testing. You don’t have to look far to see code that asserts on stubs—right? The reason you shouldn't stub in the object under test is that value is going to leak into whatever assertion you eventually make, and you'll find that you're testing your tests, not your code.
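A minimal sketch of a stub used only to define context. The `DiameterDouble` name and the canned value are invented for illustration; the point is where the assertion lands.

```ruby
require 'minitest/autorun'

class Gear
  attr_reader :wheel

  def initialize(chainring:, cog:, wheel:)
    @chainring = chainring
    @cog       = cog
    @wheel     = wheel
  end

  def gear_inches
    ratio * wheel.diameter
  end

  private

  def ratio
    @chainring / @cog.to_f
  end
end

# Hypothetical stub: plays the "has a diameter" role with a canned answer.
DiameterDouble = Struct.new(:diameter)

class GearTest < Minitest::Test
  def test_calculates_gear_inches
    gear = Gear.new(chainring: 52, cog: 11, wheel: DiameterDouble.new(10))

    # Good: the assertion is about what the object under test returns.
    assert_in_delta(47.27, gear.gear_inches, 0.01)

    # Anti-pattern (left commented out): this would only prove that the
    # stub returns the value it was given.
    # assert_equal 10, gear.wheel.diameter
  end
end
```

The stub sets up the collaborator's state, and the only assertion is on gear's own answer.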

00:34:56.320 So we want to use stubs to define context so we can run a test. Mocks are a different beast. Mocks test behavior. I told you today to set expectations on mocks to test outgoing command messages. One expectation per test. The reason you want one expectation per test is the same reason they tell you to do one assertion per test. An expectation is the exact same thing as an assertion; it’s the thing you’re testing. So you should not stub using a mock. We see this all the time: you can set up mocks to return values, and we use them as stubs because what we really want is that value back.

00:35:43.040 So if we come back here, if that change happens, we know what happens. I got on the bandwagon and mocked everything a couple of years ago, and it just killed me. You create that alternative universe where your tests all run, but the app is broken. We know that problem. The rule here is: honor the contract. If you make a test double that purports to play a role in the app, you have to ensure it doesn't drift away from that role. It turns out that there are automatic ways to do this.

00:34:56.320 You probably don't know this because I doubt many of you use MiniTest. That stub you saw me use earlier—I cannot create that stub unless the method exists in the object under test. Think about that for a minute. That stub can never drift from the standard; I do not have to worry that it’s going to get obsolete without me knowing it. There are a couple of libraries that work for MiniTest and RSpec, and they do the same thing for mocks. You know, with a little thought, you can guess that there’s some ceremony you have to set up. You have to do basically the Java version of declaring some kind of interface and using it on your test side to make sure you don’t drift away from it.

00:35:50.760 The cost of that is the price you pay for having a minimal set of unit tests. So there you go—32 minutes of tricks to turn this into this. This is what we want: thorough, stable, fast, and few. Achieving this means you have to test everything once, and you have to test the interface without getting stuck in the implementation. You have to use mocks and stubs carefully and you have to know what you're testing. You know these rules, and you know when you can break them. Above all, it requires that you insist upon simplicity. In programming, just like in everything else, when you work on things, they get more complicated.

00:36:33.040 If you quit before you reach the end of that path, you just have this insanely complicated mess. Faith that simplicity is possible will motivate you to strive until you can reach it. And you can get there with practice; you're not going to be perfect at this the first time you try, but if you persist, you can write tests that are the wall at your back instead of the millstone around your neck. You can go home every day saying, 'I love my tests.' Alright, thanks.
