Going on a Testing Anti-Pattern Safari

by Aja Hammerly

In the presentation titled "Going on a Testing Anti-Pattern Safari," Aja Hammerly addresses the challenges faced by developers when dealing with ineffective test suites filled with common testing anti-patterns. Aja begins by establishing what makes an effective test, emphasizing two metrics: tests should be trustworthy and simple. The talk is structured around four broad categories of anti-patterns: pointless tests, wasted time and effort, false positives and negatives, and inefficient tests.

Key Points Discussed:
- Pointless Tests: Lack of tests is highlighted as a major problem. Aja stresses that tests are essential for maintaining the integrity of code, comparing coding without them to driving too fast to confirm you are on the right road.
- Ineffective Test Running: Aja identifies not running existing tests as a significant waste, advocating for continuous integration (CI) practices to ensure tests are executed consistently.
- Testing Others' Code: Aja discourages writing tests for established libraries, since their authors should already have tested them. Instead, test what you provide: that you call the library correctly and that your code handles its output.
- False Positives/Negatives: These create distrust in test suites and can stem from poor test design, such as time-sensitive tests or hard-coded dates. Aja recommends stubbing the current time and using relative dates so that tests pass regardless of when or where they run.
- Order-Dependent Tests: Tests should be independent, and their order should not affect outcomes. Randomizing test execution can help uncover hidden dependencies.
- Tests That Can't Fail: Tests need meaningful assertions and should be seen to fail before the implementation exists; otherwise they can pass without verifying any behavior.
- Inefficient Tests: Aja discusses how external dependencies, like network calls or complicated setups, can slow tests down and disrupt development. Mocking is recommended to alleviate these issues.

Significant Examples:
- Aja provides practical code examples throughout, illustrating how to avoid anti-patterns such as testing Active Record functionality unnecessarily, using 'assert_nothing_raised', and writing date-dependent tests.
- A discussion about the importance of refactoring for clarity and organization in tests is also included, urging developers to adopt best practices in their test writing.

Conclusions:
Aja concludes that developers must actively work to eliminate common anti-patterns to create simple, reliable, and efficient tests. The key takeaways emphasize the value of maintaining a healthy testing culture, running all tests, and refactoring tests for clarity and performance. This presentation serves as a guide for developers looking to refine their testing strategies and ensure robust software functionality.

00:00:09.280 Hello, everyone!

00:00:20.400 I’m really nervous about this talk. When I’m nervous, I like to think about the people I know who give really awesome presentations. So today, I’m going to ask myself: what would Tenderlove do?

00:00:33.600 Tenderlove would wear a silly costume. Check! He would also show you pictures of his cat. That’s my cat, Nick. Check!

00:00:45.360 A little bit about me: my name is Aja Hammerly. I tweet very seldom at @thagomizer and I’m on GitHub as thagomizer. I blog even less often at thagomizer.com, and I work at an awesome consultancy in Seattle called Substantial. I’ve heard they’re tuning in right now to the live feed, so hi guys, thanks for supporting me!

00:01:16.640 During the last eight years or so, I’ve helped two companies increase their use of automated tests, specifically unit testing. During that time, I’ve had some truly fantastic mentors from the Seattle Ruby community, and I’ve also had a chance to mentor several people who had never really done unit testing or automated testing before. I noticed a lot of patterns: some things people did that were good, and others that came back to bite us.

00:01:40.880 So during this talk, I’m going to pass on some of that knowledge. The first thing we really need to discuss is what makes an effective test. I have two really simple metrics for whether or not a test or a test suite is effective.

00:01:58.560 The first is that a test should be trustworthy. By 'trustworthy,' I mean that if it’s passing, the code is actually great, and everything is working. You should be able to trust that things are okay. Conversely, if the test is failing, something better be wrong. You need to be able to trust your test to accurately tell you the state of your code. The second metric is that a good test suite should be simple. Simple test suites are easy to use, easy to write, easy to extend, and simple to debug when something goes wrong.

00:02:32.000 Here’s a brief overview of my talk. I’ve organized the anti-patterns into four broad categories: pointless tests; tests that are a waste of time and effort; false positives and negatives; and inefficient tests. This talk will include a lot of code examples, and I’ll go through them pretty quickly. But before I hit them, I want to point out that there’s a mix of pure Ruby and Rails code. I use TestUnit, RSpec, and MiniTest in different examples, and I’m doing this very purposefully. The anti-patterns I’m covering are framework-agnostic, and in fact, they’re language-agnostic. None of these are specific to Ruby. You can’t just say, 'Oh, I use RSpec; I’m immune,' or 'I use MiniTest; these don’t apply to me.' They apply to everybody.

00:03:04.800 You may be lucky and not have any in your codebase, but you probably have a few. So, let’s dive straight into pointless tests. The first example of that is not having tests at all. I found this great quote on Twitter: 'We’re coding too fast to write tests' is the same as 'We’re driving too fast to confirm we’re on the right road.' Tests are your safety net; they allow you to take risks and try things out.

00:03:29.040 The solution to this anti-pattern is simple: write tests. If you don’t believe you need to write tests, turn to the person next to you and ask them to tell you why, because I have more interesting things to cover. The second anti-pattern in the pointless test category is not running the tests you have. Imagine that very lazy cheetah right there. If you’re not running your tests, you’re just like that cheetah. The worst part about not running the tests you have is that it’s worse than not having them at all: you spent time writing them, but you’re getting no benefit whatsoever.

00:04:01.840 So, use continuous integration! Having a continuous integration system means you can’t forget to run your tests. In my experience, there are tons of continuous integration systems out there, and none of them are perfect, so pick one that’s good enough, hook it up, and ensure that your tests run with every check-in. This will solve that problem. But having continuous integration isn’t sufficient by itself; you also need to listen to your continuous integration build’s failing tests. Failing tests are telling you that something is wrong!

00:04:53.760 If your team culture is to ignore a red CI (continuous integration), you need to fix that culture. I’ve been there; I know it’s tough, but it is totally doable. If your team culture is 'We don’t worry about that test failing because that test just randomly fails sometimes,' that’s an untrustworthy test—fix it! If your team culture is 'That test isn’t really useful,' delete it. Failing tests are broken windows in your code; you need to fix them so the whole neighborhood doesn’t suffer.

00:05:24.960 Moving on to more interesting things, wasting your time and effort. The first example of this is testing other people’s code. I see a lot of new unit testers and people new to Ruby and Rails doing this. I did this a lot at the beginning, and I had people say, 'What are you doing? You’re wasting your time writing that test.' Here’s a great example: testing a 'find order by ID' in Rails. You create an order object, use Active Record’s find to find it, and verify that you got the one you just created back. News flash: Active Record’s been tested. You don’t need to waste your time testing it. Someone else has done that for you.

00:06:41.760 So what should you do instead? You should test what you provide. If there are examples of code in your documentation that you provide, or if you have specific methods in your library, make sure those are tested. If you’re calling into a library, test that you’re using it correctly, and also test that your code responds to output from that library correctly. There are two sides to that: calling in and handling output. To those in the audience thinking, 'Well, I have to test that library because I don’t trust it,' I say: No! If you don’t trust the library, try not using it. There’s probably an alternative on GitHub.
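As a sketch of "test what you provide," assume a hypothetical `order_total` helper of our own that wraps Ruby's standard JSON library. The checks exercise our call into the library and our handling of its output (including its `JSON::ParserError`), not the parser itself. The helper name and the data shape are illustrative assumptions, not code from the talk.

```ruby
require "json"

# Hypothetical code under test: our own thin wrapper around the JSON library.
def order_total(json_text)
  order = JSON.parse(json_text)
  order.fetch("items", []).sum { |item| item["price"] * item["quantity"] }
rescue JSON::ParserError
  nil # our decision about how to handle the library's error output
end

# Calling in: do we use the library correctly on well-formed input?
valid = order_total('{"items":[{"price":5,"quantity":2},{"price":3,"quantity":1}]}')

# Handling output: do we cope with an empty document and with a parse error?
empty   = order_total("{}")
invalid = order_total("definitely not json")

puts valid.inspect
puts empty.inspect
puts invalid.inspect
```

None of these checks asserts that `JSON.parse` itself parses correctly; that is the library author's job.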

00:07:09.360 And if you really have to use a library that you don’t trust, go ahead and write tests against it, but submit a pull request to the library author so that others can benefit from the work you just did. Don’t make someone else do the same work you just did; be polite and share. So, don’t write that test of Active Record. Another example of wasted time and effort is using 'assert_nothing_raised.' I read a blog post on Monday that said this is the ninth most commonly used assertion from TestUnit. That’s kind of scary because you don’t need it.

00:07:47.680 For those who aren’t familiar, 'assert_nothing_raised' takes a block, and if any of the code in that block raises an error or an exception, the assertion fails. Here’s some code that uses 'assert_nothing_raised', and here’s what happens when it runs: we have a failure, reported on line 9, which is the line where the assertion was made. There are a couple of problems with this: 1) the failure is reported on the line that contains the words 'assert_nothing_raised,' not on the line inside that block that actually had the error, and 2) the failure doesn’t tell you which line in your implementation caused the error. You get exactly no information, except that somewhere in this block, something that it called failed.

00:08:41.120 So you have unnecessary code, and I like deleting unnecessary code; it makes me very, very happy. Here’s how you can leave it out: I just deleted it. That’s all you have to do. Here’s what happens when I run the test now: the code still raises, but the test reports an error, not a failure. Your test still doesn’t pass, but now I get two pieces of useful information at the top of the stack trace: 1) the line in my test that raised the exception and 2) the line in my implementation that raised the exception. With more of the stack trace, you’d get even more information, so you’d know exactly the path your code went through that caused the error and how to deal with it.
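The difference in diagnostic information can be sketched in plain Ruby. The `assert_nothing_raised` below is a simplified stand-in written for this illustration, not Test::Unit's real implementation, and `parse_age` is a hypothetical method under test.

```ruby
class AssertionFailed < StandardError; end

# Simplified stand-in for assert_nothing_raised (an assumption for illustration):
# any exception inside the block is turned into a generic failure.
def assert_nothing_raised
  yield
rescue StandardError => e
  raise AssertionFailed, "Exception raised: #{e.class}"
end

# Hypothetical implementation under test.
def parse_age(text)
  raise ArgumentError, "not a number: #{text.inspect}" unless text.match?(/\A\d+\z/)
  text.to_i
end

# With the wrapper: the reported location is inside the assertion helper.
wrapped = begin
  assert_nothing_raised { parse_age("abc") }
rescue AssertionFailed => e
  e
end

# Without the wrapper: the backtrace starts at the failing implementation line.
bare = begin
  parse_age("abc")
rescue ArgumentError => e
  e
end

puts wrapped.backtrace.first
puts bare.backtrace.first
```

The first printed location names the assertion helper; the second names `parse_age`, which is the line a developer actually needs.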

00:09:13.360 Now onto false positives and negatives; these are the worst. Every time I’ve had a company that’s been resistant to adding unit tests or trusting their unit test framework, it’s because someone in the management chain has been burned by a test suite that gives lots of false positives or false negatives. This is all about having a trustworthy test suite. I’m going to cover a couple of examples, but there are more, and if you find me at the hackfest tonight, I can show you some of the ones that had to get cut for time.

00:09:30.240 The first example is time-sensitive tests. All of the code I’m going to show you, I have seen at a previous job of mine; not Substantial! Substantial is awesome. I’ve distilled it down to something that’ll fit on a single slide, but this is mostly real code. The upshot was that this test mostly passed during our business day. However, if I checked in after 8 PM, this test would fail. Usually, if I checked in late at night, I wouldn’t have my full brain available because I’m not a nighttime person.

00:09:55.760 I’d spend about 10 minutes debugging the test and then think, 'I’ll deal with it in the morning,' or I’d email our team in Bulgaria who would come in earlier than me to deal with it. No matter who picked it up the next morning, it would be magically passing again because it was time-sensitive. What I’m showing is only a three-line test, but the actual failing test had about 15 lines of implementation behind it, so it took us almost a year to figure out what was up with that test.

00:10:26.080 Throughout that time, the test was branded as 'that test that fails sometimes.' So, you know, you check in, and that test that fails sometimes would fail, and you wouldn’t think about it. The CI would be red, and you’d check in something else, and that would fail another test, but you’d think, 'Well, I don’t have to worry about the red CI right now; the test that fails sometimes is failing.'

00:10:41.360 How do you get around cases like this? The simplest solution is to stub 'Time.now.' Add a single line stubbing 'Time.now' so that it returns a fixed time; in this case, midnight. Now this test is no longer the test that fails sometimes; it will always pass, no matter what the local time is. To make this even more fun, mix in a CI server that runs in UTC, and all sorts of joy breaks loose with time-sensitive tests.
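A minimal sketch of stubbing `Time.now`, using a hand-rolled helper so the example needs no gems. The helper `with_time_now` and the method `after_hours?` are hypothetical names invented here; mocking libraries offer equivalent, better-tested helpers.

```ruby
# Hypothetical time-sensitive code under test: true from 8 PM onward.
def after_hours?
  Time.now.hour >= 20
end

# Hand-rolled stub helper (an assumption for illustration): temporarily
# replaces Time.now, then restores the original even if the block raises.
def with_time_now(fixed_time)
  Time.singleton_class.send(:alias_method, :__original_now, :now)
  Time.define_singleton_method(:now) { fixed_time }
  yield
ensure
  Time.singleton_class.send(:alias_method, :now, :__original_now)
  Time.singleton_class.send(:remove_method, :__original_now)
end

midnight = Time.new(2013, 3, 14, 0, 0, 0)
evening  = Time.new(2013, 3, 14, 21, 0, 0)

# The results no longer depend on when the test suite happens to run.
at_midnight = with_time_now(midnight) { after_hours? }
at_evening  = with_time_now(evening) { after_hours? }

puts at_midnight
puts at_evening
```

Pinning the clock in both directions (before and after the cutoff) also documents the boundary the original flaky test was tripping over.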

00:11:01.600 Here’s another example from the same job: we had a marketing team that really wanted a promotion ending on a specific day, so they hardcoded that date in. When I wrote this code, it was probably 2010 or 2011, and we had about a year and a half—almost two years—where this code passed regularly. However, the 13th of December 2012 rolled around, and magically it starts failing. You’re like, 'We haven’t touched that code in months! What could be going on?' Well, there’s that hard-coded date.

00:11:25.920 How do you get around this? You use relative dates. Active Support has the 'from_now' and 'ago' methods, and they’re fantastic for this. I’ve argued with people before about why they can’t use relative dates; they say, 'Well, the spec says we have to have this hard-coded date in the code.' Well, make that the default but give yourself the ability to pass in a different date from your test, which is exactly what I’ve done here. Now our test says that the expiration is three months from now, so we won’t have that joyous December 13th problem; it’ll always pass.

00:11:43.040 Order-dependent tests are another one of my favorite anti-patterns (no sarcasm there!). Order dependency happens when your tests leave state around. By 'state,' I mean things like variables with values in them, database records, files; basically anything. Anytime your tests don’t clean up after themselves and put the system back where it was when they started, you run the risk of order dependency. Here’s a very loosely Rails-based example: we’re going to test 'create_record' and create a lion using our Animal class. We assert that it’s not nil, and then we test 'find_record' and assert that we can find that lion created in the first test.

00:12:20.080 I’m running them in verbose mode here, so you can see the order they run. First the 'create' test runs, and then the 'find' test runs. I should point out that this is TestUnit, and that’s actually really important in this case. So, we’ve got 'create' and 'find' working. Then we’ll add in a 'delete' test. We take that lion we created in the first test and found in the second test, and delete it. Having added a test, let’s rerun everything. The 'create' test passes, the 'delete' test passes, but the 'find' test, which was just passing, now fails.
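The "hard-coded default, injectable from the test" idea might look like the sketch below. The `Promotion` class and its end date are assumptions based on the story, and the standard library's `Date#>>` stands in for Active Support's `3.months.from_now` so the example runs without Rails.

```ruby
require "date"

class Promotion
  # The hard-coded date stays as the default (the exact date is an assumption).
  DEFAULT_END_DATE = Date.new(2012, 12, 13)

  def initialize(end_date = DEFAULT_END_DATE)
    @end_date = end_date
  end

  def active?(today = Date.today)
    today <= @end_date
  end
end

# In the test, pass a relative date instead of relying on the default.
three_months_from_now = Date.today >> 3
relative_promo = Promotion.new(three_months_from_now)

puts relative_promo.active?
```

Production callers keep using `Promotion.new` unchanged, while the test no longer has an expiry date of its own.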

00:12:57.760 Interesting fact about TestUnit that I didn’t know for the first couple of years I wrote Ruby: it takes all the tests in a file and orders them by name, then runs them in that order. Our 'delete' test will run before our 'find' test even though the 'delete' test is further down in the file. This is a simple example because there are only three tests. I spent 90 minutes once debugging this in a file that had 75 tests. It was fun!

00:13:29.440 This is why this is a perfect example of order dependency, and you can’t actually fix order dependency very easily if it’s baked into your code. However, you can find it easily by randomizing your tests. This means your tests will run in a different order every time. If your tests start failing, you’ll know you have some hidden order dependency. Here are the default run orders from some of the standard Ruby test frameworks: TestUnit runs tests in alphabetical order; RSpec runs them in the order in the file; and MiniTest, by default, runs your tests randomly. One of the reasons I love MiniTest is that it randomly runs your tests.

00:14:03.199 Here’s how you can randomize for those same test frameworks: in TestUnit on Ruby 1.9 and 2.0, you can add 'def self.test_order; :random; end' to your test class to randomize your tests. RSpec gives you '--order rand', and you don’t have to do anything special in MiniTest. A pro tip based on an experience I had a couple of months ago: if you’re running your tests in random order and you’re new to this, record the seed you’re using. If someone closes their browser window or console window without recording the seed, they can’t rerun the failing order, and they might mistakenly think their environment is messed up.

00:14:52.240 Without the seed, it’ll take a while to land on a random ordering that gives you the same failure again. Also, my experience with Guard, if you’re using it, is that it uses the same random seed for each session, meaning you’re not actually getting a lot of randomization there. So remember your seed; all of these frameworks will let you rerun in the same order they just ran, allowing you to diagnose and fix any order dependency issues.

00:15:36.000 Now, 'tests that can’t fail' is another common anti-pattern I see in folks that are new to unit testing. It also happens to me when I’m really tired, and I forget to put my assertions in my tests. Tests generally need assertions. I also run into this one when I stub and mock really heavily, because then I’m basically asserting that the code I just wrote, or the code I’m about to write, is the code I just wrote. I’m not actually asserting what happens or whether that’s what I want.
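The create/find/delete story and the value of a recorded seed can be simulated without any test framework. Everything below (the `STORE` hash, the lambdas, `run_shuffled`) is a hypothetical stand-in for the slides, meant only to show why run order and seeds matter.

```ruby
STORE = {} # shared state that the "tests" never clean up

# Three order-dependent checks, listed in file order; each returns true on pass.
TESTS = {
  "test_create" => -> { STORE[:lion] = "lion"; !STORE[:lion].nil? },
  "test_find"   => -> { STORE.key?(:lion) },
  "test_delete" => -> { !STORE.delete(:lion).nil? }
}

def run_in_order(names)
  STORE.clear
  names.map { |name| [name, TESTS.fetch(name).call] }
end

# Same seed, same shuffle: this is why recording the seed lets you reproduce a failure.
def run_shuffled(seed)
  run_in_order(TESTS.keys.shuffle(random: Random.new(seed)))
end

file_order   = run_in_order(TESTS.keys)      # create, find, delete
alphabetical = run_in_order(TESTS.keys.sort) # create, delete, find

p file_order
p alphabetical
p run_shuffled(1234)
```

In file order everything passes; in alphabetical order the find check fails because the delete check ran first. The real fix is for each test to build and tear down its own state.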

00:16:06.560 Let’s look at a really simple example: a basic user class in Active Record style. It has a name—that’s it. I think everyone’s probably familiar with Rails, but if you’re not, ActiveRecord provides the '.all' method that returns all the records in the table. Let’s say I’m going to implement a feature so that '.all' returns records sorted by name, as opposed to being sorted by primary key, which is the default. So I write this test: I create a user named Ben, I create a user named Nick, and I assert that 'User.all' should equal Ben and Nick in that order.

00:16:46.640 I want them to be in alphabetical order, and I’m trying to be a good TDD person right now. When I run the test, it passes! So I’ve just written a test that passes without my implementation being written. Not a good thing; this is something a lot of people stumble into by accident on a fairly regular basis.

00:17:10.640 To avoid this, first of all, make sure you always watch your tests go green, then red, then green again; red is important! To fix this particular test, all I have to do is swap the creation order so that I’m not creating them in alphabetical order. Now, if I run this, the test will fail, and then I can go write my implementation and make it pass. I know that that test is now validating something useful.

00:17:52.320 Moving on to my last category, which is the most code-heavy part: inefficient tests. I’ve heard a lot of hallway conversations regarding slow tests, but inefficient tests is actually a more useful category to talk about. Let’s begin with requiring external resources. One common example is using the internet. Maybe you want to call out to your authentication service to verify code that uses your authentication service, or call out to a payment service in your tests.
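To see why creation order matters, here is a sketch with a tiny in-memory `User` class standing in for the Active Record model (an assumption; no Rails involved). `all` deliberately returns records in creation order, as it would before the sorting feature exists.

```ruby
# Minimal in-memory stand-in for the Active Record-style User model.
class User
  attr_reader :name

  @records = []

  class << self
    def create(name)
      new(name).tap { |user| @records << user }
    end

    # Not yet sorted: creation order, like the default primary-key order.
    def all
      @records.dup
    end

    def delete_all
      @records.clear
    end
  end

  def initialize(name)
    @name = name
  end
end

def names_after_creating(*names)
  User.delete_all
  names.each { |name| User.create(name) }
  User.all.map(&:name)
end

# Created in alphabetical order: passes even though nothing sorts.
weak_test_passes = names_after_creating("Ben", "Nick") == %w[Ben Nick]

# Created out of order: fails until sorting is actually implemented.
strong_test_passes = names_after_creating("Nick", "Ben") == %w[Ben Nick]

puts weak_test_passes
puts strong_test_passes
```

Only the second check can go red, so only the second check says anything about the feature.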

00:18:27.680 I’ve seen people write tests that use files they don’t check into source control. Everything necessary to run your test should be in source control next to your tests. Finally, requiring extra setup beyond the standard 'rake test' or 'rake spec' is another form of requiring external resources. You might require someone to read your README and follow extra steps as opposed to just including them in your rake file.

00:19:06.480 To demonstrate this, I’m going to focus on the internet case. Here’s a web service called Clever, which provides roster information about school districts. The implementation simply calls their districts API with their demo key, which returns information about their demo district. I’m using cURL because I want something simple.

00:20:01.280 Here’s a basic test. I instantiate a Clever object and assert that when I call the 'districts' method, I get 'demo' somewhere in the results. It’s pretty straightforward, but when I run this test, it takes a quarter of a second. I don’t know about you, but I don’t want to wait that much time for a single test. If the external service goes down for maintenance or they change their API, my tests will fail, meaning I'm failing due to something someone else is doing.

00:20:36.640 This gets dangerously close to the testing other people's code anti-pattern we just discussed. Additionally, if you’re at a conference with bad Wi-Fi, on an airplane, or at a coffee shop, your tests will either fail or run very slowly while the network times out. So the obvious answer is to mock and stub external resources. Someone even mentioned WebMock this morning, and I like it! I’ve included it in a project where I was working with testing newbies to prevent them from accessing the internet.

00:21:26.560 However, I think if you have a specific use case, there are ways you can do this using the power of Ruby without pulling in an outside gem. We’re going to extract the part of our code that accesses the internet so that we can stub or mock our access. We’ll move the cURL line into a class called Requester. Requester has one method, 'make_request,' which takes a URL, and since I need an authentication key for this API, I’m going to pass that in as well.

00:21:55.840 Then I’ll change my implementation to call 'make_request.' It’s a fairly simple refactoring. Now I’ll rerun the tests, and they still pass! So we know the refactoring worked. Now comes the fun part where we get into stubbing and mocking. We’re going to make a 'StubRequester' class that inherits from Requester. Our StubRequester is going to have a collection of canned responses. You can put these in a file or in a database, but I’m going to put them in a global variable because space on my slides is limited.

00:22:38.720 What we’re going to do is use the URL to look up our canned response and return it with a simple hash lookup. We need to change our implementation slightly so we can change the requester, passing it into the initializer. When I’ve done this in the past, we've typically had code already attached to this implementation. You don’t want to change that code or have those consuming it know about the two requesters, so make sure the default hits the internet. After making that change, we rerun our tests—great! They pass again.

00:23:32.240 Next, we adjust our tests to use the StubRequester. All we have to do is pass in that argument with the new class we just built. When we rerun our tests now, everything blows up because we don’t have a canned response for that URL yet. Thus far, we haven’t done anything that WebMock doesn’t do for you out of the box, and this error message is a lot uglier than what WebMock gives you if you try to hit the internet.

00:24:11.600 Now here’s where it gets awesome! When I’ve used WebMock in the past, one of the challenges I’ve had is that I don’t know I need it to return a specific response, or what that response should look like. Let’s transform our StubRequester class to simplify this. We’ll add a simple check: if we have a canned response, return it; otherwise, access the internet. We already have code that accesses the internet, because we inherit from the Requester class that contains it, so let’s just call out to super.

00:24:57.040 The problem is: based on past experiences, I know that if I just call out to super in that case, people aren’t going to put the stubs in the way they’re supposed to. Let’s make sure that raises an error so the test still fails but provides feedback with what you would get if you hit the internet. Since I need the URL to make the canned response, let’s include that in the exception message. Now, let’s see how this works when we run the test. We get an error, which is expected because we didn’t actually have a canned response, but the runtime error we get starts with the URL we tried to hit.

00:25:42.080 It includes the response (I have truncated the JSON output for length), as well as a stack trace, but we can safely ignore that in this case. Now, I have all the information I need to set up a canned response in my StubRequester class. I simply copy and paste it, putting the URL and the expected response for the test into my StubRequester. Now, we rerun our tests, and we’re down to less than a thousandth of a second! For comparison, we started at a quarter of a second. It’s significantly faster, it doesn’t hit the internet, and it validates our responses. Plus, you have an example of what you expect the external web service to return.
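Pulling the steps above together, a rough sketch of the extracted requester and its stub could look like this. It is not the code from the slides: the URL is a placeholder on example.com, the basic-auth scheme and the canned JSON are assumptions, `Net::HTTP` replaces the cURL call, and the requester is passed in as an instance.

```ruby
require "net/http"
require "uri"

# The only class that touches the network.
class Requester
  def make_request(url, api_key)
    uri = URI(url)
    request = Net::HTTP::Get.new(uri)
    request.basic_auth(api_key, "") # assumed auth scheme
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(request).body
    end
  end
end

class StubRequester < Requester
  # Canned responses keyed by URL (placeholder URL and payload).
  CANNED_RESPONSES = {
    "https://api.example.com/v1/districts" => '{"data":[{"name":"Demo District"}]}'
  }

  def make_request(url, api_key)
    return CANNED_RESPONSES[url] if CANNED_RESPONSES.key?(url)

    # No canned response yet: fetch the real one once, then fail loudly with
    # everything needed to paste a new entry into CANNED_RESPONSES.
    response = super
    raise "#{url}\n#{response}"
  end
end

class Clever
  DISTRICTS_URL = "https://api.example.com/v1/districts"

  # The default requester hits the internet, so existing callers are unaffected.
  def initialize(api_key, requester = Requester.new)
    @api_key = api_key
    @requester = requester
  end

  def districts
    @requester.make_request(DISTRICTS_URL, @api_key)
  end
end

puts Clever.new("DEMO_KEY", StubRequester.new).districts
```

Because the stub raises instead of silently passing through, a missing canned response still fails the test, but the failure message doubles as the fixture to copy in.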

00:26:35.680 I’ve used these files in negotiations with web services when they denied having changed their APIs, saying, 'No, you did! Here’s the exact information with all the details and the dates when I collected this from you!' Now, on to another anti-pattern that really bugs me: complex setup. I did a bunch of research for this talk and also heard this called 'requires the world.' You may have this anti-pattern in your codebase if testing a single model or a single controller requires instantiating all the other fixtures in your Rails app.

00:27:07.200 Or when testing a single class requires instantiating six or more other classes. Depending on who you ask, it might be two or more, or four or more; either way, six is a lot! Sadly, this isn’t just a testing anti-pattern; it’s an implementation anti-pattern. The solution is to refactor your implementation, which can sometimes be exceedingly difficult. It might be so hard that I won’t go into it during this talk. Instead, I highly recommend the book 'Working Effectively with Legacy Code' by Michael Feathers. The vast majority of this book discusses testing large, tangled apps so that you can incrementally refactor them and make them better.

00:27:57.680 Onward to messy tests! You may have messy tests if you see lots of repeated code. If you find yourself writing tests via the copy-paste-and-tweak method of test authorship, your tests are likely disorganized. Another good indication of disorganization: you find the same test in two different files, or worst case, you find the same test in two different places within the same file, sometimes even with the same name. Your tests can also be disorganized if you have literals everywhere.

00:28:29.760 Much like coding, the solution to messy tests is to refactor. Here are some tips for test refactoring: 'DRY' (Don’t Repeat Yourself) isn’t just for your implementation code; it applies to your tests, too. Take repeated code blocks and create methods for them, giving those methods descriptive names. Group your tests by the method under test. RSpec makes this explicit with describe and context blocks. You can do the same in TestUnit by commenting with the method name and listing all tests for that method underneath.

00:29:07.760 Use descriptive test names! We are no longer in the punch card days; we aren’t charged by characters. If you need a 50-character test name, go for it. Make sure that special preconditions are represented in the test name. Indicate whether you are testing a valid or error case. Finally, put your literals into variables. I’m really bad at this, and I’m getting help at my current job in remembering to put literals into variables so that if you need to change them, you only change them once.
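These refactoring tips can be illustrated with a small Minitest file. The `Cart` class, the helper `cart_with`, and the price constants are all hypothetical; the point is only the shape of the tests (one named helper, tests grouped by method, long descriptive names, literals held in constants).

```ruby
require "minitest/autorun"

# Hypothetical class under test.
class Cart
  def initialize
    @prices = []
  end

  def add(price)
    raise ArgumentError, "price must be positive" unless price.positive?
    @prices << price
    self
  end

  def total
    @prices.sum
  end
end

class CartTest < Minitest::Test
  # Literals live in one place, so a change happens once.
  WIDGET_PRICE = 5
  GADGET_PRICE = 7

  # Repeated setup extracted into a descriptively named helper.
  def cart_with(*prices)
    prices.each_with_object(Cart.new) { |price, cart| cart.add(price) }
  end

  # --- Cart#total ---
  def test_total_is_zero_for_an_empty_cart
    assert_equal 0, cart_with.total
  end

  def test_total_sums_the_prices_of_all_added_items
    assert_equal WIDGET_PRICE + GADGET_PRICE, cart_with(WIDGET_PRICE, GADGET_PRICE).total
  end

  # --- Cart#add ---
  def test_add_raises_argument_error_for_a_negative_price
    assert_raises(ArgumentError) { Cart.new.add(-1) }
  end
end
```

Each test name states the precondition and whether it covers a valid or an error case, so a failure report is readable without opening the file.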

00:30:00.000 Lastly, the anti-pattern that everyone’s been waiting for, even if they didn’t know it: slow tests! Everyone has a different definition of 'slow.' While I’m not going to give you mine, common reasons include having other testing anti-patterns lurking in your codebase. If you’re doing anything I’ve covered so far, there’s a good chance your tests will be slow.

00:30:29.679 Accessing external resources is a big one. Having tests that involve a lot of setup—who here has a test that imports an entire database from a dump before running? I’ve worked on projects like that, and those tests were slow! Don’t do it! The main reason for slow tests isn’t because there’s something wrong with the test framework, but because the implementation is slow.

00:30:57.679 Here’s an example of two tests: one that’s fast because it only asserts true, and one that’s slow because it calls 'myclass.slow_method'. The slow method may—or may not—sleep for five seconds before asserting true. The first thing you should do when confronted with slow tests is fix your implementation.

00:31:13.679 Speeding up your feedback cycle for developers will benefit your users; it will make everyone happier! But perhaps your company doesn’t fully appreciate the benefits of testing yet, or maybe this method is only run once every six months in production. You can’t justify spending the three weeks required to make it faster right now.

00:31:31.680 In that case, you have one more tool up your sleeve: guards! Here are those two tests again. Check for an environment variable; if it’s not set, skip the slow test. When you run your tests normally, you only run the fast test—therefore they run really quickly! Add the environment variable to your call, and now your tests will run slowly, but you’ll still be able to run both.
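A guard of this kind might be sketched in Minitest as follows. The environment variable name `RUN_SLOW_TESTS`, the helper `slow_tests_enabled?`, and `MyClass` are assumptions made for the example.

```ruby
require "minitest/autorun"

# Hypothetical slow implementation.
class MyClass
  def self.slow_method
    sleep 5
    true
  end
end

# Guard helper; takes the environment as a parameter so it is easy to check.
def slow_tests_enabled?(env = ENV)
  env["RUN_SLOW_TESTS"] == "1"
end

class MyClassTest < Minitest::Test
  def test_fast
    assert true
  end

  def test_slow
    skip "set RUN_SLOW_TESTS=1 to run slow tests" unless slow_tests_enabled?
    assert MyClass.slow_method
  end
end
```

Running the file normally skips the slow test and reports it as skipped; running it with `RUN_SLOW_TESTS=1` set in the environment runs both. The skip message keeps the guard visible in the output instead of hiding the test entirely.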

00:31:46.720 This is a powerful pattern! I showed someone this, and they implemented it everywhere. We stopped running all those tests, even in CI. Don’t do that! If you’re going to use this method, reserve it for developers who don’t need to care about slow tests.

00:31:51.600 Make sure your CI runs these tests often, at least once a day or, preferably, with every check-in.

00:31:58.240 So, my top three. First: not running your tests. If you have them, run them! Benefit from the work you’ve already done. Second: order-dependent tests! I think this is a really important anti-pattern for people to understand because it pops up at the oddest times, and it’s hard to debug.

00:32:11.600 Finally, requiring external resources is a considerable pet peeve of mine because I like to write code at coffee shops, on airplanes, and in places that don’t require external resources.

00:32:25.040 Thank you all! And thanks to Creative Commons for all these wonderful photos I managed to use for today's talk.

MountainWest RubyConf 2013