Mutation Testing with Mutant

by Erik Michaels-Ober

In the video titled "Mutation Testing with Mutant" presented by Erik Michaels-Ober at RailsConf 2014, the speaker explores the concept of mutation testing as a tool for improving code quality by verifying the effectiveness of tests. The discussion begins with a metaphor comparing programmers to carpenters, emphasizing that just as carpenters rely on the right tools, programmers need effective development tools. The key theme revolves around mutation testing, which operates by modifying code and observing whether tests appropriately fail or pass in response to those changes. Key points covered in the talk include:

Importance of Tools: The talk outlines various programming tools such as code editors, debuggers, and code profilers, which enhance programming capabilities. Specifically, he highlights how editors prevent bugs through features like syntax highlighting and auto-completion.
Testing vs. Code Quality: While traditional testing is often considered a foolproof way to ensure code quality, Michaels-Ober argues that tests themselves can be flawed. He critiques the notion that 100% code coverage guarantees bug-free code, clarifying that flawed tests can provide a false sense of security.
Challenge of Measuring Test Effectiveness: To measure the effectiveness of tests, the speaker presents mutation testing as a solution to the problem of 'who watches the watchers.' Mutation testing creates small modifications to the code (mutants) and tests whether existing tests can catch these changes.
Mutation Testing Mechanics: The speaker elaborates on how mutation testing works through examples, explaining that if a test passes after a modification (mutant) is applied, it indicates that either some test cases are missing or existing tests are not rigorous enough.
Live Coding Demonstration: In the latter part of the talk, Michaels-Ober conducts live coding to demonstrate mutation testing's implementation using the Mutant library. He showcases how to write tests for a class representing planets and discusses different strategies to ensure comprehensive test coverage via mutation testing techniques.

Ultimately, the conclusion of the talk reinforces the idea that mutation testing serves as an essential tool for developers by offering a method to improve their tests, leading to higher code quality and fewer bugs. The speaker encourages viewers to experiment with mutation testing frameworks like Mutant to enhance their testing strategies.

00:00:16.960 Okay, is the mic live? Yeah, we're good. Hi everybody, welcome, thank you for coming.

00:00:23.439 This is going to be a talk about tools.

00:00:32.000 There's a common expression that says a carpenter is only as good as his or her tools.

00:00:38.480 I'm not a carpenter, but that makes a lot of sense to me.

00:00:44.559 If your hammer is made out of feathers, you're not going to be able to build very much.

00:00:51.039 I think the same thing applies to programmers. The tools that we use really enable us to do our jobs.

00:00:57.680 We use so many tools that it's easy to take for granted the tools that we have.

00:01:04.559 I think it's worth reflecting on the tools we use and how they help us improve as programmers.

00:01:11.200 In this case, I'll be specifically talking about mutation testing and how it can help us write better tests.

00:01:18.640 To start with, I think we should take a moment to set a context for the tools we use every day.

00:01:24.000 The first tool I want to mention is the editor.

00:01:30.079 It seems like a simple tool—you just type text and it shows up on the screen.

00:01:36.960 However, it's incredibly sophisticated. If you've ever tried to write a text editor or read the source code of one, you'll know that most text editors are millions of lines of code.

00:01:44.320 They provide features like syntax highlighting and auto-completion, which help us avoid bugs.

00:01:50.479 We'll often catch a bug in our editor before we deploy it to production.

00:01:57.200 This is an early version of Vim.

00:02:03.040 It's easy to forget what these tools used to look like, and this is how people used to write code.

00:02:09.520 These tools look more like shop tools than what we're used to using nowadays.

00:02:14.640 This is an early punch card machine; the photo was taken in the Computer History Museum in Mountain View, California.

00:02:20.319 I can tell you for a fact that I would not be a programmer today if this was how we still had to write programs.

00:02:26.400 I suspect many of you wouldn't either if this were the state of the art.

00:02:36.560 I want to make the case that both the quality and quantity of software would be much worse than it is today.

00:02:41.760 This is largely because of the continued evolution of our tools.

00:02:46.879 Another tool I use every day is an interactive debugger.

00:02:53.920 It allows you to step through your code line by line to better understand how it works.

00:03:00.800 I'm not going to spend too much time talking about debuggers, but I have a public service announcement.

00:03:06.640 Next Thursday, in this same room, I believe there is a great talk on debugger-driven development with Pry.

00:03:13.119 If you're interested in that topic, you should check it out.

00:03:20.400 So, what do we do when our code is slow? What's the tool for that?

00:03:27.600 We have profilers that tell us where time is being spent when we execute our code.

00:03:35.280 I wouldn't even know how to start optimizing a program without a profiler.

00:03:41.840 I guess I would start adding timestamps manually, which would be incredibly inefficient.

00:03:47.680 I wouldn't be as good at it, and none of us would.

00:03:53.440 Another tool prevalent in the Ruby community is testing.

00:03:57.600 This is an example of someone who should have done more testing.

00:04:04.000 Testing can save you from running into production errors.

00:04:10.080 In the Ruby toolbox, tests are like the hammer—something we turn to frequently.

00:04:15.760 We use tests to prevent regressions, specify behavior, and even drive development.

00:04:25.760 If we write tests, then we expect to have perfect code, right?

00:04:31.759 The fundamental logical flaw with testing is that while tests help verify our code, they themselves are code that can have bugs.

00:04:42.640 So, what's the solution? You should test your tests.

00:04:49.920 One tool people use to measure the effectiveness of their tests is code coverage.

00:05:01.280 Code coverage is a metric designed to tell you whether your tests do what they're supposed to.

00:05:08.160 However, it can give you a false sense of security; 100% code coverage doesn't guarantee bug-free code.

00:05:16.080 Code coverage, while helpful, is not foolproof.

00:05:22.639 Code coverage is built into Ruby, and while I'm not against it, it can give you a false sense of security.

00:05:28.640 A lot of people believe that reaching 100% code coverage means their code is perfect, which is simply not true.

00:05:38.560 So, is there hope for us? How do we test our tests?

00:05:44.319 It's a kind of paradox: who watches the watchers?

00:05:52.759 If we can't trust our tests, why are we writing them at all?

00:06:01.120 I'm going to argue that mutation testing is a solid solution to this problem.

00:06:08.399 The core idea behind mutation testing is to take your tests and run them against your code to see if they pass.

00:06:15.120 If they pass, mutation testing modifies your code at runtime and runs your tests again.

00:06:22.400 The expectation is that if your code is modified, one or more previously passing tests should now fail.

00:06:30.080 The modified version of your code is called a 'mutant,' and if a mutant survives the test, it means there's something wrong with your tests.

00:06:37.200 There may not be an issue with your code, but you definitely have a problem with your tests.
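The loop described above can be sketched in a few lines of plain Ruby. This is a hypothetical illustration and not how Mutant itself is implemented: the mutants are written by hand as lambdas, and the names `ORIGINAL`, `MUTANTS`, `WEAK_SUITE`, `STRONG_SUITE`, and `survivors` are invented for the example.

```ruby
# Hypothetical code under test: returns the larger of two values.
ORIGINAL = ->(a, b) { a > b ? a : b }

# Hand-written mutants. A real tool derives these from the source automatically.
MUTANTS = {
  'replace > with <'    => ->(a, b) { a < b ? a : b },
  'always return first' => ->(a, _b) { a }
}.freeze

# A suite is modelled as a lambda that returns true when every check passes.
WEAK_SUITE   = ->(impl) { impl.call(2, 1) == 2 }
STRONG_SUITE = ->(impl) { impl.call(2, 1) == 2 && impl.call(1, 2) == 2 }

# Returns the names of the mutants that the suite fails to detect.
def survivors(suite)
  # The suite has to pass against the unmodified code before any mutation.
  raise 'suite fails on the original code' unless suite.call(ORIGINAL)

  MUTANTS.select { |_name, mutant| suite.call(mutant) }.keys
end

survivors(WEAK_SUITE)   # => ["always return first"]
survivors(STRONG_SUITE) # => []
```

The surviving mutant points at the missing test: nothing in the weak suite exercises the case where the second argument is the larger one.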

00:06:44.079 This technique helps answer the question of what tests should I write, which many beginners struggle with.

00:06:52.240 It also addresses the question of how do I know when my tests are sufficient.

00:06:59.840 Mutation testing provides a quantitative answer to those questions.

00:07:06.880 You can confidently say that a given piece of code has 100% mutation coverage.

00:07:13.600 For example, here's some code with an assertion about the code.

00:07:19.120 I have a method called 'foo' that takes an optional argument with a default value of true.

00:07:25.839 The body of the method either returns this argument or raises an exception.

00:07:32.720 In my assertion, I am checking that if I call 'foo' without any parameters, nothing is raised.

00:07:41.279 This test will pass because calling 'foo' with the default parameter will return true.

00:07:48.720 However, this does not mean you're done writing tests.

00:07:55.279 A mutant for this code might just remove the 'or fail' statement.

00:08:01.840 Your tests should fail when this change is made.

00:08:08.720 If it does not, then you are not testing your code sufficiently.

00:08:15.200 This type of mutation is known as a 'statement deletion mutation'.

00:08:22.960 There are several other types of mutations, like changing default parameters or altering logical conditions.
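A sketch of the example just described, reconstructed from the spoken description rather than taken from the slides. The mutants are written by hand as separate methods, and the names `foo_without_fail`, `foo_default_false`, and `raises?` are invented for illustration.

```ruby
# The method as described: an optional argument defaulting to true,
# returned if truthy; otherwise an exception is raised.
def foo(value = true)
  value or fail
end

# Statement deletion mutant: the `or fail` is removed.
def foo_without_fail(value = true)
  value
end

# Default parameter mutant: the default value is flipped.
def foo_default_false(value = false)
  value or fail
end

# Helper: true if the block raises a RuntimeError, false otherwise.
def raises?
  yield
  false
rescue RuntimeError
  true
end

# The assertion from the talk: calling with no arguments raises nothing.
raises? { foo }               # => false, so the test passes
raises? { foo_default_false } # => true, so this mutant is killed
raises? { foo_without_fail }  # => false, so this mutant survives

# One more test, calling with false and expecting an exception,
# tells the original apart from the surviving mutant.
raises? { foo(false) }              # => true
raises? { foo_without_fail(false) } # => false, so the mutant is now killed
```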

00:08:31.039 The core idea behind mutation testing is that these cases help you ensure your code is fully tested.

00:08:40.639 Mutation testing allows you to know when you've sufficiently tested your code.

00:08:47.839 Katrina Owen tweeted that if you add more granular tests, you'll find more bugs.

00:08:55.760 In many cases, Mutant, which is a mutation testing framework, will help find those bugs.

00:09:01.840 Now, I'll switch gears to some live coding.

00:09:07.760 This is the introduction; now we will write some code.

00:09:14.080 Let me quickly switch to mirror displays.

00:09:19.760 That's a pro tip; thank you!

00:09:22.639 A new version of Mutant was just released moments before this presentation.

00:09:28.320 I am not the author of Mutant; it's a great library by Markus Schirp.

00:09:34.640 I encourage you to check it out; version 0.5.11 is hot off the presses!

00:09:41.600 So here is some code for our live coding demo.

00:09:48.800 The focus of this demo is on writing tests, not on writing code.

00:09:56.000 Mutant does not verify if your code is correct; it verifies that your tests are correct.

00:10:03.680 This code is quite simple but let's walk through it to ensure everyone understands.

00:10:09.920 There's a module that represents the universe and within that, we have planets.

00:10:16.079 This class we'll define takes a radius and an area as parameters during construction.

00:10:23.440 The radius is the planet's mean radius in kilometers, and the area is the surface area in square kilometers.

00:10:30.639 There is one public method: 'spherical,' which returns true if the planet is a perfect sphere or close enough.

00:10:37.200 To determine this, we calculate the approximate area using the formula 4πr².

00:10:43.120 If the area matches our approximation, it indicates that the planet is spherical.

00:10:50.239 If it doesn't match, then the planet is not spherical; it might be oblate, like Earth.

00:10:57.440 We have a private method that generates a tolerance range.

00:11:04.399 We don't want it to be too precise because we deal with pi, which is non-terminating.

00:11:10.000 We're checking if the actual area falls within these bounds.
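The class walked through above might look roughly like this. It is a sketch based only on the spoken description: the method name `spherical?`, the helper names, and the `TOLERANCE` value are assumptions, and the planetary figures are commonly cited approximations rather than the numbers used on stage.

```ruby
module Universe
  class Planet
    # Assumed relative tolerance (0.0001%); the talk does not state its value.
    TOLERANCE = 1e-6

    def initialize(radius, area)
      @radius = radius # mean radius in kilometers
      @area = area     # surface area in square kilometers
    end

    # True if the actual area is within tolerance of a perfect sphere's area.
    def spherical?
      tolerance_range.cover?(@area)
    end

    private

    # Surface area of a perfect sphere with this radius: 4 * pi * r^2.
    def approximate_area
      4 * Math::PI * @radius**2
    end

    # Inclusive range of areas that count as "close enough" to spherical.
    def tolerance_range
      delta = approximate_area * TOLERANCE
      (approximate_area - delta)..(approximate_area + delta)
    end
  end
end

venus = Universe::Planet.new(6_051.8, 460_234_317)
earth = Universe::Planet.new(6_371.0, 510_072_000)
venus.spherical? # => true
earth.spherical? # => false
```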

00:11:16.880 It's pretty simple code, so if everyone understands it, I want to do a quick interactive poll.

00:11:23.920 How many tests do you think are needed to fully cover this code, particularly the spherical method?

00:11:30.879 Who thinks you need zero tests? Show of hands.

00:11:37.200 No? Good, you've been paying attention.

00:11:42.159 Who thinks you can do it with one test, maybe the happy path?

00:11:48.240 Raise your hands if you think that's sufficient.

00:11:54.639 Nobody? I'm glad you all agree.

00:11:59.440 You can achieve 100% code coverage with just one test but not 100% mutation coverage.

00:12:05.440 So let me prove the point about 100% code coverage to you.

00:12:11.840 How many tests do you think would achieve this? Is it sufficient?

00:12:20.160 What about two tests?

00:12:26.720 Yes, someone mentioned testing the happy path and the sad path.

00:12:32.000 How about three tests? What would your third test be?

00:12:38.000 A value that would blow up the computation?

00:12:45.280 Passing in a string, expecting an exception? That's a good one.

00:12:51.760 Let's keep going. Who thinks you need four tests?

00:12:56.480 A couple of people do. What would your additional tests cover?

00:13:02.880 Testing a range on both the low end and high end sounds solid.

00:13:08.160 How about five or more? Anyone raising their hands?

00:13:14.720 According to Mutant, you can actually cover this code with just four tests.

00:13:21.600 That would be the happy path, the sad path, and both sides of the range.
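Why the two boundary tests matter can be shown with a small sketch. It assumes a simplified, hypothetical helper with an absolute integer tolerance (so the boundary values are exact) instead of the planet class, and a hand-written mutant that turns the inclusive range into an exclusive one. All names here are invented for illustration.

```ruby
# Hypothetical helper: is `actual` within `tolerance` of `expected`?
def within_tolerance?(actual, expected, tolerance)
  ((expected - tolerance)..(expected + tolerance)).cover?(actual)
end

# Hand-written mutant: the inclusive range `..` becomes the exclusive `...`.
def within_tolerance_mutant?(actual, expected, tolerance)
  ((expected - tolerance)...(expected + tolerance)).cover?(actual)
end

# The four kinds of test named above, as [actual value, expected result].
CASES = {
  happy_path:  [1000, true],  # exact match
  sad_path:    [2000, false], # far outside the range
  lower_bound: [990,  true],  # lowest value still inside
  upper_bound: [1010, true]   # highest value still inside
}.freeze

# Names of the cases that fail for the given method (expected 1000, tolerance 10).
def failing_cases(checker)
  CASES.reject { |_name, (actual, want)| send(checker, actual, 1000, 10) == want }.keys
end

failing_cases(:within_tolerance?)        # => []
failing_cases(:within_tolerance_mutant?) # => [:upper_bound]
```

With only the happy path and the sad path, this mutant would survive; the upper-bound case is the one that kills it.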

00:13:27.840 Let's show how that would work.

00:13:34.480 First, let's make a Gemfile, and then I'll start writing some tests.

00:13:41.600 I'm setting up a simple layout with a 'lib' directory and a 'spec' directory.

00:13:49.000 I'm going to add RSpec to these tests.

00:13:54.560 And now I am also going to add Mutant.

00:14:00.520 I'll run 'bundle install' to install our required gems.

00:14:06.560 It just installed the new version of Mutant that was released.

00:14:11.760 Now let's write some specs for our planet.

00:14:17.760 We'll require RSpec and the planet file, and we'll use relative requires.

00:14:25.359 Now let's start writing our specs; we'll describe our planet in the universe module.

00:14:31.760 We'll create a subject, which will be our planet.

00:14:38.240 The planet is initialized with a radius and an area.

00:14:43.600 Let's start with the happy path because that was the first test most people suggested.

00:14:52.080 In this case, we'll define Venus as being the happy path.

00:14:58.560 The radius will be a specific value.

00:15:04.480 And the surface area of Venus is approximately 460 million square kilometers.

00:15:10.639 So our assertion will be that Venus is spherical.

00:15:16.640 Now we expect our subject to be spherical.

00:15:22.880 Let's run the tests.

00:15:29.920 Cool, our tests passed!

00:15:33.920 Now let's open our Gemfile and add SimpleCov to measure the code coverage.

00:15:41.760 With SimpleCov, we'll check that the coverage of our tests is 100%.

00:15:48.560 Once we run our specs again, we should see a coverage report.

00:15:54.640 SimpleCov checks whether every line of code is executed, and with our happy-path test, every line is.

00:16:01.120 The class is loaded, and when tested, we can see every line executed perfectly.

00:16:09.440 So although we have 100% code coverage, we all agree this is insufficient.

00:16:17.280 Let’s move on to writing more tests.

00:16:24.560 Now, let’s test a planet that is not spherical.

00:16:31.680 We are testing Earth, which is an oblate spheroid.

00:16:39.120 Its radius and area values are known.

00:16:46.240 We expect our subject not to be spherical.

00:16:52.520 Let’s run this test.

00:16:58.720 Great, the tests have passed!

00:17:05.120 Now let's see how mutation testing works with Mutant.

00:17:11.440 To do this, we will use the mutant command line.

00:17:18.640 We need to pass in our library directory and specify which library to require.

00:17:25.440 Then we will provide the test framework option for RSpec.

00:17:32.000 But first, let’s ensure we’ve added the necessary gems.

00:17:39.479 Now let's run the mutant command.

00:17:45.440 This is fascinating—though we only wrote two tests, many green dots and 'f's will appear.

00:17:53.840 That is because it’s running through multiple mutations of our code.

00:18:00.800 In fact, 83 mutations were generated, and 82 of them were killed.
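Those two numbers give the mutation coverage figure directly. A minimal sketch of the arithmetic, where `mutation_score` is a hypothetical helper and not part of Mutant:

```ruby
# Percentage of generated mutants that the test suite killed.
def mutation_score(killed, total)
  (killed.fdiv(total) * 100).round(2)
end

mutation_score(82, 83) # => 98.8
```

So two tests already reach roughly 98.8% mutation coverage, with one mutant still alive.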

00:18:05.840 The one mutant we missed provides helpful output to see what needs our attention.

00:18:12.000 Mutant took out specific lines of code, and our tests still passed.

00:18:20.200 If you haven’t put in a test that covers this, it’s a blind spot.

00:18:28.960 So let's kill that last mutant that survived.

00:18:36.080 Does anyone have an idea on how we could achieve that?

00:18:43.840 The suggestion was to pass in a zero tolerance.

00:18:50.960 Then we expect that, with zero tolerance, Venus should not be spherical.

00:18:55.680 Let’s run the spec again.

00:19:01.760 Great, our tests have passed!

00:19:07.400 Now let's run the mutation commands once more.

00:19:12.760 Perfect, we've killed the last mutant!