00:00:16.960
Okay, is the mic live? Yeah, we're good. Hi everybody, welcome, thank you for coming.
00:00:23.439
This is going to be a talk about tools.
00:00:32.000
There's a common expression that says a carpenter is only as good as his or her tools.
00:00:38.480
I'm not a carpenter, but that makes a lot of sense to me.
00:00:44.559
If your hammer is made out of feathers, you're not going to be able to build very much.
00:00:51.039
I think the same thing applies to programmers. The tools that we use really enable us to do our jobs.
00:00:57.680
We use so many tools that it's easy to take them for granted.
00:01:04.559
I think it's worth reflecting on the tools we use and how they help us improve as programmers.
00:01:11.200
In this case, I'll be specifically talking about mutation testing and how it can help us write better tests.
00:01:18.640
To start with, I think we should take a moment to set a context for the tools we use every day.
00:01:24.000
The first tool I want to mention is the editor.
00:01:30.079
It seems like a simple tool—you just type text and it shows up on the screen.
00:01:36.960
However, it's incredibly sophisticated. If you've ever tried to write a text editor or read the source code of one, you'll know that most text editors are millions of lines of code.
00:01:44.320
They provide features like syntax highlighting and auto-completion, which help us avoid bugs.
00:01:50.479
We'll often catch a bug right in our editor, before the code is ever deployed to production.
00:01:57.200
This is an early version of Vim.
00:02:03.040
It's easy to forget what these tools used to look like, and this is how people used to write code.
00:02:09.520
These tools look more like shop tools than what we're used to using nowadays.
00:02:14.640
This is an early punch card machine; the photo was taken in the Computer History Museum in Mountain View, California.
00:02:20.319
I can tell you for a fact that I would not be a programmer today if this was how we still had to write programs.
00:02:26.400
I suspect many of you wouldn't either if this were the state of the art.
00:02:36.560
I want to make the case that, with tools like that, both the quality and quantity of software would be much worse than they are today.
00:02:41.760
Software is as good as it is today largely because of the continued evolution of our tools.
00:02:46.879
Another tool I use every day is an interactive debugger.
00:02:53.920
It allows you to step through your code line by line to better understand how it works.
00:03:00.800
I'm not going to spend too much time talking about debuggers, but I have a public service announcement.
00:03:06.640
Next Thursday, in this same room, I believe there is a great talk on debugger-driven development with Pry.
00:03:13.119
If you're interested in that topic, you should check it out.
00:03:20.400
So, what do we do when our code is slow? What's the tool for that?
00:03:27.600
We have profilers that tell us where time is being spent when we execute our code.
00:03:35.280
I wouldn't even know how to start optimizing a program without a profiler.
00:03:41.840
I guess I would start adding timestamps manually, which would be incredibly inefficient.
00:03:47.680
I wouldn't be as good at it, and neither would any of us.
00:03:53.440
Another tool prevalent in the Ruby community is testing.
00:03:57.600
This is an example of someone who should have done more testing.
00:04:04.000
Testing can save you from running into production errors.
00:04:10.080
In the Ruby toolbox, tests are like the hammer—something we turn to frequently.
00:04:15.760
We use tests to prevent regressions, specify behavior, and even drive development.
00:04:25.760
If we write tests, then we expect to have perfect code, right?
00:04:31.759
The fundamental logical flaw with testing is that while tests help verify our code, they themselves are code that can have bugs.
00:04:42.640
So, what's the solution? You should test your tests.
00:04:49.920
One tool people use to measure the effectiveness of their tests is code coverage.
00:05:01.280
Code coverage is a metric meant to tell you how thoroughly your tests exercise your code, by reporting which lines run when the tests run.
00:05:08.160
However, it can give you a false sense of security; 100% code coverage doesn't guarantee bug-free code.
00:05:16.080
Code coverage, while helpful, is not foolproof.
00:05:22.639
Code coverage is built into Ruby, and I'm not against it, but you have to know its limits.
00:05:28.640
A lot of people believe that reaching 100% code coverage means their code is perfect, which is simply not true.
00:05:38.560
So, is there hope for us? How do we test our tests?
00:05:44.319
It's a kind of paradox: who watches the watchers?
00:05:52.759
If we can't trust our tests, why are we writing them at all?
00:06:01.120
I'm going to argue that mutation testing is a solid solution to this problem.
00:06:08.399
The core idea behind mutation testing is to take your tests and run them against your code to see if they pass.
00:06:15.120
If they pass, mutation testing modifies your code at runtime and runs your tests again.
00:06:22.400
The expectation is that if your code is modified, one or more previously passing tests should now fail.
00:06:30.080
The modified version of your code is called a 'mutant.' If a mutant survives, meaning every test still passes against it, there's something wrong with your tests.
00:06:37.200
There may not be an issue with your code, but you definitely have a problem with your tests.
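The run, mutate, re-run cycle described here can be sketched in a few lines of plain Ruby. This is only a toy illustration under stated assumptions: the mutants are written by hand as lambdas, and the names `original`, `mutants`, `weak_suite`, `thorough_suite`, and `surviving_mutants` are all hypothetical. A real tool such as Mutant generates the mutations for you instead of asking you to list them.

```ruby
# Toy model of mutation testing. Every name here is hypothetical.

# The code under test, wrapped in a lambda so mutants can be swapped in.
original = ->(age) { age >= 18 }

# Hand-written mutants: each is the original with one small change.
mutants = {
  'age > 18'  => ->(age) { age > 18 },
  'age <= 18' => ->(age) { age <= 18 },
  'true'      => ->(_age) { true }
}

# A "suite" is a lambda that returns true when all of its checks pass.
weak_suite = ->(impl) { impl.call(30) == true }
thorough_suite = lambda do |impl|
  impl.call(30) == true && impl.call(18) == true && impl.call(17) == false
end

# Returns the names of the mutants that the suite fails to detect.
def surviving_mutants(suite, original, mutants)
  raise 'suite must pass against the unmodified code' unless suite.call(original)

  mutants.select { |_name, mutant| suite.call(mutant) }.keys
end

weak_survivors = surviving_mutants(weak_suite, original, mutants)
thorough_survivors = surviving_mutants(thorough_suite, original, mutants)

puts "weak suite survivors: #{weak_survivors.inspect}"
puts "thorough suite survivors: #{thorough_survivors.inspect}"
```

The weak suite only checks one comfortable input, so two of the three mutants slip past it. Adding checks at the boundary (18 and 17) leaves no survivors.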
00:06:44.079
This technique helps answer the question "What tests should I write?", which many beginners struggle with.
00:06:52.240
It also addresses the question "How do I know when my tests are sufficient?"
00:06:59.840
Mutation testing provides a quantitative answer to those questions.
00:07:06.880
You can confidently say that a given piece of code has 100% mutation coverage.
00:07:13.600
For example, here's some code with an assertion about the code.
00:07:19.120
I have a method called 'foo' that takes an optional argument with a default value of true.
00:07:25.839
The body of the method either returns this argument or raises an exception.
00:07:32.720
In my assertion, I am checking that if I call 'foo' without any parameters, nothing is raised.
00:07:41.279
This test will pass because calling 'foo' with the default parameter will return true.
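The slide itself is not part of the transcript, so the following is a reconstruction from the description, not the speaker's exact code. The parameter name `value` and the plain `begin`/`rescue` check standing in for the assertion are assumptions.

```ruby
# Reconstruction of the described method; the parameter name is assumed.
def foo(value = true)
  value or fail
end

# The assertion: calling foo with no arguments raises nothing.
outcome = begin
  foo
rescue StandardError
  :raised
end

puts outcome.inspect
```

Because the default argument is `true`, the `or fail` branch never runs in this test.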
00:07:48.720
However, this does not mean you're done writing tests.
00:07:55.279
A mutant for this code might just remove the 'or fail' statement.
00:08:01.840
This means at least one of your tests should fail when this change is made.
00:08:08.720
If it does not, then you are not testing your code sufficiently.
00:08:15.200
This type of mutation is known as a 'statement deletion mutation'.
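To make the statement deletion concrete, here is a hedged sketch that writes the mutant out by hand next to the original. `foo_mutant` and the `raises_error?` helper are hypothetical names introduced for illustration. Mutant would apply this change to `foo` itself instead of defining a second method.

```ruby
# Original method; the parameter name is assumed.
def foo(value = true)
  value or fail
end

# Hand-written stand-in for the mutant: the `or fail` part is deleted.
def foo_mutant(value = true)
  value
end

# Helper: true if the block raises, false otherwise.
def raises_error?
  yield
  false
rescue StandardError
  true
end

# The original assertion (no arguments, nothing raised) passes for both,
# so it cannot tell the mutant apart from the real code.
original_passes = !(raises_error? { foo })
mutant_passes = !(raises_error? { foo_mutant })

# A test that passes a falsy argument does tell them apart.
original_raises = raises_error? { foo(false) }
mutant_raises = raises_error? { foo_mutant(false) }

puts "no-argument test passes for original: #{original_passes}, for mutant: #{mutant_passes}"
puts "foo(false) raises for original: #{original_raises}, for mutant: #{mutant_raises}"
```

A test asserting that `foo(false)` raises would pass against the original and fail against the mutant, which is exactly what killing a mutant means.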
00:08:22.960
There are several other types of mutations, like changing default parameters or altering logical conditions.
00:08:31.039
The point of all these mutations is the same: each one that your tests fail to catch shows you a case your tests don't cover.
00:08:40.639
Mutation testing allows you to know when you've sufficiently tested your code.
00:08:47.839
Katrina Owen tweeted that if you add more granular tests, you'll find more bugs.
00:08:55.760
In many cases, Mutant, which is a mutation testing framework, will help find those bugs.
00:09:01.840
Now, I'll switch gears to some live coding.
00:09:07.760
That was the introduction; now we will write some code.
00:09:14.080
Let me quickly switch to mirror displays.
00:09:19.760
That's a pro tip; thank you!
00:09:22.639
A new version of Mutant was just released moments before this presentation.
00:09:28.320
I am not the author of Mutant; it's a great library by Markus Schirp.
00:09:34.640
I encourage you to check it out; version 0.5.11 is hot off the presses!
00:09:41.600
So here is some code for our live coding demo.
00:09:48.800
The focus of this demo is on writing tests, not on writing code.
00:09:56.000
Mutant does not verify if your code is correct; it verifies that your tests are correct.
00:10:03.680
This code is quite simple but let's walk through it to ensure everyone understands.
00:10:09.920
There's a module that represents the universe and within that, we have planets.
00:10:16.079
The planet class takes a radius and an area as parameters to its constructor.
00:10:23.440
The radius is the planet's mean radius in kilometers, and the area is the surface area in square kilometers.
00:10:30.639
There is one public method: 'spherical,' which returns true if the planet is a perfect sphere or close enough.
00:10:37.200
To determine this, we calculate the approximate area using the formula 4πr².
00:10:43.120
If the area matches our approximation, it indicates that the planet is spherical.
00:10:50.239
If it doesn't match, then the planet is not spherical; it might be oblate, like Earth.
00:10:57.440
We have a private method that generates a tolerance range.
00:11:04.399
We don't want the comparison to be too strict, because the formula involves pi, whose decimal expansion never terminates.
00:11:10.000
We're checking if the actual area falls within these bounds.
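The code on screen is not captured in the transcript, so here is a minimal sketch of what the described class might look like. Everything beyond the description is an assumption: the names `Universe::Planet`, `approximate_area`, and `tolerance_range`, the `tolerance` argument, and its 1% default. The Venus radius of 6,051.8 km is also supplied here for illustration and is not taken from the talk.

```ruby
module Universe
  class Planet
    # radius: mean radius in kilometers; area: surface area in square kilometers
    def initialize(radius, area)
      @radius = radius
      @area = area
    end

    # True when the actual area is within `tolerance` (a fraction, with an
    # assumed default of 1%) of the area of a perfect sphere.
    def spherical?(tolerance = 0.01)
      tolerance_range(tolerance).cover?(@area)
    end

    private

    # Surface area of a perfect sphere: 4 * pi * r^2
    def approximate_area
      4 * Math::PI * @radius**2
    end

    # Range of areas that count as "close enough" to a sphere.
    def tolerance_range(tolerance)
      low = approximate_area * (1 - tolerance)
      high = approximate_area * (1 + tolerance)
      low..high
    end
  end
end

venus = Universe::Planet.new(6_051.8, 460_000_000)
puts venus.spherical?
```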
00:11:16.880
Assuming everyone understands this code, which is pretty simple, I want to do a quick interactive poll.
00:11:23.920
How many tests do you think are needed to fully cover this code, particularly the spherical method?
00:11:30.879
Who thinks you need zero tests? Show of hands.
00:11:37.200
No? Good, you've been paying attention.
00:11:42.159
Who thinks you can do it with one test, maybe the happy path?
00:11:48.240
Raise your hands if you think that's sufficient.
00:11:54.639
Nobody? I'm glad you all agree.
00:11:59.440
You can achieve 100% code coverage with just one test but not 100% mutation coverage.
00:12:05.440
We'll prove the part about 100% code coverage to you in a moment.
00:12:11.840
How many tests do you think would achieve this? Is it sufficient?
00:12:20.160
What about two tests?
00:12:26.720
Yes, someone mentioned testing the happy path and the sad path.
00:12:32.000
How about three tests? What would your third test be?
00:12:38.000
A value that would blow up the computation?
00:12:45.280
Passing in a string, expecting an exception? That's a good one.
00:12:51.760
Let's keep going. Who thinks you need four tests?
00:12:56.480
A couple of people do. What would your additional tests cover?
00:13:02.880
Testing a range on both the low end and high end sounds solid.
00:13:08.160
How about five or more? Anyone raising their hands?
00:13:14.720
According to Mutant, you can actually cover this code with just four tests.
00:13:21.600
That would be the happy path, the sad path, and both sides of the range.
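Those four cases could look roughly like the sketch below. It uses plain Ruby checks instead of RSpec so that it runs without extra gems, and it is one reading of "both sides of the range", not the speaker's actual tests. The compact `Universe::Planet` class, the 1% default tolerance, and the radius of 1,000 km are all assumptions made for illustration.

```ruby
module Universe
  # Compact, assumed version of the class under test.
  class Planet
    def initialize(radius, area)
      @radius = radius
      @area = area
    end

    def spherical?(tolerance = 0.01)
      sphere_area = 4 * Math::PI * @radius**2
      low = sphere_area * (1 - tolerance)
      high = sphere_area * (1 + tolerance)
      (low..high).cover?(@area)
    end
  end
end

radius = 1_000 # hypothetical planet
sphere_area = 4 * Math::PI * radius**2

# Each entry maps a description to whether the expectation held.
checks = {
  'happy path: exact sphere area is spherical' =>
    Universe::Planet.new(radius, sphere_area).spherical? == true,
  'sad path: double the sphere area is not spherical' =>
    Universe::Planet.new(radius, sphere_area * 2).spherical? == false,
  'low side: 2% below the sphere area is rejected' =>
    Universe::Planet.new(radius, sphere_area * 0.98).spherical? == false,
  'high side: 2% above the sphere area is rejected' =>
    Universe::Planet.new(radius, sphere_area * 1.02).spherical? == false
}

checks.each { |name, passed| puts "#{passed ? 'pass' : 'FAIL'}: #{name}" }
```

The two edge cases sit just outside the assumed 1% band, so a mutant that widens or drops either bound would make one of them fail.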
00:13:27.840
Let's show how that would work.
00:13:34.480
First, let's make a Gemfile so I can start writing some tests.
00:13:41.600
I'm setting up a simple layout with a 'lib' directory and a 'spec' directory.
00:13:49.000
I'm going to add RSpec to these tests.
00:13:54.560
And now I am also going to add Mutant.
00:14:00.520
I'll run 'bundle install' to install our required gems.
00:14:06.560
It just installed the new version of Mutant that was released.
00:14:11.760
Now let's write some specs for our planet.
00:14:17.760
We'll require RSpec and the planet file, and we'll use relative requires.
00:14:25.359
Now let's start writing our specs; we'll describe our planet in the universe module.
00:14:31.760
We'll create a subject, which will be our planet.
00:14:38.240
The planet is initialized with a radius and an area.
00:14:43.600
Let's start with the happy path, because that was the first test most of you suggested.
00:14:52.080
In this case, we'll define Venus as being the happy path.
00:14:58.560
The radius will be a specific value.
00:15:04.480
And the surface area of Venus is approximately 460 million square kilometers.
00:15:10.639
So we'll plug those values in and assert that Venus is spherical.
00:15:16.640
Now we expect our subject to be spherical.
00:15:22.880
Let's run the tests.
00:15:29.920
Cool, our tests passed!
00:15:33.920
Now let's open our Gemfile and add SimpleCov to measure the code coverage.
00:15:41.760
With SimpleCov, we'll check that the coverage of our tests is 100%.
00:15:48.560
Once we run our specs again, we should see a coverage report.
00:15:54.640
SimpleCov checks whether every line of code is executed; with just our happy-path test, every line is.
00:16:01.120
The class is loaded, and when the test runs, we can see that every line was executed.
00:16:09.440
So although we have 100% code coverage, we all agree this is insufficient.
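The same effect can be reproduced without SimpleCov by using Ruby's built-in `Coverage` module directly. This is a sketch under assumptions, not the demo from the talk: `CoverageDemo` is a hypothetical stand-in for the planet code, the Venus radius is supplied for illustration, and the code is written to a temporary file only because `Coverage` tracks files loaded after `Coverage.start`.

```ruby
require 'coverage'
require 'tempfile'

# Hypothetical stand-in for the code under test.
source = <<~RUBY
  module CoverageDemo
    def self.spherical?(radius, area, tolerance = 0.01)
      sphere_area = 4 * Math::PI * radius**2
      low = sphere_area * (1 - tolerance)
      high = sphere_area * (1 + tolerance)
      (low..high).cover?(area)
    end
  end
RUBY

file = Tempfile.new(['coverage_demo', '.rb'])
file.write(source)
file.close

Coverage.start
load file.path

# One happy-path call, standing in for the single Venus spec.
happy_path = CoverageDemo.spherical?(6_051.8, 460_000_000)

# Per-line execution counts; nil marks lines with nothing to execute.
_path, line_counts = Coverage.result.find do |path, _counts|
  path.end_with?(File.basename(file.path))
end
executable = line_counts.compact
percent = 100.0 * executable.count(&:positive?) / executable.size

puts "happy path result: #{happy_path}"
puts "line coverage: #{percent}%"

file.unlink
```

A single call touches every executable line, so line coverage reports full marks even though nothing checks the non-spherical case or the edges of the tolerance range.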
00:16:17.280
Let’s move on to writing more tests.
00:16:24.560
Now, let’s test a planet that is not spherical.
00:16:31.680
We are testing Earth, which is an oblate spheroid.
00:16:39.120
Its radius and area values are known.
00:16:46.240
We expect our subject not to be spherical.
00:16:52.520
Let’s run this test.
00:16:58.720
Great, the tests have passed!
00:17:05.120
Now let's see how mutation testing works with Mutant.
00:17:11.440
To do this, we will use the mutant command line.
00:17:18.640
We need to pass in our library directory and specify which library to require.
00:17:25.440
Then we will provide the test framework option for RSpec.
00:17:32.000
But first, let’s ensure we’ve added the necessary gems.
00:17:39.479
Now let's run the mutant command.
00:17:45.440
This is fascinating—though we only wrote two tests, many green dots and 'f's will appear.
00:17:53.840
That is because it’s running through multiple mutations of our code.
00:18:00.800
In fact, 83 mutations were generated, and 82 of them were killed.
00:18:05.840
For the one mutant we missed, Mutant gives us helpful output so we can see what needs our attention.
00:18:12.000
It took out specific lines of code, and our tests still passed.
00:18:20.200
If you haven’t put in a test that covers this, it’s a blind spot.
00:18:28.960
So let's kill that last mutant that survived.
00:18:36.080
Does anyone have an idea on how we could achieve that?
00:18:43.840
The suggestion was to pass in a zero tolerance.
00:18:50.960
With zero tolerance, we expect that Venus should not be spherical.
00:18:55.680
Let’s run the spec again.
00:19:01.760
Great, our tests have passed!
00:19:07.400
Now let's run the mutant command once more.
00:19:12.760
Perfect, we've killed the last mutant!
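A zero-tolerance check of this kind might look like the sketch below, again in plain Ruby and under assumptions. The transcript does not show which mutant survived or how the tolerance is passed, so the `tolerance` argument, its 1% default, the compact `Universe::Planet` class, and the Venus radius are all illustrative guesses.

```ruby
module Universe
  # Compact, assumed version of the class under test.
  class Planet
    def initialize(radius, area)
      @radius = radius
      @area = area
    end

    def spherical?(tolerance = 0.01)
      sphere_area = 4 * Math::PI * @radius**2
      low = sphere_area * (1 - tolerance)
      high = sphere_area * (1 + tolerance)
      (low..high).cover?(@area)
    end
  end
end

venus = Universe::Planet.new(6_051.8, 460_000_000)

# With the assumed default tolerance, Venus is close enough to a sphere.
default_result = venus.spherical?

# With zero tolerance, only an exact match with 4 * pi * r^2 would pass.
strict_result = venus.spherical?(0)

puts "default tolerance: #{default_result}"
puts "zero tolerance: #{strict_result}"
```

Asserting both results forces the tests to depend on the tolerance value, so a mutation that ignores or alters it can no longer pass unnoticed.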