How to Write Better Code Using Mutation Testing

by John Backus

In the talk "How to Write Better Code Using Mutation Testing" delivered by John Backus at RailsConf 2017, the speaker discusses the concept of mutation testing and its benefits for improving code quality and test effectiveness. Mutation testing involves modifying code to assess whether existing tests can detect the changes, thereby providing insights into the robustness of test cases.

Key Points Discussed:
- Introduction to Mutation Testing: Mutation testing serves as a powerful method for evaluating test quality by asking how much code can be changed without the tests failing, contrasting traditional line coverage metrics.
- Manual vs. Automated Mutation Testing: Backus demonstrates manual mutation testing using a sample Gluttons class to illustrate how small modifications go undetected by tests, making the case for automated tools like Mutant and Mutest that streamline this process.
- Detecting Dead Code: The speaker emphasizes how mutation testing can help in identifying dead or redundant code within a project, enhancing overall code cleanliness.
- Understanding Ruby Features: Through various code samples, Backus shows how mutation testing can help developers discover Ruby features and behavioral quirks within their dependencies, leading to better coding practices and fewer bugs.
- Enhancing Tests for Edge Cases: Mutation testing serves to uncover gaps in existing tests, ensuring that edge cases are adequately covered—this is particularly useful for legacy code where test documentation may be lacking.
- Practical Implementation: The speaker concludes by discussing how developers can integrate mutation testing into their workflow, enhancing their knowledge of Ruby, improving test coverage, and ultimately shipping more reliable code. Backus mentions that achieving 100% mutation coverage is not necessary; even partial coverage can yield significant benefits.

Conclusion: Backus advocates for adopting mutation testing in daily coding practices, stating that it fosters a deeper understanding of Ruby and enhances the quality of code being delivered. By using mutation testing to preview the impact of code changes, developers can minimize the risk of introducing bugs into their systems. As a final note, he encourages attendees to explore job opportunities at Cognito, emphasizing a culture of writing better code through advanced testing techniques.

00:00:11.900 Hi everyone, my name is John, and I’m here to talk to you about mutation testing.
00:00:17.520 I'm the CTO at a small tech company in Palo Alto called Cognito.
00:00:22.609 Sorry, there’s a little flickering here; we’re not sure what’s going on.
00:00:27.980 All right, so before I get into it, I want to give you a quick outline of the talk.
00:00:34.350 I’ll give you an introduction to what mutation testing is.
00:00:39.480 Then, I'll show you how it can help you improve test coverage.
00:00:47.370 I'm also going to show you how it can teach you more about Ruby and the code that you rely on.
00:00:55.010 Moreover, I’ll describe how it works like an x-ray for legacy code.
00:01:00.030 It can be a great tool for detecting dead code and it’s probably the most thorough measure of test coverage.
00:01:07.650 Additionally, I'll explain how it can help simplify your code.
00:01:12.979 I’ll wrap it up by discussing the practicality of mutation testing today and how you can incorporate it at your job.
00:01:20.420 Before we dive into mutation coverage, we need to be on the same page regarding line coverage, or test coverage in general.
00:01:25.950 Usually, when we talk about test coverage, we refer to line coverage.
00:01:32.159 Line coverage roughly means the number of lines of code run by your tests over the total lines of code in the project.
00:01:38.640 There are different variations, like branch coverage, but that’s sort of the gist of it.
00:01:44.909 Mutation testing asks a different question.
00:01:50.130 It asks: How much of your code can I change without failing your tests?
00:01:57.270 If you think about it, this makes sense.
00:02:02.969 If I can remove a line of code or meaningfully modify a line of code in your project without breaking your tests, then something is probably wrong.
00:02:08.640 There are parts of the code that are either missing tests or are not well-tested.
00:02:15.980 Before we dive into how to automate mutation testing, I want to give you a good intuition of what mutation testing is by doing it by hand.
00:02:22.820 I've got a sample code here, so take a second to read it over.
00:02:27.920 I have this class called Gluttons, and at the top, I initialize it with a Twitter client.
00:02:38.720 Then, I do a search on the Twitter API using that client.
00:02:45.080 I get the first two results, grab the author from each, and return those.
00:02:50.600 That's basically what the test specifies down here; it’s got a fake client and some fake tweets.
00:02:58.220 On the left here, I have the same code but in Sublime Text, and on the right, I've got a script.
00:03:05.540 This script is going to run whenever I modify the file.
00:03:12.110 The script outputs a diff of the current code against the original, as well as the result of running the test.
00:03:21.100 First, I’m going to try to modify the hashtags, but that does not fail the test.
00:03:26.750 I can also remove the search string entirely, and that doesn’t fail it.
00:03:32.980 Additionally, I can call it with zero arguments, and that also does not fail the test.
00:03:40.070 If I change "first(2)" to "first(1)," that does fail, which is good.
00:03:45.860 However, if I change it to "first(3)," that does not fail the test.
00:03:53.450 So, reviewing those again, I can basically change the input to the search method however I want.
00:03:57.099 I can remove the hashtag, the entire search string, or call it with a different number of arguments, and it doesn’t matter.
00:04:06.469 If I change "first(2)" to "first(1)," that does fail given the specifics set in our fake tweets and our fake client.
00:04:14.900 However, if I change it to "first(3)," then that does not fail the test.
00:04:20.030 This is manual mutation testing.
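
The hand-run experiment described here can be approximated in plain Ruby. This is only a sketch: the slides are not part of the transcript, so the search string, the `FakeTweet` and `FakeClient` names, and the `passes?` helper are all assumptions made for illustration. Each lambda stands in for one manual edit, and the loop reports whether the two-tweet test still passes.

```ruby
# Hypothetical reconstruction; names and the search string are assumptions.
FakeTweet = Struct.new(:author)

# A permissive test double that ignores whatever it is asked to search for.
class FakeClient
  def initialize(tweets)
    @tweets = tweets
  end

  def search(*_args)
    @tweets
  end
end

# The "test": two fake tweets go in, two author names must come out.
def passes?(implementation)
  client = FakeClient.new([FakeTweet.new("alice"), FakeTweet.new("bob")])
  implementation.call(client) == %w[alice bob]
rescue StandardError
  false
end

original = ->(client) { client.search("#gluttony").first(2).map(&:author) }

# Each entry is one manual edit to the original method body.
mutations = {
  "different hashtag"   => ->(client) { client.search("#other").first(2).map(&:author) },
  "empty search string" => ->(client) { client.search("").first(2).map(&:author) },
  "no arguments"        => ->(client) { client.search.first(2).map(&:author) },
  "first(1)"            => ->(client) { client.search("#gluttony").first(1).map(&:author) },
  "first(3)"            => ->(client) { client.search("#gluttony").first(3).map(&:author) }
}

# A mutation is "killed" when the test fails and "alive" when it still passes.
results = mutations.transform_values { |mutated| passes?(mutated) ? :alive : :killed }
results.each { |name, status| puts "#{name}: #{status}" }
```

Only the `first(1)` edit is killed in this sketch; the other four stay alive, which mirrors the gaps found by hand in the talk.
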
00:04:26.360 You can imagine that doing this on a daily basis at your job would be pretty tedious.
00:04:34.759 This was just one method, but if we’re adding to the code, trying to do this for every part of what we’re adding would be a lot of wasted time.
00:04:41.210 It’s also going to be pretty hard to outsmart yourself.
00:04:47.120 If you did the best job you could at writing this code and the tests for it, it would be hard to come up with ideas you hadn't thought of before.
00:04:52.520 Now, I’m going to show you how to do mutation testing with an automated tool. The main tool for this is called Mutant.
00:05:06.289 It’s been around for years. I learned about it about two years ago in a targeted mutation testing effort.
00:05:12.830 Since then, I have become a large contributor to the project.
00:05:19.150 A friend and I also just started a fork of this project recently called Mutest that is very similar.
00:05:25.490 I’ll probably refer to them interchangeably throughout the presentation, but you can use either one.
00:05:30.889 All right, in this example here, I’m invoking the mutant command-line application and passing the required arguments.
00:05:36.849 I’m using the RSpec integration and telling it to mutate the class we just saw.
00:05:43.909 There’s going to be a lot of noise in this output, so don’t worry; we’ll go over the results in detail later.
00:05:49.840 Each diff here represents a mutation that the tool tried while running my tests.
00:05:57.560 Some of the things we identified in our manual mutation testing run include the ability to remove an entire argument.
00:06:05.419 We can also pass a different type of variable to the search, including 'nil,' which is interesting.
00:06:11.349 Furthermore, we can change 'first(2)' to 'last(2)', which is also significant.
00:06:16.889 If this is the method that finds the most recent tweets, that’s a problematic change.
00:06:22.030 If we care about finding the most recent tweets, we probably don't want to return the oldest ones.
00:06:29.169 We can also remove the 'first' call entirely, which could exhaust our API token and rate limit.
00:06:34.300 Our mutation testing tool shows us how to improve the test.
00:06:39.389 It recommends that we give it three fake tweets instead of two and explicitly specify the search we expect it to perform.
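
One way the recommended test improvement might look, again as a sketch under assumptions: the `StrictFakeClient` double, the `"#gluttony"` query, and the `passes?` helper are hypothetical names, not code from the talk. The double only answers the expected query, and three fake tweets are supplied so that `first(3)`, `last(2)`, and dropping `first` all change the result.

```ruby
# Hypothetical names throughout; this is not the code shown in the talk.
FakeTweet = Struct.new(:author)

# A stricter test double: it only answers the one query we expect.
class StrictFakeClient
  def initialize(expected_query, tweets)
    @expected_query = expected_query
    @tweets = tweets
  end

  def search(query)
    raise ArgumentError, "unexpected query: #{query.inspect}" unless query == @expected_query

    @tweets
  end
end

# The improved "test": three tweets available, exactly the first two authors expected.
def passes?(implementation)
  tweets = %w[alice bob carol].map { |name| FakeTweet.new(name) }
  client = StrictFakeClient.new("#gluttony", tweets)
  implementation.call(client) == %w[alice bob]
rescue StandardError
  false
end

original = ->(client) { client.search("#gluttony").first(2).map(&:author) }

# The edits that previously went unnoticed.
mutations = {
  "no arguments"  => ->(client) { client.search.first(2).map(&:author) },
  "nil query"     => ->(client) { client.search(nil).first(2).map(&:author) },
  "first(3)"      => ->(client) { client.search("#gluttony").first(3).map(&:author) },
  "last(2)"       => ->(client) { client.search("#gluttony").last(2).map(&:author) },
  "first removed" => ->(client) { client.search("#gluttony").map(&:author) }
}

survivors = mutations.reject { |_name, mutated| !passes?(mutated) }.keys
puts "original passes: #{passes?(original)}"
puts "surviving mutations: #{survivors.inspect}"
```

With the stricter double and the third tweet, every listed edit now fails the test while the original still passes.
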
00:06:45.729 When we use automated mutation testing, it’s quick and doesn’t require much effort.
00:06:51.700 It's also likely to be more clever than anything we could come up with ourselves.
00:06:58.540 Mutant has been accruing different mutations for years that target specific use cases and point out relevant changes.
00:07:05.820 Here’s another example you might encounter.
00:07:10.990 Imagine you’re working on an internal API, here’s some sample code.
00:07:16.720 We see the users controller and the show action; we’re validating the ID parameter.
00:07:23.770 We ensure that it's an integer and pass it to the user finder.
00:07:29.169 We either render JSON or render an error, which is what the test below specifies.
00:07:32.980 If we run this through our mutation testing tool, it shows us that we can replace the 'to_i' method with the 'Integer' method.
00:07:39.580 That's interesting, because the 'to_i' method works on any string and on 'nil'.
00:07:46.360 If I don’t have integers or digits in my string, it still gives me zero.
00:07:52.420 'Integer', by contrast, raises an error when given 'nil' or a string it can’t parse as a number.
00:07:59.260 Similarly, the tool shows that we can replace '[]' with 'fetch', which requires the key to be present and raises an error if it is absent.
00:08:06.760 It is showing us how to enforce a stricter implementation, emphasizing the presence of the key.
00:08:15.580 Before, if someone incorrectly used the API, they might have passed in something incorrect or omitted the ID key.
00:08:22.240 We would end up querying the database for an ID of zero, which is not ideal.
00:08:29.310 This is a more precise implementation that forces us to think ahead.
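
The difference between the lenient and strict conversions can be seen in isolation. The helper names below (`lenient_id`, `strict_id`, `error_class`) are hypothetical and a plain hash stands in for the request parameters; the controller from the talk is not reproduced here.

```ruby
# Hypothetical helpers; a plain Hash stands in for the request params.

# Lenient: a missing key or a non-numeric string silently becomes 0.
def lenient_id(params)
  params["id"].to_i
end

# Strict: a missing key raises KeyError, an unparseable value raises ArgumentError.
def strict_id(params)
  Integer(params.fetch("id"))
end

# Returns the class of the error raised by the block, or nil if none was raised.
def error_class
  yield
  nil
rescue StandardError => e
  e.class
end

puts lenient_id({})                         # 0
puts lenient_id("id" => "abc")              # 0
puts strict_id("id" => "42")                # 42
p error_class { strict_id({}) }             # KeyError
p error_class { strict_id("id" => "abc") }  # ArgumentError
```

The lenient version would go on to query for an ID of zero, while the strict version fails early with an error that names the actual problem.
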
00:08:36.740 Here’s another small example with a 'created_after' action.
00:08:42.950 We’re passing in a parameter called 'after', parsing that input, and passing the result to a class method on User called 'recent'.
00:08:50.869 Running that through a mutation testing tool shows us that we can replace 'parse' with 'iso8601'.
00:08:58.459 This method is poorly named; it's a more strict parsing method.
00:09:05.770 It specifies the format: four digits for the year, two for the month, and two for the day.
00:09:12.500 This is quite different from the parsing rules for 'Date.parse'.
00:09:19.060 The latter tries to parse the input very flexibly.
00:09:24.739 It might find the name of the month in the input, which isn't always what we want.
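
A short comparison of the two parsers, assuming the code in question uses Ruby's standard `Date` class (the transcript does not show the slide, so that is an assumption, as is the sample input).

```ruby
require "date"

# Both parsers agree on a well-formed ISO 8601 date.
strict  = Date.iso8601("2017-04-25")
lenient = Date.parse("2017-04-25")
puts strict == lenient

# Date.parse is heuristic: it finds the month name "may" inside ordinary text
# and fills in the current year and the first day of the month.
guessed = Date.parse("I may be late")
puts guessed.month
puts guessed.day

# Date.iso8601 refuses the same input instead of guessing.
begin
  Date.iso8601("I may be late")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Whether the lenient behavior is a feature or a bug depends on what callers actually send, which is the question the surviving mutation raises.
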
00:09:29.820 Let’s talk about regular expressions.
00:09:36.870 I'm particularly excited about this part of the presentation because it’s a feature that no other tool in the Ruby ecosystem provides.
00:09:43.240 Mutant can analyze regular expressions and show you if you're not covering branches within them.
00:09:50.790 Here’s some sample code where we're iterating over a list of usernames.
00:09:57.220 We’re selecting the ones that match a specified regular expression.
00:10:03.340 It will point out areas where we can be more explicit in our testing.
00:10:12.160 For example, if we do not provide test input for multi-line strings, it will let us know.
00:10:19.410 We can adjust this to a more strict format, ensuring we are accounting for the cases we want.
00:10:25.850 We also need to ensure we are testing the various conditions inside our regular expressions.
00:10:32.860 It will help enhance our regex conditions so that our tests are more comprehensive.
00:10:39.620 In Ruby 2.4, we have a new 'match?' predicate method.
00:10:48.210 It's about three times faster, only returns true or false, and doesn't set any global variables.
00:10:58.130 Incorporating these improvements results in more strict input handling and clearer error messages.
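
The stricter regular expression handling can be illustrated as follows. The username pattern and the sample inputs are assumptions chosen for the sketch; the talk's actual pattern is not in the transcript. `Regexp#match?` requires Ruby 2.4 or later.

```ruby
# Hypothetical username patterns; the real one from the talk is not shown.
LOOSE  = /^[a-z0-9_]+$/    # ^ and $ match at line boundaries
STRICT = /\A[a-z0-9_]+\z/  # \A and \z match only at string boundaries

multi_line = "bob\n<script>"

puts LOOSE.match?(multi_line)   # true: the first line alone satisfies the pattern
puts STRICT.match?(multi_line)  # false: the whole string must match

# =~ records the last match in $~, while match? leaves it untouched.
def last_match_after_operator
  "alice" =~ STRICT
  $~
end

def last_match_after_predicate
  STRICT.match?("alice")
  $~
end

p last_match_after_operator.class  # MatchData
p last_match_after_predicate       # nil

valid = ["alice", "bob_1", multi_line].select { |name| STRICT.match?(name) }
p valid
```

A test suite without a multi-line input cannot tell `LOOSE` from `STRICT`, which is why a mutation between the two anchors would survive.
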
00:11:05.350 Now, let’s talk about HTTP clients.
00:11:12.940 In this example, we have a method called 'stars_form' using the popular HTTParty client.
00:11:19.840 It's designed to retrieve data from a repository on GitHub's API and convert the result into a hash.
00:11:26.960 Then, we extract the 'stargazers_count' key.
00:11:32.480 When we run the mutation testing tool, it indicates we can remove the 'to_h' call and everything still works.
00:11:39.270 This might seem confusing at first, but the HTTParty client looks at the response header's content type.
00:11:45.820 If the response is JSON, it parses the body and the response behaves like a hash, allowing us to drop that method call.
00:11:52.390 The beauty is that Mutest doesn’t need specific support for HTTParty.
00:11:58.630 It understands how to evaluate different parts of your method and suggests changes.
00:12:05.060 It allows you to become aware of Ruby’s capabilities and avoid redundancy.
00:12:11.400 Now, let’s discuss legacy code.
00:12:19.900 This is the same code example we had before with the created-after endpoint.
00:12:26.110 Imagine instead of implementing this method yourself, you’re tasked with updating it.
00:12:34.360 Let’s say the original author wrote it two years ago and left little documentation.
00:12:39.740 When you run your mutation testing tool on that code before making modifications, you’re going to see this same mutation.
00:12:46.600 This raises interesting questions about the author's intent.
00:12:53.020 Did they intend for users to only utilize the strict format, or should it allow any format?
00:12:59.170 How is it actually being used today? If other services use different formats, we want to guarantee compatibility.
00:13:05.860 Running the mutation testing tool gives us a checklist of potential issues.
00:13:12.460 This acts like a checklist of hotspots to investigate before modifying the code.
00:13:19.800 It points out possible regression areas in the code that may not trigger any tests.
00:13:26.220 This is an extremely beneficial aspect of mutation testing.
00:13:32.050 Consider this method: when we invoke it, line coverage tools will report high coverage.
00:13:39.740 However, mutation testing indicates deeper issues.
00:13:46.380 It helps us check off-by-one errors and ensures we have comprehensive tests for boundaries.
00:13:53.300 It encourages rigorous testing, enhancing our methods to avoid missteps.
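
The boundary point can be made concrete with a deliberately tiny example. The age check and both test suites below are hypothetical and not taken from the talk; they only show how a suite with full line coverage can still miss an off-by-one change.

```ruby
# Hypothetical predicate and an off-by-one mutation of it.
original = ->(age) { age >= 18 }
mutated  = ->(age) { age > 18 }

# Executes every line of the predicate, so line coverage reports 100%.
weak_suite = lambda do |impl|
  impl.call(30) == true && impl.call(5) == false
end

# Adds assertions on both sides of the boundary.
boundary_suite = lambda do |impl|
  weak_suite.call(impl) && impl.call(18) == true && impl.call(17) == false
end

puts "weak suite, original:     #{weak_suite.call(original)}"
puts "weak suite, mutated:      #{weak_suite.call(mutated)}"
puts "boundary suite, original: #{boundary_suite.call(original)}"
puts "boundary suite, mutated:  #{boundary_suite.call(mutated)}"
```

The weak suite passes for both versions, so the mutation survives; only the boundary suite distinguishes them.
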
00:14:00.520 Here’s a nine-line method dealing with user permissions.
00:14:06.470 There are multiple roles possible: guest, muted, normal, admin, etc.
00:14:12.740 The tool challenges us to ensure we test all user conditions thoroughly.
00:14:20.640 It forces a reflection on how many conditions must be tested, which may amount to over 31.
00:14:29.310 This complexity is often hidden, and mutation testing brings it to the surface.
00:14:36.290 Here’s another small example: I’m taking a collection of users.
00:14:43.160 I'm grabbing emails while filtering out those without emails or who've unsubscribed.
00:14:50.940 Initially, I have a valid user and one without email.
00:14:57.090 I assert valid emails are the only things present in the output.
00:15:03.180 Then I do the same thing with a valid user and an unsubscribed user.
00:15:10.520 Here, my mutation testing tool is showing I could skip the iterations by manipulating the input.
00:15:18.920 This allows the bad user to end up on top, resulting in failed tests.
00:15:28.060 Correcting these tests means putting the unwanted user at the beginning.
00:15:35.650 The mutation testing approach also assists in detecting dead code.
00:15:42.160 Consider this scenario: perhaps we discover a column named 'name'.
00:15:48.510 Running the mutation testing tool shows we can replace it entirely.
00:15:54.640 It’s showing us that the method is already fully covered by the parent class.
00:16:02.330 The new user trail allows me to consider whether I’m introducing redundant methods.
00:16:09.040 In another instance, we have a method called 'authorized.'
00:16:14.590 It holds an optional user argument, defaulting to the current user.
00:16:21.080 When reviewing the mutations, it points out redundancy when we always pass in a user.
00:16:28.170 If it used to look different, it would apply that past mutation, leading to an error.
00:16:35.130 In this sense, we can inline the current user and simplify the argument.
00:16:41.580 The mutation testing tools consistently point out potential simplifications.
00:16:47.050 This occurs even with code navigation, ensuring I refer to the correct methods.
00:16:54.990 We pass in an ID parameter before calling the post finder, rendering a valid HTTP status.
00:17:01.800 The mutation testing tool reveals the non-necessity of that status entirely.
00:17:08.540 The default response associated with successful execution suffices.
00:17:15.030 Just as useful as it is for detecting dead code, mutation testing can simplify code.
00:17:22.039 For example, in a user IDs parameter, we can supply the users directly without splatting.
00:17:29.059 This exposes us to Ruby's interface intuitively, all at no extra cost.
00:17:36.420 Another example arises in a method designed to welcome users.
00:17:42.200 The mutation testing solution suggests navigating to a clearer syntax with the user method.
00:17:48.380 It demonstrates a streamlined approach that resolves to clearer errors.
00:17:54.590 Now, consider expanding a path string on a UNIX system by replacing the leading tilde.
00:18:01.460 We can verify that we’re not depending on an overly general substitution.
00:18:08.540 The mutation testing tool points out that 'sub' is sufficient here and more straightforward than 'gsub', the global substitution method.
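
A small sketch of why the single substitution is enough. The path and home directory are made-up values, and the pattern is an assumption about what the original code looked like.

```ruby
# Made-up values for illustration.
home = "/home/john"
path = "~/projects/~backup"

# The pattern is anchored to the start of the string, so it can match at most once.
with_sub  = path.sub(/\A~/, home)
with_gsub = path.gsub(/\A~/, home)

puts with_sub
puts with_sub == with_gsub  # true: gsub's "global" behavior is never exercised

# An unanchored global substitution, by contrast, also rewrites later tildes.
puts path.gsub("~", home)
```

Because both calls produce the same string for an anchored pattern, `sub` states the intent (replace one leading tilde) more directly.
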
00:18:14.490 Serving as another example: if we’re looking for images unseen for two years.
00:18:20.380 The mutation testing tool displays how we can replace 'map' with 'each.'
00:18:27.480 This reaffirms we aren’t generating new arrays, simplifying our logic.
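
A sketch of the 'map' versus 'each' point, using a hypothetical `Image` struct and an `archive_stale` helper that are not from the talk. When a block is run only for its side effect, the array that 'map' would build is never used, so 'each' says the same thing with less.

```ruby
# Hypothetical data model; not the code from the talk.
Image = Struct.new(:name, :last_seen_year, :archived)

# Marks images not seen since the cutoff year. The block is run purely for its
# side effect, so `each` is enough and no new array of block results is built.
def archive_stale(images, cutoff_year)
  images
    .select { |image| image.last_seen_year <= cutoff_year }
    .each { |image| image.archived = true }
end

images = [
  Image.new("old.png", 2014, false),
  Image.new("new.png", 2017, false)
]

stale = archive_stale(images, 2015)
puts stale.map(&:name).inspect
puts images.map(&:archived).inspect
```

Note that `each` returns the stale images themselves, whereas `map` would have returned an array of `true` values; a test that only checks the side effect cannot tell the two apart.
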
00:18:34.990 Finally, we encounter an example using the Logger from Ruby’s standard library.
00:18:42.560 Here, we’re setting a formatter that transforms log event details into a formatted string.
00:18:50.960 The mutation testing tool recommends swapping out 'proc' for 'lambda'.
00:18:57.820 Usually, I forget the differences, but for our purposes, it highlights valuable distinctions.
00:19:04.810 Procs and lambdas are similar but behave differently with the number of arguments.
00:19:12.220 You can avoid common pitfalls by adhering to using lambdas, providing a safer environment.
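
The arity difference is easy to demonstrate with the standard library's `Logger`, which calls its formatter with four arguments (severity, time, program name, message). The formatter bodies below are illustrative, not the ones from the talk.

```ruby
require "logger"
require "stringio"

# A proc silently drops or pads arguments, so a wrong parameter list goes unnoticed:
# here "msg" actually receives the timestamp, the second of the four arguments.
sloppy_proc = proc { |severity, msg| "#{severity}: #{msg}\n" }
puts sloppy_proc.call("INFO", "2017-04-25", "app", "hello")

# A lambda with the same mistake fails loudly.
sloppy_lambda = lambda { |severity, msg| "#{severity}: #{msg}\n" }
begin
  sloppy_lambda.call("INFO", "2017-04-25", "app", "hello")
rescue ArgumentError => e
  puts "lambda rejected the call: #{e.message}"
end

# A lambda with the correct four parameters works as a Logger formatter.
output = StringIO.new
logger = Logger.new(output)
logger.formatter = lambda { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
logger.info("hello")
puts output.string
```

Using a lambda means a mismatch between the formatter's parameters and what `Logger` passes shows up as an `ArgumentError` rather than as quietly wrong log output.
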
00:19:18.670 Mutation testing yields significant insights, but one wonders about its practicality.
00:19:25.570 Some may find it daunting, but you have more manageable options.
00:19:33.320 You can pass in flags telling the tool what to focus on, like just the parts altered.
00:19:40.080 This can be a tremendous time-saver, honing in on critical areas.
00:19:46.610 Overall, mutation testing signifies robust user growth over the years.
00:19:53.180 If you’re not implementing mutation testing daily, it can benefit you immensely.
00:20:00.340 It encourages learning nuances in Ruby’s special cases by exposing unique mutations relevant to your tasks.
00:20:08.620 You’ll learn real-time from mistakes, sparking more growth.
00:20:15.550 Ultimately, it improves your abilities in writing better tests and enhances your understanding.
00:20:23.350 You'll model your understanding of the code better and deliver products with fewer bugs.
00:20:30.720 The insights provided ultimately guide you to refine your testing strategies.
00:20:37.690 Employ this mutation testing process beforehand, particularly with unfamiliar code.
00:20:44.190 This gives you a checklist of hotspots that indicate where things might break.
00:20:51.700 Simplifying your code forms an integral part of this tool.
00:20:58.160 If you're eager about growing as a professional and seeking effective tools, embracing mutation testing will certainly propel you.
00:21:05.490 If your coworkers aren’t thrilled, you can still run Mutest locally before pushing.
00:21:12.240 You’re likely to learn more about Ruby as well as enhance your tests.
00:21:18.980 If you're a team lead, consider implementing it for continuous integration.
00:21:25.620 You don’t need to aim for 100% mutation coverage to derive benefits.
00:21:31.920 Recognizing code changes that won’t break existing tests is invaluable.
00:21:38.370 It prepares authors and reviewers alike.
00:21:45.680 If you liked the insights shared today and appreciate great code, contact us.
00:21:53.220 I hope you’re all excited about using mutation testing!