Summarized using AI

Mutant on Steroids

Markus Schirp • March 22, 2019 • Wrocław, Poland • Talk

The video titled 'Mutant on Steroids' features Markus Schirp presenting at the wroc_love.rb 2019 event, focusing on the concept of mutation testing—an established technique that evaluates the effectiveness of automated tests by introducing controlled changes (mutations) into the code. Schirp explains that mutation testing provides a strong form of code coverage by revealing unspecified semantics in code through derived checks.

Key points discussed include:

  • Definition of Mutation Testing: An overview of mutation testing, which has roots dating back to the 1970s, and its role in identifying untested areas of code through automated transformations.
  • Entities within Mutation Testing: Introduction of terms such as 'subject,' which refers to any testable code, and 'match expressions' that guide the tool on what subjects to focus on.
  • Selection Process: The importance of accurately selecting relevant tests to run against the mutated subjects to optimize the runtime and effectiveness of the testing process. Schirp emphasizes the significance of clear metadata in test frameworks like RSpec and Minitest.
  • Understanding Mutations: Schirp showcases how specific operators are applied to subject methods, producing mutations that end up either alive (undetected by the tests) or dead (killed by the tests), a distinction that helps diagnose test quality.
  • Workflow and Prerequisites: The talk also discusses prerequisites for mutation testing, including the need for initial passing tests and the idempotence of tests to ensure reliability during iterations and regression checks.
  • Incremental Mutation Testing: Schirp recommends applying mutation testing incrementally, focusing on modified subjects within evolving codebases to maintain momentum without overwhelming developers.
  • Benefits vs. Costs: A discussion on the cost-effectiveness of mutation testing compared to manual code reviews, highlighting its utility in revealing overlooked issues and speeding up the development process.
  • Practical Implementation in CI: The speaker suggests ways to integrate mutation testing into continuous integration pipelines efficiently.


wroclove.rb 2019

00:00:14.570 I'm happy to be here again. It's a unique situation for me because we had a workshop yesterday, and I recognize many faces from that event. It's great that you are sitting in the same positions as yesterday. So, as a quick warm-up for the crowd who was at the workshop yesterday, please raise your hand. Having that much crowd control makes me happy. Thank you.
00:00:20.480 Okay, so obviously, this talk is again about mutation testing. I found a cool title, "Mutant on Steroids," but the steroids part is just a joke because I needed a new title after four years of giving these kinds of talks. I'm here to talk about mutation testing itself, which is a very old technique. It’s a strong form of code coverage; in fact, the oldest references I could find date back to the 1970s.
00:00:32.210 I forgot the names of all the original discoverers of these techniques, but they had great ideas. So, what exactly is mutation testing in a nutshell? You've all been to Martin's talk before; he discussed checks that can be automated. Mutation testing is what I call a derived check. It takes the artifacts currently in your system, which is basically your code and automated tests, and feeds them into the derived check.
00:00:44.840 This derived check is itself automatable. Instead of just pointing out clear rule violations, it constructs representations of semantics in your code that are not covered by any test. These unspecified semantics are something you then have to deal with.
00:00:57.830 At its core the tool is quite simple: it applies a set of transformations, does some black magic in between, and shows you where you have unspecified semantics. Unspecified semantics may look like this: the slide shows a unified diff. I made this one up; I could have generated a full report, but I wanted it to be as compact as possible.
00:01:08.990 So this is a typical Ruby method that does two things, and this is how a real report might look. Anyone who has worked with Git should be able to see that we are removing a method call. If the tests do not notice that removal, the call represents unspecified semantics, and that cannot simply be ignored.
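To make this kind of report concrete, here is a small invented Ruby example (Account, notify_owner, and the diff below are illustrative assumptions, not taken from the slides):

    class Account
      def initialize
        @status = :open
      end

      # Subject under test: closes the account and notifies the owner.
      def close
        notify_owner
        @status = :closed
      end

      def notify_owner
        # e.g. enqueue a notification; the body is irrelevant here
      end
    end

If no test asserts that the owner is notified, a mutation that deletes the call survives, and the report shows it as a unified diff roughly like:

     def close
    -  notify_owner
       @status = :closed
     end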
00:01:15.379 At this point, humans must decide what to do with these unspecified semantics. In a mature codebase, it often means those unspecified semantics should be removed. But in a non-mature codebase, which we dealt with in yesterday's workshop, you typically must add an additional automated check to prove to the tool that yes, we actually need this kind of semantics.
00:01:30.380 You have to prove to the tool by adding a test, saying, 'Hey, I really want the semantics. I don't want it to be gone, and I want to make it clear for my future self, my future coworker, or the future intern that this was important.' This could be a call to a calculation method or a call to initialize some logic or whatever. But this tool spills out unspecified semantics, and it does so quite effectively.
00:01:44.780 Because naming is fun and I’m the author of the tool, I'm in a fortunate position to create many names. We need to go through these names to explain several other concepts. If you look at the slides later, everything that's blue links to the mutant documentation. You should be able to click on it and get more information.
00:01:54.570 In this presentation, I’ll cover several important concepts. First, the 'subject' is anything that has tests and can be mutated. Currently, Ruby's mutant only supports instance methods and class methods, but there are other possibilities for future expansion.
00:02:03.560 I could expand it into class-level DSLs, constants, and inheritance declarations for classes. But for now, a subject is simply an instance method or a class method, and I hope everyone has a good grasp of that. Then we have 'match expressions.' This is a mutant-specific concept that tells the engine where to look for your subjects.
00:02:14.180 Imagine you want to run the mutation testing engine against your project. You have 100 dependencies and 100,000 lines of code. Unless you specify to the tool which subjects are of interest, you're out of luck.
00:02:19.489 So, I created the concept of match expressions. The first match expression is a recursive enumeration; you give it your parent namespace, and it recursively identifies all subjects it can. The second match expression scopes the engine to specific instance methods or singletons, guiding its search effectively.
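For illustration, match expressions look like this (the constant and method names are invented; the syntax is mutant's documented match-expression syntax):

    'MyApp*'             # recursive: every subject below the MyApp namespace
    'MyApp::User#admin?' # a single instance method
    'MyApp::User.build'  # a single singleton (class) method

On the command line these are passed to mutant together with its integration flags, for example something along the lines of "bundle exec mutant --include lib --require my_app --use rspec 'MyApp*'"; the exact flags differ between mutant versions.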
00:02:39.979 I also introduced the concept of 'selection,' which is the process of identifying corresponding tests for your subject. This selection process is important because it dictates how quickly your mutation testing will work.
00:02:45.689 If you were to run the discovery of unspecified semantics against all your tests—and everyone knows how slow tests can be—you would find it very time-consuming. Therefore, we need an effective method of selecting automated tests to run.
00:03:02.930 These selected tests form a subset of the relevant tests. The selection uses metadata from RSpec, which everyone knows: describe, context, and the other nesting primitives. Normally this metadata does not do much beyond structuring your spec output, but mutant uses it to form implicit selection criteria.
00:03:16.649 When you start to use a tool like mutant, it becomes crucial to be honest in your describe statements. If you're not clear and accurate in your descriptions, you risk selecting too many or too few tests. Too many tests can lead to terrible runtime, while too few can result in insufficient coverage.
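A sketch of what such honest metadata looks like (MyApp::User and #admin? are invented names): mutant maps the subject MyApp::User#admin? onto the examples whose nested descriptions spell out exactly that constant and method.

    # spec/my_app/user_spec.rb
    RSpec.describe MyApp::User do
      # The outer block names the constant and this block names the method,
      # so mutant can select exactly these examples for MyApp::User#admin?.
      describe '#admin?' do
        it 'is true for admins' do
          expect(described_class.new(role: :admin).admin?).to be(true)
        end

        it 'is false for everyone else' do
          expect(described_class.new(role: :guest).admin?).to be(false)
        end
      end
    end

Describing the method as, say, 'permissions' instead of '#admin?' would make mutant select too few examples for this subject, or with overly broad descriptions too many, which is exactly the runtime and coverage problem described above.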
00:03:32.690 Minitest is a little different. There are typically two ways of using Minitest, the classic test classes and a describe syntax that resembles RSpec, but the integration does not extract implicit metadata from either of them yet. You must declare coverage explicitly, stating which test class covers which match expression.
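With the mutant-minitest integration, that explicit declaration looks roughly like the sketch below (the test class, method, and assertions are invented; check the mutant documentation for the exact API of your version):

    # test/user_test.rb
    require 'minitest/autorun'
    require 'mutant/minitest/coverage'

    class UserTest < Minitest::Test
      include Mutant::Minitest::Coverage

      # Explicit declaration: this test class covers every subject
      # matching the expression below.
      cover 'MyApp::User*'

      def test_admin_predicate
        assert MyApp::User.new(role: :admin).admin?
      end
    end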
00:03:41.099 A mutation operator takes a concrete subject and changes it into a different form; running your tests against that changed form tells you whether the change is detected, and those results make up your report.
00:03:54.840 For instance, here is a report showing the result of applying a specific operator to the subject 'foo,' an instance method: one of its method calls is removed, resulting in a mutation.
00:04:07.480 There are other classes of operators as well, such as operator replacement: yesterday in the workshop we saw mutant converting a 'less than' into a 'less than or equal,' or inverting an 'and.' It is often not obvious which variant carries the narrower semantics, and that is exactly what produces such varied mutations.
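As an invented illustration of operator replacement (the method and the exact mutation set are assumptions; the mutations actually generated depend on the mutant version):

    # Original subject
    def discount?(total)
      total < 100
    end

    # Mutations an operator-replacement pass might generate for "total < 100";
    # each variant is run against the selected tests, and any that stays
    # green is reported as alive:
    #
    #   total <= 100
    #   total == 100
    #   true
    #   false

A boundary test asserting that discount?(100) is false is what kills the "total <= 100" variant.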
00:04:18.300 A mutation is the application of an operator to a subject, and the result of running the selected tests against each mutation gives that mutation its status. If the tests stay green, the mutation is alive.
00:04:28.490 Dead mutations are the good case: your tests detected the change. Alive mutations are the problematic ones, and the ones you deal with most of the time, because they indicate unspecified semantics and therefore flaws in your code base. I emphasize this distinction repeatedly because it is crucial.
00:04:44.400 If an automated process can identify unspecified semantics in your codebase, you have two options: you either remove those semantics or specify them. This circles back to Martin’s talk on automated derived checks.
00:04:58.350 Each of these alive mutations should be treated as a flag that was automatically raised in your codebase and that a human then has to triage. The human must ask why those semantics are there at all, because an alive mutation indicates some flexibility or uncertainty in your code.
00:05:11.030 This has to be taken seriously, because an alive mutation represents something that may eventually break your code, impacting your future self.
00:05:23.350 Now, to illustrate this concept, I will show you the report. It is very likely that your code, as it stands now, might appear correct, but the reality is that it may not stay correct over time without specifying your required semantics clearly.
00:05:37.930 Requirements change, code changes, and commits land. Unless you can clearly specify what your required semantics are in a way that prevents unnoticed changes, you will face regressions.
00:05:53.160 This is especially true in Ruby, a dynamic language with limited options for enforcing correctness, making strict semantic test coverage invaluable. Mutation testing exists because we have suffered from regressions and recognized that having a tool to run automated checks on changes could prevent unnoticed errors.
00:06:06.940 If we had a tool that could run all possible changes against our codebase and check if our tests cover those changes, it could help avoid introducing bad changes without our noticing.
00:06:21.850 Now, the preconditions for running a mutation testing tool are the following: we need green tests. Not every test in a larger project needs to pass, but the tests selected for the subjects under mutation testing must all pass.
00:06:35.960 Otherwise, the mutation testing engine will signal that something is amiss; if the tests are red before any mutation is applied, the results are meaningless.
00:06:49.120 Another necessary aspect is that tests should be idempotent. Mutation testing operates on the premise that processes can run repeatedly without adverse effects.
00:07:01.210 Your tests should run without side effects or non-repeatable interactions with resources to maintain accuracy during testing. Mutation testing typically executes the same tests multiple times.
00:07:19.040 You want tests that are reliable and able to run in any order, because mutation testing tries to minimize the number of test executions per mutation. Tests that depend on a specific execution order, or on artifacts created by previous tests, will fail under these conditions.
00:07:34.450 You want to avoid concurrency issues that arise from shared resources, so concurrent tests need to be managed correctly.
00:07:48.600 If you want to run mutation testing, it's essential to understand how to handle these tests, as using an overarching testing framework influences how you approach mutation testing.
00:08:02.680 So, if you're starting now with mutation testing, often the question arises: if I have a codebase with thousands of subjects and have so many deep problems to deal with, how do I start? Incremental mutation testing is key.
00:08:18.490 Your focus should be on the subjects you're currently working on. During your iteration, mutant will prioritize examining only those subjects that have been modified.
00:08:32.300 In our workshop yesterday we established the foundational knowledge of how this process works, without writing new code or creating feature branches. If you're interested, the slides contain links to the mutant documentation.
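A minimal sketch of what an incremental run can look like, assuming the project lives in git and the installed mutant version supports the --since flag (my_app, MyApp*, and the Rake task are invented for illustration):

    # Rakefile
    desc 'Mutation-test only subjects changed since the integration branch'
    task :mutant_incremental do
      # --since restricts mutation testing to subjects whose source changed
      # relative to the given revision; verify the flag against your version.
      sh 'bundle exec mutant --include lib --require my_app ' \
         "--use rspec --since origin/master 'MyApp*'"
    end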
00:08:47.100 Incremental mutation testing helps you begin your journey today or next week. Here’s your thank you slide.
00:09:00.940 You'll notice that the order of elements does not matter; hence, we sort them alphabetically to reduce noise. I am now concluding this talk, which is primarily about establishing a nomenclature.
00:09:13.350 Then we can proceed to the workshop we had yesterday. I'm looking forward to a good Q&A session.
00:09:25.780 I hope I did not already answer all of your questions yesterday. I will now turn the floor over to the audience. Are there any questions? Thank you.
00:09:39.600 From a client's point of view, do you find mutation testing to be overhead? In your experience, would you recommend mutation testing for every kind of project, in terms of priority?
00:09:54.240 Yes, I believe we should start from the core domain or business-critical features and slowly expand.
00:10:07.230 That's typically the approach I recommend, starting with the most important features and then gradually scaling.
00:10:17.180 What is more costly: having your coworker mentally disambiguate specific branches of your code and the semantic effect of a method that may be removed, or running a tool for ten seconds that addresses 90% of questions?
00:10:31.050 That's a crucial point because my clients typically see human time as the most expensive resource in a project.
00:10:41.870 Using mutation testing can effectively reduce the time spent addressing trivial questions, allowing for faster and more quality reviews.
00:10:55.180 Clients care about progress per time unit, and it's my responsibility to enhance this metric.
00:11:06.300 Using this technique helps speed up the process by answering many trivial questions through automation.
00:11:18.220 In terms of core domain features, you can never be too careful because any line of code can cause a serious problem.
00:11:29.500 I recommend performing mutation testing everywhere because it's so inexpensive compared to human time.
00:11:42.140 The question was how to integrate it with CI. I've used the generic match expression. Given our code is namespaced, we can effectively tell it where to find all subjects.
00:11:53.950 For instance, this very first match expression applies to thousands of subjects, but the incremental mode makes it fast enough to run during normal CI cycles.
00:12:06.040 You can create a small wrapper that finds all of them, simplifying subject management.
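One way such a wrapper might look (a hedged sketch; the environment variable, default branch, match expression, and flags are assumptions about the project and mutant version):

    #!/usr/bin/env ruby
    # bin/mutant-ci -- run incremental mutation testing for the whole app.
    namespace = 'MyApp*'                              # generic match expression
    base      = ENV.fetch('BASE_BRANCH', 'origin/master')

    command = [
      'bundle', 'exec', 'mutant',
      '--include', 'lib',
      '--require', 'my_app',
      '--use', 'rspec',
      '--since', base,   # incremental mode keeps CI runtime manageable
      namespace
    ]

    exit(system(*command) ? 0 : 1)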
00:12:17.580 Good questions! Yes, I'd like to address your inquiry about mutation outputs.
00:12:32.100 Mutation testing outputs many automated verifications for the codebase. Each output represents one of these automated validations.
00:12:44.880 Please note that mutant is capable of various operations, flipping integers from positive to negative and applying a wide range of mutation operations.
00:12:56.140 So, the question is definitely valid: is it possible to run this tool and discover overloads or other semantic redundancies in your codebase?
00:13:08.300 Yes, the tool is capable, and you can analyze which parts of your code could potentially house issues.
00:13:23.930 I haven't had a specific list of bad code items, but because I know I will encounter them over time, I don’t perceive the need to track this list.
00:13:37.200 However, conceptually, it's possible and it acts as an excellent complexity measurement in relation to cyclomatic complexity.
00:13:50.750 You could run a mutation generation engine and examine which subject produces the most mutations to determine complexity.
00:14:05.810 Do you have experience with mutation testing in other ecosystems?
00:14:20.700 Yes, my experience began with the DataMapper team, where I worked on a subproject that involved a relational algebra engine.
00:14:32.910 I've also worked with various DSLs in different languages, as the concepts behind mutation testing are universally applicable.
00:14:48.660 Mutation testing is gaining traction in community-wide development across multiple ecosystems, not just Ruby.
00:15:01.540 What about testing existing libraries? Do you prefer testing private libraries or open-source ones?
00:15:12.300 I have maintained a strict policy to avoid open-source libraries unless it's related to bug fixes due to time constraints.
00:15:27.280 Incorporating mutation testing has influenced the quality of libraries I work with, even if I do not specifically mention mutant.
00:15:42.430 I test code rigorously, sending over mutation tested versions of code rather than going into an open-source spiral.
00:15:55.490 Sometimes, you may encounter mutations that seem equivalent. In those cases, I delve into why that mutation exists.
00:16:06.730 I evaluate whether the equivalence points to redundancy in the original code; if it does, I change the code so the redundancy goes away.
00:16:21.300 I acknowledge that equivalent mutations are often a concern among researchers, but in practice, they happen infrequently.
00:16:39.250 However, a frequent scenario arises when code delegates actions to a library with a semantic flaw.
00:16:48.460 In such cases, I report the issue back to the library maintainers so that the mutation can eventually be killed.
00:17:00.860 If a mutation is not killable due to a legacy constraint, I recognize I may be overlooking simpler solutions.
00:17:12.630 The most important point to note is that even with a green codebase, mutation testing does not guarantee correctness.
00:17:24.830 It only confirms that the automated tool could not identify any flaws. I never submit code that hasn't undergone mutation testing.
00:17:39.890 Using mutation testing becomes the first line of defense in code reviews, reducing the burden on human reviewers.
00:17:53.110 Skipping it is like asking a reviewer to perform by hand the type checks a compiler would do for free in a statically typed environment.
00:18:05.050 You mentioned earlier that mutation testing could also highlight code that needs refactoring?
00:18:13.920 Yes, mutation testing can guide you in identifying potential refactors, particularly if your codebase resembles a complex decision tree.
00:18:27.680 For example, when dealing with extensive switch-case scenarios, you can run mutation tests for insightful transformations.
00:18:41.920 It’s beneficial as you can specify your public interface and then iteratively refactor, maintaining coverage throughout.
00:18:56.630 Using mutation testing to find areas for improvement in codebases is a productive strategy. Thank you for your questions. Have we successfully addressed everything?
00:19:09.850 If there are no further questions, I appreciate your engagement. Thank you!