RubyConf 2022 Mini

Summarized using AI

Weaving and seaming mocks

Vladimir Dementyev • February 28, 2023 • Providence, RI

In his talk "Weaving and Seaming Mocks" at RubyConf 2022, Vladimir Dementyev explores the use of mocking in Ruby testing, emphasizing the importance of keeping mocks aligned with real object behavior to avoid false positives in test results.

Dementyev begins by discussing the distinction between classical and mock testing styles in Ruby. While the classical style uses real objects to mirror production code paths, the mock style replaces real dependencies with fake objects, potentially disrupting the connectivity of execution paths. This can lead to situations where tests appear successful despite real code breaks, a phenomenon referred to as false positives.

Key points covered in the talk include:

- The nature of mocks and their implications in testing.

- Examples of how false positives can occur, such as stubbing methods that were removed during refactoring, or 100% test coverage failing to indicate working functionality.

- A historical context of mocking in Ruby, including references to RSpec and the introduction of instance doubles to ensure method verification.

- The significance of type checking, illustrating how type signatures can support more reliable mocking practices.

- An innovative concept of using contract-based verification for mocks, where mock definitions include expected argument values and return behaviors to enhance correctness.

- The introduction of a tool called "Mock Suey" intended to ensure safer mocking practices by analyzing real object calls and improving type verification approaches.

Throughout the talk, Dementyev uses his open-source project, Anyway Config, to provide practical examples of testing strategies, illustrating both the pitfalls of mocks and potential solutions through improved methodologies.

In conclusion, the talk stresses the need for developers to be cognizant of their use of mocks, emphasizing a balance between mock and integration testing to achieve more reliable software. Key recommendations include embracing typing systems, improving mock context management, and integrating verification patterns to secure testing practices.

Weaving and seaming mocks
Vladimir Dementyev • February 28, 2023 • Providence, RI

To mock or not to mock is an important question, but let's leave it aside and admit that we, Rubyists, use mocks in our tests.

Mocking is a powerful technique, but even when used responsibly, it could lead to false positives in our tests (thus, bugs leaking to production): fake objects could diverge from their real counterparts.

In this talk, I'd like to discuss various approaches to keeping mocks in line with the actual implementation and present a brand new idea based on mock fixtures and contracts.


00:00:11.540 Hey everyone! I think we're ready to start the last session before lunch. You know, I don't want to make you suffer from hunger for too long, but I can't promise that.
00:00:16.619 Anyway, let's start with the talk. The title is 'Weaving and Seaming Mocks.' You might guess that we're going to talk about tests and testing techniques. Testing is a fundamental part of software development with Ruby; I can't imagine writing a Ruby program without writing tests. Well, I can imagine it, but I don't recommend doing that if you want to build something real out of it.
00:00:31.199 So I'd like to start the discussion with a question. This is not going to be 'Do you write tests?' because I believe I know the answer, or maybe my belief is too strong. But anyway, let's move on to the next one: How do you write tests? By 'how,' I mean which testing style do you prefer?
00:01:03.539 According to the theory, there are classical and mock styles. Let's see what this means in the Ruby context. Given a code example—a method on some object that does some searching functionality; we do not care about the particular class—how do you write a test for it if you're a classical-style follower?
00:01:29.220 Let's say you define the context required for this method to be executed and then assess the return value given that context. If you're a mock style developer, you first identify the dependencies in your method and isolate them. You don't want to test the dependencies in this method test; you only want to test this particular method. So, let's replace the query and user classes and their objects with some fake objects and verify that our method communicates with them correctly.
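A minimal sketch of the two styles (the Search/Query/User classes here are illustrative, not the talk's exact slides):

```ruby
class Search
  # Finds the user matching the term and returns their query results
  def perform(term)
    user = User.find_by(name: term)
    Query.new(user).results
  end
end

# Classical style: build the real context and assert on the return value
RSpec.describe Search do
  it "returns results for a matching user" do
    user = User.create!(name: "matz")
    expect(Search.new.perform("matz")).to eq(Query.new(user).results)
  end
end

# Mock style: isolate the dependencies and verify the communication
RSpec.describe Search do
  it "builds a query for the found user" do
    user = instance_double(User)
    query = instance_double(Query, results: [:result])
    allow(User).to receive(:find_by).with(name: "matz").and_return(user)
    allow(Query).to receive(:new).with(user).and_return(query)

    expect(Search.new.perform("matz")).to eq([:result])
  end
end
```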
00:02:04.439 That's just a matter of style. I'm not going to argue which one to use; that's not the topic of this talk. Luckily, we have great resources and talks on this topic of whether to mock or not. One particular talk I would like to recommend is 'To Mock or Not to Mock' by Emily Samp from last year's RubyConf.
00:02:37.200 For today, I want to focus only on one important difference between these two styles. This difference is based on the way code execution paths happen within the codebase. When using a classical approach, every object is real and the code execution paths in your test environment are the same as in your production application.
00:03:06.599 So what is a code execution path? It's like a call trace; we can build a graph of how real objects communicate with each other—they're all connected. When using the mock style, however, we lose this connectivity because every time we hit a fake object, we introduce a gap in our execution path. It's not really an execution path; it's a collection of partial paths, and the communication graph that could be built out of it is disconnected.
00:03:50.700 This picture is not typical for Rails or Ruby applications. Usually, we mix both styles. From my experience, many developers prefer to go classical and introduce mocks in some places. But even using this mixed approach, we can hit this disconnection problem, like the disconnected graph, which can lead to false positives.
00:04:18.479 So, what is a false positive? Given our example, currently our code is working and our test is green. Now, we introduce a breaking change to one of our dependencies' APIs—say, we change an argument type—and the code is no longer valid. It's broken, but the test is still green, and we failed to catch the problem.
00:04:41.460 That's an example of a false positive—just one of many we're going to talk about today. The topic of this talk is how to avoid false positives when using mocks, even in a mixed test base. In other words, how to put seams in our communication graph to make it comply with reality.
00:05:06.479 A bit of introduction: My name is Vladimir Dementyev. You might have seen me on GitHub as palkan. I maintain some projects—probably a few dozen—and I work for a company called Evil Martians. We are product development consultants helping businesses to grow. Apart from that, we're actually doing a lot of open source stuff—really a lot. You've probably used some of our projects, at least one of them.
00:05:36.960 One project I want to separately mention from this long list is Test Prof. I started it five years ago, and time has passed since then. That's how I got attached to testing and doing Ruby test-related research. I see a lot of codebases, and I notice many problems that could be typical to many of them. One of these problems is the unsafe usage of mocks.
00:06:05.759 So the talk is about keeping mocks in line with real objects. That could be like an official title of this talk, rather than just 'Weaving and Seaming Mocks.' If it were a scientific conference, the title would be more descriptive to demonstrate the problem and the solutions.
00:06:23.039 Of course, I need some playground to demonstrate this, and I chose one of my open-source projects called Anyway Config. It's a configuration library for applications that allows you to separate the concept of a configuration object from the configuration source.
00:06:56.400 You use Ruby objects to represent your parameters, and the library takes care of populating them from different sources, such as environment variables. Here's how it works: It picks up the matching environment variables and parses them according to some conventions.
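For instance, a config class might look roughly like this, following the library's conventions for naming and env prefixes:

```ruby
require "anyway_config"

class AppConfig < Anyway::Config
  config_name :app # env variables are expected to use the APP_ prefix
  attr_config :host, :port, debug: false
end

# APP_HOST=example.com APP_PORT=8080 APP_DEBUG=true ruby app.rb
config = AppConfig.new
config.host  #=> "example.com"
config.port  #=> 8080 (values are type-cast according to the conventions)
config.debug #=> true
```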
00:07:14.759 For demonstration, we're interested in loading data from environment variables into configuration objects. We have two components involved in this process. The first one is the ENV parser, responsible for dealing with the actual environment. It describes how to build hashes from ENV.
00:07:39.300 The second component is the ENV loader, a source loading plugin for the library. Anyway Config allows you to load configuration from different sources, and every source is backed by its own source loading plugin.
00:08:05.759 This loader is very simple; it just knows how to use the parser properly. So we have a dependency here, and we have tests. I picked up a subset just for these two components. It has 100% coverage, and it's green, great! We're going to use it.
00:08:29.940 My tests lean toward the classical style, to be honest, so they looked like this initially. In order to test the loader class, we actually set up the whole environment, using real ENV objects. But that means we are actually testing the parser here more than the loader, because the logic of dealing with the actual ENV is the responsibility of another component.
00:09:04.800 Let's rewrite it in a bit of mock style and test what the loader should do by introducing some fake objects. Let's fake the parser and just make sure that we use the correct API provided by this object to get the data.
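Roughly, the two versions might look like this (the fetch_with_trace API and the setup shown here are illustrative, not the library's exact internals):

```ruby
# Classical: go through the real ENV and exercise the whole path
it "loads values from environment variables" do
  ENV["MYAPP_HOST"] = "example.com"
  expect(loader.call(env_prefix: "MYAPP")).to eq("host" => "example.com")
end

# Mock style: fake the parser and verify only the loader's communication
it "fetches parsed data from the env parser" do
  parser = double("env parser")
  allow(parser).to receive(:fetch_with_trace).with("MYAPP")
    .and_return([{"host" => "example.com"}, nil])
  allow(Anyway::Env).to receive(:new).and_return(parser)

  expect(loader.call(env_prefix: "MYAPP")).to eq("host" => "example.com")
end
```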
00:09:38.220 Here, we introduced the possibility of false positives. The first case of false positives I'd like to talk about is undefined methods. Let's imagine the following refactoring: we decide to merge two API methods into one, because fewer API methods is better—we don't want to expose too much.
00:10:02.640 But to keep the previous behavior, we have to add a positional argument, and we no longer have the separate with-trace method. Our test is still the same; we haven't touched it because why should we? It's still green, right? We refactored our code and the corresponding unit test for our parser, but our loader test remains green and the coverage is still 100%. The problem is our code is broken.
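In terms of the illustrative parser API from the sketch above, the refactoring looks roughly like this (parse and trace are hypothetical helpers):

```ruby
# Before: two API methods
def fetch(prefix) = fetch_with_trace(prefix).first
def fetch_with_trace(prefix) = [parse(prefix), trace(prefix)]

# After: a single method with a positional flag to preserve both behaviors
def fetch(prefix, include_trace)
  data = parse(prefix)
  include_trace ? [data, trace(prefix)] : data
end

# Meanwhile, the unchanged test still stubs the removed method on a plain
# double, so the suite stays green while real callers are broken:
allow(parser).to receive(:fetch_with_trace)
```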
00:10:28.920 It's no longer valid. This is a double trouble. Why is it double trouble? Because both tests are green and coverage is 100%, creating a false sense of security that everything is okay, but it’s not. Speaking of coverage, let's take a look at it.
00:10:55.360 We have some historical data for our coverage, and it's 100% at every step. If we look at the particular file that is broken, we can see that the line is covered, but a covered line doesn't mean anything. That's just another example of how 100% coverage is not a silver bullet; it's not something you should rely on.
00:11:09.360 So, what can we do about it? How to make sure fake objects take care of such false positives? That's not a new question. Actually, I want to go back in history about 10 years and tell you about an issue that was opened in the RSpec mocks project. I think this is the first time the question was really raised, which led to some features added to RSpec.
00:11:39.779 How do we deal with stubbing a method that is not defined, that is missing, that was removed, or whatever? A project already existed that solved this problem: rspec-fire. Of course, it is no longer maintained; it's archived. The project was merged into RSpec in the form of instance doubles.
00:12:01.020 The feature changed a lot of things but, in this particular case, it fixed our test suite. By 'fixed,' I mean it let it fail when the corresponding dependency changed its API. That's a simple example of how to make mocks more stable. We already have it; RSpec and MiniTest check for method presence by default when you're using stubs.
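Using the illustrative parser method from before, the difference looks like this (the error mirrors rspec-mocks' actual behavior for verified doubles):

```ruby
# A plain double accepts any stub, even for a method that no longer exists:
parser = double("env parser")
allow(parser).to receive(:fetch_with_trace) # still passes after the refactoring

# A verified double checks the real class first:
parser = instance_double(Anyway::Env)
allow(parser).to receive(:fetch_with_trace)
# => error: the Anyway::Env class does not implement the instance method: fetch_with_trace
```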
00:12:29.880 So we have different types of doubles. We have the simple double with no commitments, and we also have verified doubles—that's RSpec's term for doubles, like instance doubles, that check for method existence. But that's not the only benefit; they also help with some other false positives—for example, when the method signature is not valid.
00:12:50.880 When we call methods that exist but we pass arguments that are not acceptable by this method. Continuing on our dangerous refactoring, we realized that adding a positional boolean argument is not a good idea; it's never a good idea. Let's convert it into a keyword argument instead. That's much better from an API design perspective.
00:13:24.120 Yet, our test suite still tests the method which accepts the positional argument. The good thing about instance or verified doubles in RSpec is that we can catch this problem. It not only checks that the method exists on a particular object but also verifies that the passed arguments match the signature.
00:13:53.880 How would we do that? A bit of internals: RSpec uses a message signature verifier that performs a lot of checks and validates whether the call is valid. Under the hood, it uses method parameters—an example of powerful Ruby introspection capabilities.
00:14:06.640 For every given method, we can ask the Ruby VM what was declared in the method definition as parameters. We can see the type, the name, and whether they are required arguments, required keyword arguments, and blocks. Using this, we can check whether the passed arguments match the method definition. This works if you have explicit arguments specified.
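For example:

```ruby
def fetch(prefix, include_trace: false, &block); end

method(:fetch).parameters
#=> [[:req, :prefix], [:key, :include_trace], [:block, :block]]

# With splats and keyword forwarding, there is much less to verify:
def fetch_all(*args, **opts); end

method(:fetch_all).parameters
#=> [[:rest, :args], [:keyrest, :opts]]
```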
00:14:32.880 But if you have splats or keyword arguments forwarded from Ruby 3, it won’t be as useful. Nevertheless, that’s much better than nothing. So we have method parameter verification with instance and verified doubles in RSpec. However, I said that parameters are not actually the full signature.
00:15:07.680 The signature is something more. The parameter shape is one aspect of it; it also includes argument types and return value types. Types are more important, as we need to ensure we pass the correct objects to our mocked methods. Unfortunately, we cannot guarantee this with the existing tools.
00:15:46.740 Let's demonstrate that with another refactoring. Previously, we returned an array of two values as a result of our method, which is not a good idea. Let's wrap it into a data value object, like a struct, having two fields: data and trace.
00:16:10.260 As before, our tests stay the same, and once again they pass while the code doesn't work. The error is cryptic, indicating that we passed something incorrect down our method chain: we see an 'undefined method ... for nil:NilClass' error. Good luck figuring that out.
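One plausible reconstruction of that failure, again using the illustrative parser API (a struct has no to_ary, so destructuring silently misbehaves):

```ruby
EnvData = Struct.new(:data, :trace)

# The parser now returns a value object instead of a two-element array:
def fetch_with_trace(prefix)
  EnvData.new(parse(prefix), trace(prefix))
end

# The loader still destructures the result; since a struct doesn't convert
# to an array here, `data` gets the whole struct and `trace` becomes nil:
data, trace = parser.fetch_with_trace(prefix)
trace.merge!(other_trace)
#=> NoMethodError: undefined method `merge!' for nil:NilClass
```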
00:16:34.080 This is a problem that is hard to solve with tools that were developed about ten years ago. I want to refer to RSpec fire again. It's an amazing project with some great ideas.
00:17:01.460 Something was lost in translation from typed languages back then. But today is not 2013—today is 2022, and we have types. We can add them not just to mocks but also use them to verify our mocks. Imagine mixing RBS or Sorbet into verified doubles, calculating the method signature not just from parameters but also considering argument types.
00:17:37.660 What is a type double? It’s based on two ideas. We can intercept calls made on mocked or fake objects, and we can type-check them at runtime. Both RBS and Sorbet provide runtime checking capabilities, which is what we can use here.
00:17:55.500 A simplified version would be replacing the method dispatch on the double with one that also type-checks the call.
00:18:05.240 Fortunately, RSpec has a single entry point for intercepting proxy calls, so we can intercept calls there and type-check them with RBS. I chose Anyway Config not accidentally; one of the reasons was that it already ships type signatures with the gem.
00:18:24.080 So I did this patch, and now my tests are failing, which is great. That's what I was looking for! Now my instance doubles don't even require a new DSL or API; we can use RSpec instance doubles but extend their verification with one more step to confirm the type signature.
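A minimal sketch of the shape of such a patch, assuming message_received is RSpec's interception point and hiding the RBS machinery behind a hypothetical check_call! helper (RBS's runtime test support can back it):

```ruby
module TypedVerification
  def message_received(message, *args, &block)
    # Type-check the mocked call against the real class's RBS signature
    # before letting RSpec dispatch it as usual.
    RBS_CHECKER.check_call!(@object, message, args)
    super
  end
end

RSpec::Mocks::Proxy.prepend(TypedVerification)
```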
00:18:46.540 That could be the end of the talk, but the most interesting part begins here. Raise your hand if you are using types in your applications. Okay, one person.
00:19:02.920 I don’t want to stop here; I want to bring this power to those who haven’t embraced types yet. And that's what we are going to do: we are going to generate our type signatures on the fly and yet still use our type checkers to verify them.
00:19:23.080 We already have the part responsible for type checking—we built it! Now, we need a way to generate signatures. Let's think about how we can do that. There are static tools like rbs prototype or Tapioca for Sorbet, and we can use TypeProf, but they are all separate steps that you have to run to actually generate types.
00:19:49.380 We don't want that; we want to use only our test suite as a source of truth. The good thing about tests is that they allow you to analyze your codebase much better than anything else, especially since you have lots of method calls executed and all that runtime statistics.
00:20:23.280 We can use it to generate types. The process consists of several steps: we need to collect method calls made on real objects. I assume that the real counterparts of your mocked objects are covered separately by their own tests, where real method calls are performed.
00:20:47.340 We can intercept these calls and collect the arguments and return values. There are multiple options, and I suggest going with TracePoint for a few reasons: the main reason is that TracePoint doesn’t interfere with the object under tracking.
00:21:03.960 You do not need to patch it to include anything; you just rely on VM-level events, which do not affect the object. It's a bit tricky to get arguments from TracePoint, but luckily we already know about method parameters, and TracePoint provides the binding to extract the passed arguments from the trace point.
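A sketch of the collection step, assuming a TRACKED map of mocked classes and methods has been gathered beforehand:

```ruby
TRACKED = { Anyway::Env => %i[fetch_with_trace] } # from the mock registry
TRACES = Hash.new { |h, k| h[k] = [] }

trace = TracePoint.new(:call, :return) do |tp|
  next unless TRACKED[tp.defined_class]&.include?(tp.method_id)

  key = [tp.defined_class, tp.method_id]
  if tp.event == :call
    # TracePoint#parameters gives the declared parameter names;
    # the binding lets us read the values that were actually passed.
    args = tp.parameters.filter_map do |_kind, name|
      [name, tp.binding.local_variable_get(name)] if name
    end
    TRACES[key] << { args: args.to_h }
  else
    TRACES[key].last[:return] = tp.return_value
  end
end
trace.enable
```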
00:21:24.560 Next, we generate types from the call traces. What is a call trace in this case? It's just a tuple of the passed arguments—positional and keyword—plus the return value. For every slot in the tuple, we can record the classes of the values passed there, collect the unique classes, and create a union type in our signature.
00:22:06.840 Here's the type signature for our env parser class, generated by executing its test suite. Running just the unit tests, it's fairly accurate—100% correct, actually. It matches the signature we wrote by hand.
00:22:34.560 We know how to collect call traces and generate dummy signatures, but we cannot collect all the possible call traces in our application. You can imagine that running a Ruby test suite would involve millions of calls, which would just slow your tests down.
00:23:01.080 So we need to selectively look for only the classes or modules, or let’s call them objects and methods, which we used in our mocked objects. For that, we need to analyze which objects are mocked in the test; this is a must.
00:23:31.440 Here comes the tricky part. To solve this problem, I went with fixtures. How do mocks relate to fixtures? There’s an interesting article and idea originally introduced by one of my colleagues. It’s available on our website.
00:23:58.020 The idea there is that we can unify how we use different kinds of fixtures—not just for data, but also for mocks. A tool was built called Fixturama, which allows you to define mocks in YAML files, like fixtures, creating a repository of mocks. You can just load this YAML in your test and use it.
00:24:29.520 The good thing about this is that we have a single place where all the mocks are defined, and we can rely on that location. I was thinking about how to apply fixtures to what I wanted to do, but YAML is not Ruby, and I don't want to force users of my library to rewrite their mocks into YAML.
00:24:54.520 That’s not the best way. I want to be as compatible with the current test style as possible. So I came up with an idea of a mock context. What is a mock context? The idea is as follows: you extract a shared context, like in RSpec. If you are unfamiliar with shared contexts, they are in RSpec to share behavior extracted from tests.
00:25:18.000 The mock context is just a shared context with one additional feature: it evaluates this context as soon as it's included in some tests, allowing us to infer that these mocks will be in use. This gives us an indication regarding which objects are being mocked.
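A sketch of what this might look like in a spec suite (the mock_context / include_mock_context DSL names are hypothetical):

```ruby
# Defined once, in a dedicated location—our library of mocks:
mock_context "env parser" do
  let(:parser) { instance_double(Anyway::Env) }

  before do
    allow(parser).to receive(:fetch_with_trace)
      .and_return([{"host" => "example.com"}, nil])
    allow(Anyway::Env).to receive(:new).and_return(parser)
  end
end

# Used like a regular shared context:
RSpec.describe Anyway::Loaders::Env do
  include_mock_context "env parser"

  it "loads data via the parser" do
    expect(subject.call(env_prefix: "MYAPP")).to eq("host" => "example.com")
  end
end
```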
00:25:54.180 The code is simple: we rely on the fact that RSpec keeps track of all mocked objects in a single place, its proxy registry. We use it to figure out which objects and methods are being mocked. After that, we set up call tracing for just those objects and methods.
00:26:20.520 Now we can evaluate the mocks against their types. The only change we need to make in our tests is to stop inlining our mocks and use mock contexts instead.
00:26:44.460 Finally, since we collect call traces and mocked calls concurrently, we can't just verify a mock at the time it was called; we may not have the call traces for the corresponding real object yet. This is why we move verification to the post-run phase.
00:27:09.060 After the suite finishes, we stitch everything together: infer the types, check them, and fail if not all type checks pass. The overall diagram looks like this.
00:27:30.340 Before I switch to the last part, let me take a drink of water. Let me give you some time to figure out what just happened.
00:27:42.300 Let’s move on: type signatures are great, and we can verify that mocked calls satisfy the types for the real objects, but functional types do not save us from false positives.
00:28:03.579 Sometimes, values matter. I call this 'non-matching behavior': for particular explicit values, the return type or behavior can vary—like raising an exception instead of returning an array. For example: what if we start returning nil when the prefix is empty? If we pass an empty string, the method returns nil.
00:28:30.420 The mock, however, behaves the same way each time, regardless of whether we passed an empty string or a non-empty one; there is no difference for the mocked object.
00:28:57.900 From the type perspective, we cannot express this difference with RBS today. I'm not sure about Sorbet in this context, but we cannot specify that when the string is empty, this particular method returns nil. Therefore, we have to use an optional return type.
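In RBS terms, the best we can do is make the return type optional (an illustrative signature for the parser):

```rbs
module Anyway
  class Env
    # We cannot say "nil exactly when prefix is empty";
    # the signature only admits that the result may be nil:
    def fetch_with_trace: (String prefix) -> [Hash[String, untyped], untyped]?
  end
end
```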
00:29:22.020 Type signatures don't help with this difference, and the only way to ensure correspondence between mocked objects and real objects is to add some contract. How can we do that? We can consider mocks to actually be a contract.
00:29:51.480 When a mock defines which argument values are accepted and which return value they produce, it effectively states a contract: "I rely on the fact that when I pass an empty string, I should get a non-nil value back." Then we need to use real objects to validate that contract.
00:30:10.140 This contract-based mock verification is not new. We can go back a few years and find some projects that embraced this approach by defining verified mocks using contracts.
00:30:35.460 One thing they had in common that I didn't like was the requirement to define the contract explicitly: you use a custom DSL to define the mock together with its contract and keep them in one place. This is a lot of work.
00:30:58.480 I want to keep using my instance doubles and have them auto-verified, if that's possible—and it is, at least for some cases. We already know how to collect the meta-information about mocks, and this information contains the expected argument values and expected return values.
00:31:22.740 We can construct a verification pattern for the mock, and then we can use our real call traces to find the matching call trace. If there is a matching call trace for this mock, then the mock is considered verified. If not, then we missed this verification.
00:31:50.160 We do not have a matching unit test demonstrating that, when passing an empty string, we get a non-nil value back. The verification pattern itself is somewhat similar to a signature, but it uses explicit values, with placeholders for the values we cannot match.
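A sketch of the matching step, reusing the TRACES structure from the TracePoint sketch above (the pattern layout is illustrative):

```ruby
# Derived from the mock definition: exact values where the stub specified
# them, plus a predicate for the return behavior the mock relies on.
pattern = {
  receiver: Anyway::Env,
  method_name: :fetch_with_trace,
  args: { prefix: "" },                    # the explicit value from the stub
  return_check: ->(value) { !value.nil? }  # "I expect a non-nil value back"
}

# The mock is verified if at least one real call trace matches the pattern:
verified = TRACES[[pattern[:receiver], pattern[:method_name]]].any? do |tr|
  pattern[:args].all? { |name, value| tr[:args][name] == value } &&
    pattern[:return_check].call(tr[:return])
end

warn "Unverified mock: no matching real call trace" unless verified
```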
00:32:10.800 In the end, if we do this matching in our test suite, we can even catch the problem with empty strings. We registered a verification pattern stating that a non-nil value is returned for an empty string, but the real call traces show that nil is returned in that case.
00:32:39.300 This indicates that our contract was broken, leading to a failure. That completes our diagram of what we have so far: we not only type-check but also add verification that matches both the actual argument values and the return behavior.
00:32:58.320 Okay, I think the most complicated part has finished. A few things to consider for future directions: the current implementation has some limitations. First of all, TracePoint is not free, even when the callback does almost nothing; just checking whether the class is in the tracked set can drastically slow down test suites.
00:33:21.000 There are alternative strategies, like module prepending (there is a version based on it). However, the problem with that is it breaks the method's parameter shape: when you prepend a module, you usually delegate to super, which can change the format of the arguments. This makes the actual method parameters no longer useful for verification.
00:33:50.520 Another approach I considered is rewriting the source code. We can indeed do that; we already have a transpiler to make that happen. This could inject some code to keep track of the calls, which would make it safe and fast but requires some extra work.
00:34:11.520 The biggest problem with all contract-based mocks happens during parallel builds, and I have some ideas on how to solve that. The first idea is similar to how we handle coverage: we can generate artifacts for every build that have separate jobs to analyze collected mocks and real calls.
00:34:30.720 That's possible because we already deal with types, not real values. We can easily dump into JSON or similar formats. Another option is to keep auto-generated verification patterns and types directly in the repository. This way, we can persist them and use them for later calls as long as they haven't changed.
00:35:08.640 That’s solvable, but not yet fully implemented. Lastly, verification patterns currently work great for primitives and simple value objects, but for custom complex classes, we will still have to write some code.
00:35:23.640 The question is whether we should bother with all this machinery, or whether it’s okay to have a low chance of false positives and not care. That’s up to you; I have had experiences where such false positives led to downtime.
00:35:56.520 It’s a matter of your SLA (Service Level Agreement) and whether you can afford it. I think additional care never hurts, but in general, you can avoid mocking by writing integration tests—yeah, slower tests, but more reliable.
00:36:32.520 Why not? I like this quote from the Active Interactor gem: "Coverage for unit tests is okay, but real coverage should be high for integration tests, otherwise, you cannot sleep at night." So, there are things to pay attention to.
00:37:02.640 The first option is just to keep it real, and that’s it—but that’s not always possible. Slow tests can be a problem, I know that. Using mocks is one of the options you could consider.
00:37:22.680 Another thought is to know your devils. Know how often you use mock objects: how many different objects do you mock across the codebase? You might be surprised by the numbers, and to help you learn this, I built a query pattern for Syntax Tree.
00:37:53.880 It can help you find all mock usages across the codebase (RSpec mocks only at the moment). You can learn more about Syntax Tree in the next talk. You can also find the pattern in the gist.
00:38:09.720 One suggestion I highly recommend is to fixturize doubles—extract them into shared contexts. Keeping a library of mocks is really useful for understanding how your tests use fake objects.
00:38:26.700 Finally, embrace types. I didn't expect to say that, actually. At least you can create a linter tool to ensure that every object you stub in your tests has a type signature. You don't need to cover all your code with types—just the objects you stub.
00:38:42.960 You can rely on runtime type checking in tests for real objects and on type doubles for mocks; that’s a good compromise. So, where can you find everything I was talking about?
00:39:04.680 I was trying to come up with a name for the gem. No, no, not yet. Don’t take a picture yet.
00:39:28.920 I was playing with analogies, and a name like 'mocks you' felt too boring. Then I had a sudden realization: 'Mock Suey' could work well. That’s where you will eventually find it. You won’t be able to find it yet because I haven't pushed any code or made it public.
00:39:51.120 I will do that soon. So feel free to try it out, give your feedback, and make your mocks safer. Thank you!