The Secret Ingredient: How To Understand and Resolve Just About Any Flaky Test
Summarized using AI


by Alan Ridlehoover

In the talk titled The Secret Ingredient: How To Understand and Resolve Just About Any Flaky Test, presented by Alan Ridlehoover at RubyConf 2023, the speaker addresses the challenges and frustrations associated with flaky tests in software development. Ridlehoover defines a flaky test as one that produces inconsistent results despite the test code and application code remaining unchanged. He emphasizes that the common culprit behind flaky tests is invalid assumptions about the test environment.

Key points discussed include:
- Types of Flaky Tests: Ridlehoover identifies three primary causes of flaky tests: non-determinism, order dependence, and race conditions.
- Non-Determinism: Tests that rely on non-deterministic features, such as random numbers or the system clock, may yield inconsistent results. Ridlehoover advocates for mocking these elements to create deterministic tests.
- Order Dependence: This occurs when tests pass in isolation but fail due to shared mutable state when run together in a sequence. To address this, developers should isolate the shared state or reset it between tests. Ridlehoover illustrates this with an example involving a cached value across different specs.
- Race Conditions: These happen when multiple processes interact with a shared resource, leading to unpredictable results. To handle these, Ridlehoover suggests strategies like using StringIO for file operations or implementing thread-safe code.
- Debugging Techniques: The speaker stresses the importance of tools such as RSpec's '--order random' and '--bisect' options to help identify flaky tests and their causes.
- Best Practices: Finally, Ridlehoover shares his perspective on maintaining clear and communicative specs. He encourages avoiding overly DRY (Don't Repeat Yourself) practices in tests to facilitate easier debugging.

In conclusion, Ridlehoover asserts that understanding and addressing assumptions about the test environment can significantly mitigate flakiness in tests, allowing for more reliable software development. He encourages attendees to remember that documentation through tests is crucial, and practicality should guide testing approaches.

00:00:18.480 All right, hi everyone! I'm Maple Ang, and I'm part of the program committee this year.
00:00:24.439 Um, who here loves flaky tests? Yes? I'm so glad. Thank you! You're in the right place.
00:00:32.360 I'm here to introduce Alan Ridlehoover.
00:00:37.440 I'm going to read a bio. I should have had this open. Alan is a passionate software engineer who loves Ruby.
00:00:45.120 He is an empathetic leader at Cisco Meraki, a relatable human, and a man of several hats.
00:00:50.680 Welcome, Alan!
00:01:16.880 It's after 4:00 PM, and that release was due an hour ago.
00:01:22.880 You've got less than an hour to leave, or you'll be late to that thing. You can feel the clock ticking.
00:01:31.720 Slack clicks to life. You check your messages.
00:01:37.280 The build failed again. You look at the build, you look at the clock, and you realize you do not have time for flakiness.
00:01:42.759 So, you rerun the build again: two builds, five failing specs.
00:01:51.560 None of them have anything to do with your commit. All you can think about is how you cannot be late to another thing.
00:01:56.840 If only you knew the secret ingredient that all flaky tests have in common, you might be on your way to that thing right now.
00:02:01.920 Hello! My name is Alan. I am an engineering manager at Cisco Meraki,
00:02:07.880 probably the largest Rails shop you've never heard of.
00:02:14.319 And though I'm not a baker, I do know a thing or two about flakiness.
00:02:19.920 In fact, sometimes it's all I can think about.
00:02:26.160 Seriously, since I started automating tests over 20 years ago, I have written my fair share of flaky specs.
00:02:31.760 Daylight Saving Time is my personal nemesis. I can't tell you how many times I've tripped over it.
00:02:39.519 Let's just say I'm well into the 'shame on me' part of that relationship.
00:02:43.480 Or, I was, but I'm getting ahead of myself. Let's start with a definition: What is a flaky test?
00:02:54.560 A flaky spec is one whose result changes without modification to either the test itself or the code being tested.
00:03:01.160 So if you write a spec and a method, and it passes, then you should expect that as long as they don't change, it should continue to pass.
00:03:11.640 If the method stays the same, but the outcome changes, then you know you have a flaky spec.
00:03:19.000 But how does this happen? Well, it happens because of the secret ingredient that I mentioned.
00:03:25.400 All flaky tests have this in common. But what is it? It's an assumption.
00:03:31.760 All flaky tests make invalid assumptions about their environment.
00:03:38.560 They assume their environment will be in a particular state when the test begins, but that assumption is rendered incorrect by some change.
00:03:45.879 This change can occur during test runs.
00:03:51.280 So what causes that change to the environment? There are three recipes.
00:03:57.760 The first is non-determinism, the second is order dependence, and the third is race conditions.
00:04:06.439 Let's take a look at each one of these along with some examples in code, starting with non-determinism.
00:04:12.480 So, what is non-determinism? In fact, for that matter, what is determinism?
00:04:19.760 A deterministic algorithm is one that, given the same input, will always produce the same output.
00:04:26.160 For example, if I take these parameters and pass them to a method called 'add', it should always return two.
00:04:32.800 No matter how many times I call that method.
00:04:39.160 But what if there were a method, 'foo', that always returned true until it didn't?
00:04:44.360 That's the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same outputs.
00:04:50.280 Well, how could that be? It might sound obvious, but if you're utilizing a non-deterministic feature of the environment or the system, then you are producing non-deterministic code.
00:04:56.800 So the trick is: how do you make these non-deterministic tests deterministic?
00:05:04.560 Here are some ways that non-determinism can sneak into your code. You can use random numbers, which are intended to be random.
00:05:12.560 You can also use the system clock, my nemesis.
00:05:19.800 You can use a network connection—sometimes they're up, sometimes they go down.
00:05:26.640 These tests might pass or fail. There are also floating point numbers.
00:05:33.280 The precision on floating point numbers isn’t guaranteed, so a test might pass once and then fail the next time.
00:05:39.840 These are just a few examples. I'm sure this list is not exhaustive.
00:05:46.160 But, what if our code relies on these things? How do we make the tests deterministic?
00:05:54.959 The trick is to try and remove the non-determinism by mocking it or stubbing it.
00:06:05.000 Or, to account for it by using some advanced RSpec matchers.
00:06:10.799 You can stub that random number generator to produce a specific number.
00:06:15.840 You can mock or freeze time. You can stub the network connection to return a specific response.
00:06:21.720 For floats, you can use advanced matchers like 'be_within' or 'be_between' instead of using an exact match.
00:06:26.800 And finally, please remember to document that undocumented thing that caused the test to fail.
00:06:30.560 In my case, I make sure to write the test after Daylight Saving Time to prove that it still works.
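The fixes above can be sketched in plain Ruby. This is an illustrative sketch rather than the speaker's code: it shows why exact float comparison flakes (which is what matchers like 'be_within' guard against) and how seeding a random number generator makes its output repeatable, the same effect as stubbing it in a spec.

```ruby
# Exact float equality is fragile: 0.1 + 0.2 is not exactly 0.3 in binary
# floating point, which is why tolerance-based matchers exist.
sum = 0.1 + 0.2
puts sum == 0.3                # false
puts (sum - 0.3).abs < 1e-9    # true: a tolerance check is stable

# Seeding Ruby's PRNG makes "random" output deterministic, the same idea as
# stubbing the random number generator in a spec.
rng_a = Random.new(42)
rng_b = Random.new(42)
puts rng_a.rand(100) == rng_b.rand(100)  # true: same seed, same sequence
```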
00:06:38.880 All right, while that build is running that we started at the beginning of the talk, let's see if we can fix some of those flaky specs.
00:06:48.399 We don't want to be late for the thing, after all.
00:06:55.080 So first, a bit of context: The code that I'm about to share is entirely made up.
00:07:02.960 Well, technically, I guess all code is made up, but what I mean is that this code was made up fresh just for this talk by me.
00:07:09.279 It's not production code, but it is inspired by code I've seen in production applications.
00:07:16.400 It's a bit of a hodgepodge; it's a class called 'RubyConf' that provides a few methods that the organizers might find helpful.
00:07:24.640 Here is a method to determine whether the conference is currently in progress.
00:07:33.040 I'll let you take a look at that for a second.
00:07:38.400 Notice that the method is using the system clock. I realized now that I'm using the wrong date.
00:07:44.800 It's simple enough, right? Let's look at the specs for this.
00:07:52.600 Oh, well, there's only one.
00:07:58.800 Okay, let's look at that spec.
00:08:05.000 It says that the 'in_progress' method returns false before the conference begins.
00:08:12.240 That seems like maybe the author forgot a couple of use cases. But this is something common that I see with date-based specs.
00:08:20.000 The author of the spec is living in the now; they're not thinking about the future.
00:08:27.480 Kind of like me with Daylight Saving Time.
00:08:34.680 We forget to write that spec that goes along with whatever happens after the clock changes.
00:08:40.800 So, I'll bet that this spec was passing before the conference, but now that the conference is happening, it's probably failing.
00:08:46.640 So let's play with the system clock and see if we can't figure that out.
00:08:51.400 All right, so as I predicted, it passed when I set the clock back in time to October, well before the conference.
00:08:58.000 Not at all the day that I wrote the slide.
00:09:05.600 So, we know this is a flaky test now because it was passing and now it's not.
00:09:11.600 If I set the clock forward to the beginning of the conference, it fails.
00:09:17.760 So how do we fix it? Remember, this is a non-deterministic issue.
00:09:23.600 We need to either mock or account for that non-determinism.
00:09:30.960 What I'm going to do in this case is freeze time.
00:09:35.480 Now, here's the code and the spec as they were before, and here's the difference.
00:09:41.760 This is the fixed spec, and you'll notice only the bottom half of the screen is changing.
00:09:47.440 The code is fine; there are no problems with it. This is only a problem in the spec.
00:09:53.200 What it’s doing is freezing time, setting the time to a specific date.
00:10:00.000 In this block, the code will only execute under that context.
00:10:07.000 As soon as the block ends, the system is sent back to its normal state.
00:10:13.920 Okay, let's see if that fixed it. All right, November 13th—the spec is now passing.
00:10:21.780 In fact, I even went back and added those other specs that were missing: the one for before and the one for during or after.
00:10:27.960 Those look like this. Notice they’re just doing the exact same thing.
00:10:34.640 They have a different date and a different outcome, but they’re basically the same test.
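As a sketch of the idea, here is a plain-Ruby version of an 'in progress' check where the current date is passed in explicitly. The method name and the conference dates are assumptions, and injecting 'today' is an alternative to freezing time that achieves the same determinism: each spec pins the date it cares about.

```ruby
require "date"

# Assumed conference dates, for illustration only.
START_DATE = Date.new(2023, 11, 13)
END_DATE   = Date.new(2023, 11, 15)

# Accepting `today` as a parameter (defaulting to the real clock) lets each
# spec pin the date, just as freezing time does.
def in_progress?(today: Date.today)
  (START_DATE..END_DATE).cover?(today)
end

puts in_progress?(today: Date.new(2023, 10, 1))   # false: before the conference
puts in_progress?(today: Date.new(2023, 11, 13))  # true: during
puts in_progress?(today: Date.new(2023, 12, 1))   # false: after
```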
00:10:39.720 All right, so we fixed one of those flaky tests. Let's take a look at another one.
00:10:46.000 This one is about a network connection that goes down.
00:10:53.480 It's not uncommon for your code to need to talk to another service across a network.
00:11:01.760 In this session description method, we're calling an API endpoint to fetch the conference schedule.
00:11:08.360 Then, we're parsing it, finding the session that matches the title parameter, and returning a description.
00:11:15.000 Here's the spec. I know that's a lot of code to take in, so I'm going to take a second.
00:11:21.840 All right, so with Wi-Fi enabled, this spec passes.
00:11:28.449 Note that the call to the network might not pass here at the conference, but it does pass in this scenario.
00:11:35.560 And it adds over a second to the runtime, as you can see there.
00:11:42.400 A lot of times, HTTP calls that fail take 60 seconds to time out, which makes it harder to make these tests useful.
00:11:50.680 Plus, when I turn off Wi-Fi, the spec fails. Fortunately, it fails quickly.
00:11:58.200 The network failures are on my end, not the server end.
00:12:05.040 These are particularly nasty tests to debug because that loss in connectivity is not logged or persistent.
00:12:11.600 By the time you get around to debugging the failure, it might not be possible to reproduce without turning off your Wi-Fi.
00:12:18.160 So pay attention to HTTP calls in your code or any other type of call that crosses a network, like gRPC.
00:12:24.320 Try running the specs with Wi-Fi turned off—you'd be surprised how many things you catch that way.
00:12:31.360 Okay, let's fix this thing. Here it is: same exact code, same test as before.
00:12:36.960 A smaller font size. I'm not going to put as much code on the screen as Jeremy just did.
00:12:43.600 But here we go; that's the fix.
00:12:49.040 Now, again, the code at the top half of the screen is not changing.
00:12:55.920 The changes to the test are just setting up that stub for the network response.
00:13:01.680 This allows the spec to validate that we're parsing the results correctly.
00:13:07.920 And that’s really the code we care about; we don't actually care whether this external service is up and running.
00:13:14.160 It shouldn't matter.
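A minimal plain-Ruby sketch of the same stubbing idea, with the network call made injectable; the method names, JSON shape, and session data here are all assumptions for illustration, not the speaker's code.

```ruby
require "json"

# Placeholder for the real HTTP call (e.g. via Net::HTTP); never hit in the spec.
def live_fetch
  raise "network call: not exercised here"
end

# The fetch step is injectable, so a spec can substitute a canned response and
# exercise only the parsing logic, which is the code we actually care about.
def session_description(title, fetch: method(:live_fetch))
  schedule = JSON.parse(fetch.call)
  session  = schedule["sessions"].find { |s| s["title"] == title }
  session && session["description"]
end

# A stub standing in for the network, returning fixed JSON:
canned = lambda do
  { "sessions" => [
      { "title" => "Flaky Tests", "description" => "Fixing flakiness" }
    ] }.to_json
end

puts session_description("Flaky Tests", fetch: canned)  # Fixing flakiness
```

With the stub in place the spec passes with Wi-Fi off, and it runs in a fraction of the time of a live call.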
00:13:20.080 On that point, I often get pushback. Folks will ask: 'What if the API changes?'
00:13:26.640 How are we going to know if the spec isn't going to fail because it's mocking the result?
00:13:31.960 My answer to that is that every spec should have one and only one reason to exist.
00:13:39.040 In this case, that reason to exist is to verify the code that we wrote works as we expect.
00:13:45.240 It's a unit test, and we want these to have as few dependencies as possible so they execute quickly.
00:13:52.160 Now, you may also require a spec to validate that an API schema hasn't changed.
00:14:01.240 That is a different reason for a spec to exist. It is not this spec.
00:14:08.760 In fact, it's not even a unit test; it's an integration test.
00:14:14.280 What's more, it's an integration test that's designed to fail.
00:14:20.160 We want it to fail when that schema changes.
00:14:26.760 So, it's not something we want to run alongside all of our unit tests, which are designed to pass every time.
00:14:34.000 Maybe the integration tests should be run separately on a schedule instead of among all of our unit tests.
00:14:39.760 All right, let's run the spec and see if it passes.
00:14:47.040 Sure enough, if I turn the Wi-Fi off, the spec still passes.
00:14:53.680 Now, because I'm mocking or stubbing the response to the network call.
00:14:59.560 It's also important to point out the difference in execution speed here. The live version took 1.3 seconds.
00:15:07.200 This one took one hundredth of a second, which can become incredibly important.
00:15:14.760 Especially as your test suite grows, particularly when it hits 100,000 specs, like we just did.
00:15:20.360 All right, it is 15 minutes in, and those specs took us about 10 minutes to fix.
00:15:27.760 That wasn't so bad. Let's see if we can solve some more of these before you have to head out.
00:15:34.160 All right, let's take a look at order dependence next, starting with a definition.
00:15:40.760 Order dependent specs are specs that pass in isolation but fail when run with other specs in a specific order.
00:15:46.960 For example, if both test A and test B pass when run in alphabetical order, but test A fails when run after test B, then that makes test A flaky and test B leaky.
00:15:54.960 What does that mean, leaky?
00:16:01.520 Remember, these specs are making an invalid assumption about the environment, and that environment includes all the shared state that they have access to.
00:16:09.919 It works kind of like this: let's pretend this blue square is the starting point for the shared environment.
00:16:17.440 Spec A runs and gets the blue square, and it passes. Spec A does not change the state, so Spec B runs in the same context, and it also passes.
00:16:25.040 Now let's imagine running them in the opposite order, where Spec B gets the blue square but adds or changes some state, turning it into a pink hexagon.
00:16:32.640 Now this state is what test A is going to run in. It didn't expect that, so it's going to fail.
00:16:39.760 So basically what happened is Spec B leaked some state, either deleting or adding or changing something that's in the environment and causing Spec A to fail.
00:16:46.440 Spec A was only failing because it was susceptible to that; it made that invalid assumption about its environment.
00:16:53.840 For this reason, we think of these tests as leaky tests.
00:17:01.600 And isn’t leakiness the problem here? Isn’t Spec B the problem?
00:17:08.600 Well, actually no; they are both to blame. The reason is that one is leaking while the other is susceptible to it.
00:17:16.160 Only one of them, though, is breaking your build, which means that you should focus on that one first.
00:17:23.160 Now, often you'll find that if you fix the broken spec, it will point to a broader solution that will solve the leakiness as well.
00:17:32.440 But how do you fix these dependent flaky tests?
00:17:40.160 First, let’s look at what causes this kind of failure.
00:17:46.760 Order dependent failures are caused by mutable state that is shared across specs.
00:17:54.640 This could be in the form of broadly scoped variables like a global or class variable.
00:18:02.560 It could be a database, a key-value store, a cache, or even the DOM if you're writing JavaScript specs.
00:18:09.720 So that's what causes it, but how do you reproduce these issues?
00:18:16.800 The first thing to do is eliminate non-determinism. Make sure it's not non-deterministic.
00:18:23.840 You can do that by repeatedly running the same spec in isolation.
00:18:30.560 Now, it might take a while, but eventually if it's non-determinism, it will fail.
00:18:37.600 Now, if you can't reproduce it that way, run all the specs that ran together with this spec.
00:18:44.440 Probably the list from a single process on the build server; run them locally and/or in random order.
00:18:52.880 Continue running them in random order until one of them fails.
00:19:00.960 Now, if the default order doesn't reproduce it, you can run RSpec with the '--order random' option.
00:19:08.080 Keep running them that way until you find a seed that consistently causes the failure.
00:19:14.080 Then the next thing you want to do is locate the leaky spec or specs.
00:19:20.360 I say specs plural, but it's possible that it takes multiple specs modifying the environment before the failure reoccurs.
00:19:26.840 Keep running it until you find that seed that causes the failure.
00:19:33.560 Once you've found it, you can use RSpec's 'bisect' feature to find the actual leaky specs.
00:19:41.560 I’ll show you how to do that in just a minute.
00:19:47.320 First, though, what do we need to do to fix these problems once we've found them?
00:19:54.680 You can remove the shared state. You can make it immutable or you can isolate it.
00:20:01.920 Don't use those broadly scoped variables—those are a recipe for flakiness.
00:20:08.840 Mock shared data stores with a layer of abstraction.
00:20:15.760 You can use database transactions; if you've used Database Cleaner, that's what it's doing for you.
00:20:22.600 Or you can reset the shared state in between specs.
00:20:30.320 All right, let's see if we can solve another one of those flaky specs that’s keeping you here.
00:20:37.640 Let's look at a shared state example. Here we have a simple getter and setter to store a favorite session.
00:20:44.000 Notice that both the getter and setter leverage an object called 'cache'.
00:20:51.600 They call a method on it called 'instance'. What is that? Let's take a look.
00:20:58.720 So, the cache is a simple in-memory key-value pair backed by a hash.
00:21:05.680 The instance method is effectively turning it into a singleton.
00:21:11.680 So that every reference to the cache's instance method is returning the same instance.
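A minimal reconstruction of the cache described here: an in-memory key-value store backed by a hash, with 'instance' memoizing a single shared object. The class and method names are assumptions based on the talk; the shared instance is exactly what makes state leak between specs.

```ruby
# A hash-backed key-value store; `instance` memoizes one shared object,
# effectively a singleton, so every caller sees the same underlying hash.
class Cache
  def self.instance
    @instance ||= new
  end

  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

Cache.instance.write(:favorite, "Keynote")
puts Cache.instance.read(:favorite)         # Keynote
puts Cache.instance.equal?(Cache.instance)  # true: one shared instance
```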
00:21:18.240 Here are the specs for the favorite session getter and setter.
00:21:27.520 Now these specs pass when run in this order: getter first, then setter.
00:21:34.440 But they fail in the opposite order because we’re storing the title of the session in the cache.
00:21:41.180 If we run the setter first, we're going to set the cache, and then the getter will return the value in the cache.
00:21:47.760 So it will not see the default value, whereas the spec for the getter is looking for that default value.
00:21:55.040 To prove that, let's run this with RSpec in random order mode and see if we can’t reproduce the failure.
00:22:01.920 Here you can see I ran the spec with order random, and RSpec produced a random seed.
00:22:08.960 The getter ran before the setter with that seed, so it passed.
00:22:15.040 Let's try again.
00:22:21.160 Okay, so here I'm still running with order random, but RSpec chose a different seed.
00:22:29.439 And lo and behold, the setter ran first, so the getter failed.
00:22:35.920 So how do we go about fixing it now that we know?
00:22:41.920 Well, we know only that one of these three specs caused the failure.
00:22:48.920 They all ran before the getter spec, so any of them could have been the problem.
00:22:55.320 Now, we have a suspect; we think it’s the setter. But how do we know for sure?
00:23:01.920 This is where RSpec's bisect comes in, so let's take a look.
00:23:08.640 Here, I'm running RSpec's '--bisect' with the same '--order random' option and the same seed that produced the failure.
00:23:14.600 This is important because bisect won't work if you hand it a command that won't fail.
00:23:20.320 So I had to find the seed that caused the failure first.
00:23:27.040 The first thing bisect does is it tries to reproduce the failure, because if it can't, it'll just exit and say, 'I can't help you.'
00:23:34.120 Next, it analyzes whether that failure appears to be order-dependent, and in this case, it does.
00:23:40.320 You can see the failure appears to be order-dependent.
00:23:46.480 So it performs a binary search looking for the spec or specs that need to be run first in order for the failure to happen.
00:23:52.080 Now, note that this can take a very long time if that list of candidate specs is long.
00:23:57.920 Finally, if it finds a reproducible order in which the test fails,
00:24:02.480 it will give you the command to run to reproduce that failure with just the necessary specs.
00:24:08.960 So if you run that command like this, you can see here I'm running exactly the command proposed.
00:24:15.120 Including 'order random', I added 'format documentation' here so we’d see the names of the tests.
00:24:21.680 And sure enough, the setter is the suspect.
00:24:28.640 We now know that’s the test we need to go address.
00:24:35.240 So how do we fix it?
00:24:40.480 Well, here we are back at the beginning again.
00:24:46.560 One way we could approach this would be to add a 'clear' method to the cache and call that method in between specs.
00:24:52.920 But because our specs are currently sharing state, if we ran them on the build server and cleared the cache in between specs,
00:25:01.280 running them in parallel, what would happen? One test would interfere with the execution of another.
00:25:07.920 So what I prefer in this situation is to use a technique called dependency injection.
00:25:14.720 Now, it's a simple technique with a weird sounding name.
00:25:21.760 All it does is we’re going to pass the cache object into the RubyConf object when we create it.
00:25:27.680 So each spec will be able to pass its own cache into the RubyConf object.
00:25:34.480 Here's what that looks like in code, starting with the implementation.
00:35:40.960 Notice that the cache parameter defaults to 'Cache.instance'.
00:25:47.120 This way, if we don't pass anything when we create the RubyConf object, it'll just use the singleton.
00:25:53.200 That's exactly what we want in production.
00:25:59.520 Now by doing this, though, we've created a seam in the software that allows the specs to use their own cache objects.
00:26:06.080 This prevents state from leaking between the specs without modifying the behavior of the production code.
00:26:13.920 All right, let's look at the specs. Here we are creating a new instance of the cache class and passing it to the RubyConf object.
00:26:21.600 That's it! That's all there is to dependency injection.
00:26:28.160 You create a parameter in the initializer and hand the object to the collaborator.
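In plain Ruby, the dependency-injection fix sketched here might look like the following; the class names, default value, and getter/setter shapes are assumptions reconstructed from the talk's description.

```ruby
# Reconstruction of the shared cache: a hash with a memoized shared instance.
class Cache
  def self.instance
    @instance ||= new
  end

  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

class RubyConf
  DEFAULT = "none yet"

  # The cache collaborator is a constructor parameter. Defaulting it to the
  # singleton preserves production behavior; specs pass in their own instance.
  def initialize(cache: Cache.instance)
    @cache = cache
  end

  def favorite_session
    @cache.read(:favorite) || DEFAULT
  end

  def favorite_session=(title)
    @cache.write(:favorite, title)
  end
end

# Each spec hands in a fresh cache, so no state leaks between them:
a = RubyConf.new(cache: Cache.new)
b = RubyConf.new(cache: Cache.new)
b.favorite_session = "Flaky Tests"
puts a.favorite_session  # none yet: b's write didn't leak into a
puts b.favorite_session  # Flaky Tests
```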
00:26:34.320 All right, let's run the specs again.
00:26:41.440 I'm running the specs with the same randomized seed that caused them to fail in the first place, and now they pass.
00:26:47.280 Voila! We’re making good progress—it’s only 4:30!
00:26:52.120 You might actually make that thing after all.
00:26:58.960 Let’s take a look at a couple more broken specs, but first, let's talk about race conditions.
00:27:05.360 So what is a race condition? A race condition can occur when two or more parallel processes are competing over the same scarce shared resource.
00:27:11.880 First, let’s look at two specs running in sequence to see how they work when writing to a file.
00:27:18.560 The first spec is going to write to the file, read from the file, check the result, and it’ll pass.
00:27:25.080 Now the second spec is going to write to the file two times, read from the file, and pass.
00:27:31.040 But if I run them in parallel, the first spec writes to the file, then the second spec writes to the file.
00:27:37.600 Then spec one reads, checks the result, and fails, because there are two records when it only expected one.
00:27:43.920 Next, the second spec writes again, reads, and fails, because there are now three records when it expected two.
00:27:50.080 In this case, both specs are susceptible to parallel flakiness due to a race condition.
00:27:56.880 Of course, the scheduling isn't guaranteed, so who knows: they might actually pass.
00:28:03.320 That’s why race conditions are notoriously hard to reproduce.
00:28:09.520 So how do we go about debugging them if we can't reproduce them?
00:28:16.080 You want to take a methodical approach like we did with the order-dependent specs.
00:28:24.960 The first thing is to try and eliminate non-determinism.
00:28:32.320 Run that failing spec in isolation repeatedly.
00:28:39.520 If it fails, then you know it’s non-determinism, and if not, try to eliminate order dependence.
00:28:46.320 Run it repeatedly with the specs that it failed with on the build server in different orders.
00:28:53.520 Now, if you can reproduce the failure that way, that’s an order-dependent problem.
00:29:01.680 You can use bisect to help you there.
00:29:08.720 Finally, if that doesn't work, run the specs repeatedly in parallel using the parallel RSpec gem.
00:29:14.800 I mention this gem specifically because the other gems that help with this,
00:29:20.880 like 'parallel_tests' or 'knapsack', seem targeted at running Rails apps on a CI server.
00:29:27.440 It's best if you can debug this stuff locally, and parallel RSpec will let you do that.
00:29:34.400 If you still can't reproduce it, try adding '--order random' to the Parallel RSpec run.
00:29:41.760 That will give you the same parallel system, but it'll randomize the order the specs are in.
00:29:48.400 So once you’ve reproduced it, or even if you can't, what are you going to look for to try and fix these things?
00:29:56.560 The main cause, remember, is that race conditions are caused on the build server, usually by asynchronous code competing for shared resources.
00:30:03.760 Those things might include the file we just looked at, it could be a socket, a thread pool, or a connection pool.
00:30:10.960 Or even memory!
00:30:18.120 Once you have a suspect—once you know one of the things that it could be—how do you fix it?
00:30:24.840 Well, there are a lot of different ways; it depends on the situation.
00:30:31.520 If it's an IO-based issue, you can use a class called StringIO as a stand-in for the other type of IO you're trying to perform.
00:30:38.480 If it's a thread issue, you can try writing thread-safe code.
00:30:45.360 Or you can test that the right messages are passed, rather than testing the results coming out of the object.
00:30:59.760 Of course, you could also extract that code out of the threaded class into a pool that you call from the thread.
00:31:06.800 Then test the pool synchronously, and trust that Ruby knows how to instantiate a thread and run code on a thread.
00:31:13.040 You can also switch to fibers instead of using threads.
00:31:20.720 Fibers are super cool! They let you test synchronously; you don't need to test on a thread.
00:31:26.600 They also make it incredibly hard to create a race condition.
00:31:32.640 Because one of the things they do is take control of the CPU and don’t hand it back until they’re done.
00:31:39.440 So any kind of atomic operation you need to perform that involves several commands can be done without having to account for thread safety.
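A small sketch of the cooperative scheduling the speaker is describing. A fiber keeps control of execution until it explicitly yields, so a multi-step update inside a fiber can't be interleaved mid-operation the way a preempted thread can.

```ruby
# Control only transfers between a fiber and its caller at explicit
# resume/yield points, never in the middle of a run of statements.
log = []

fiber = Fiber.new do
  log << :step_one
  Fiber.yield          # control returns to the caller only here
  log << :step_two
end

fiber.resume           # runs up to the yield
log << :caller_ran
fiber.resume           # runs the rest

p log  # [:step_one, :caller_ran, :step_two]
```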
00:31:46.240 Let’s take a look at how to fix one of these. Here are the last two flaky specs that are actually keeping you here.
00:31:52.960 This is a feature of the app that manages a list of reservations.
00:31:59.440 Now, there are two methods—one to reserve a seat, the other to get a list of attendees.
00:32:06.160 You can see that reserving a seat just writes to a file.
00:32:12.960 Getting the attendees back is just reading from that file.
00:32:18.560 So let's take a look at the specs.
00:32:25.760 The first spec ensures that writing Mickey Mouse to the file grows the number of attendees by one.
00:32:33.120 The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly.
00:32:40.160 So, when I run them in sequence, they pass.
00:32:47.680 In fact, they'll even run in sequence if I switch the order.
00:32:53.840 But if I run them with Parallel RSpec, they will fail. In fact, both of them fail.
00:33:01.440 The second spec actually failed first because of parallelism.
00:33:08.640 The attendee count actually grew by three, not two, like I showed a minute ago.
00:33:15.520 And the first spec actually finished second, and it failed because the record count grew by two, not one.
00:33:22.080 We've already seen how that can happen, but let's walk through it again a little slower.
00:33:29.040 Let's look at how this RSpec code works—it's not as simple as most RSpec tests.
00:33:36.920 Here's the expectation for the second test; it has two blocks of code.
00:33:43.840 The first is passed into an expect method, the second is passed into the change method.
00:33:49.200 Now, when our specs execute, they're going to run this block of code first.
00:33:56.160 That is, it's going to check the attendee count before it runs the actual test.
00:34:01.360 Next, it's going to run the expectation block, and then it's going to run this block of code again to get the final count of records.
00:34:08.000 And then it'll check the delta between the initial and the final and compare it to what was passed into 'by'—in this case, two.
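The mechanics just described can be simulated in plain Ruby; this is a simplified stand-in for what RSpec's change matcher does, not its actual implementation.

```ruby
# Sample the probe, run the action, sample again, and compare the delta to
# the expected change: a simplified model of `expect { }.to change { }.by(n)`.
def change_by?(expected_delta, probe, action)
  before = probe.call   # first run of the change block
  action.call           # the expect block
  after  = probe.call   # second run of the change block
  (after - before) == expected_delta
end

attendees = []
probe  = -> { attendees.size }
action = -> { attendees << "Donald"; attendees << "Goofy" }

puts change_by?(2, probe, action)  # true: the count grew by exactly two
```

When another process writes to the same shared resource between the two probe calls, the delta no longer matches, which is exactly the parallel failure shown above.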
00:34:15.560 Now that we know how RSpec is going to execute this, let's watch how this spec actually processes.
00:34:23.440 First, the second spec reads the file and finds that there are zero records.
00:34:29.120 Next, the first write in the second spec executes, so now Donald is in the file.
00:34:36.960 Next, the first spec checks its initial value.
00:34:42.560 It finds a value of one record in the file, and both writes happen.
00:34:48.720 It doesn't matter what order they happen in; we could check the file if we really cared.
00:34:55.040 But they both happen, and then both reads happen.
00:35:02.080 We happen to know here that the second one happens first because that's the test that was in the output first.
00:35:07.040 But each spec is going to look at the value in the file: the final record count.
00:35:14.360 Both of these tests are going to see three records.
00:35:20.920 Now the delta between zero and three is three, and the delta between one and three is two.
00:35:27.040 Neither of those match the 'by' clause, so both tests fail.
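The interleaving walked through above can be replayed deterministically. In this sketch an array stands in for the shared file, and "Mickey" is a hypothetical stand-in for whatever the first spec writes; the point is only the order of reads and writes.

```ruby
# Deterministic replay of the race: an array stands in for the shared file.
file = []

spec2_initial = file.count   # second spec reads its initial count: 0
file << "Donald"             # second spec's first write
spec1_initial = file.count   # first spec reads its initial count: 1
file << "Mickey"             # first spec's write (hypothetical record name)
file << "Goofy"              # second spec's second write
final = file.count           # both specs read the final count: 3

puts final - spec2_initial   # => 3, but the spec expected a delta of 2 -> failure
puts final - spec1_initial   # => 2, but the spec expected a delta of 1 -> failure
```

Each spec's assumption that it owns the file is invalid, so both deltas come out wrong even though every individual write succeeded.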
00:35:32.760 Now we know how it failed; let's go back to the beginning and show you the solution.
00:35:40.480 It turns out the Ruby Core team actually thought of this, bless their hearts.
00:35:46.560 They knew that testing asynchronous IO would be really hard, so they included a class called StringIO.
00:35:54.000 StringIO simulates other kinds of IO in specs. It’s a string but with the interface of a file.
00:36:01.160 What we want to do is allow 'File' to receive 'open' and yield a StringIO object.
00:36:08.480 That means when the code under test calls 'File.open', the block will receive a StringIO object instead of a real file.
00:36:15.840 One caveat with StringIO is that you need to rewind it before you can read it.
00:36:22.480 The reason for that is it behaves like a file: after writing, the position sits at the end.
00:36:29.760 The reason we didn't have to do that previously was because 'File.open' was a block.
00:36:36.320 As soon as that block ended, the file fell out of scope and was closed automatically.
00:36:43.000 Now that we're creating the StringIO object in the test and handing it to the code,
00:36:50.240 it doesn't fall out of scope until the test ends, so we need to rewind it before we can read.
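In the spec itself, the stubbing looks roughly like `allow(File).to receive(:open).and_yield(fake_file)`. The rewind caveat can be shown with the stdlib alone; this is a minimal sketch, and `fake_file` is just an illustrative name.

```ruby
require "stringio"

fake_file = StringIO.new        # a string with the interface of a file
fake_file.puts "Donald"
fake_file.puts "Goofy"

fake_file.rewind                # writing left the position at the end;
                                # move it back to the start before reading
puts fake_file.readlines.count  # => 2
```

Without the `rewind`, `readlines` would return an empty array, because the read would begin at the end of the "file".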
00:36:57.760 All right, proof is in the pudding. Let’s see if we fixed this race condition.
00:37:02.600 All right, we got it! Parallel RSpec proves they're both passing.
00:37:08.760 Okay, here we are 37 minutes into this talk and we’ve resolved all of the flakiness.
00:37:15.760 So it’s time to wrap things up real quick so we can get to that thing in the lunchroom.
00:37:24.160 Because I don't know about you, but this talk always makes me hungry!
00:37:30.720 Here’s a cheat sheet for the entire talk.
00:37:37.760 First, non-deterministic flakiness reproduces in isolation.
00:37:43.160 Look for interactions with non-deterministic elements of the environment.
00:37:50.760 To fix this kind of flakiness, mock the non-determinism to make it deterministic.
00:37:55.840 Don't forget about Timecop when working with date- and time-related specs.
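Timecop freezes `Time.now` for you, but since it's a gem, here is a stdlib-only sketch of the same idea: inject a clock instead of calling `Time.now` directly. `Greeter` and `FrozenClock` are hypothetical names invented for this example.

```ruby
# Hypothetical example: make time deterministic by injecting a clock.
class Greeter
  def initialize(clock: Time)
    @clock = clock  # defaults to the real clock in production
  end

  def greeting
    @clock.now.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# In a spec, a frozen clock makes the result deterministic:
FrozenClock = Struct.new(:now)
nine_am = Time.new(2023, 11, 13, 9, 0, 0)
puts Greeter.new(clock: FrozenClock.new(nine_am)).greeting  # => "Good morning"
```

A spec that relied on the real `Time.now` here would pass in the morning and fail in the afternoon, which is exactly the non-deterministic flakiness being described.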
00:38:03.680 There are tools like WebMock and VCR for handling specs that require network connections.
00:38:09.040 But I prefer to use plain RSpec mocks, like I did earlier. Still, plenty of folks find these tools useful.
00:38:15.120 Next, order dependent flakiness only reproduces with other specs when run in a certain order.
00:38:21.760 Because of that, they will not reproduce in isolation.
00:38:30.560 Look for state that is shared across tests.
00:38:36.480 To fix order dependency, remove the shared state.
00:38:43.920 Make it immutable or isolated. RSpec's 'order random' can help you here by reproducing the failures.
00:38:50.400 And RSpec's bisect can help you locate the leaky spec that's causing the failure.
00:38:56.600 Finally, race conditions only reproduce with other specs when run in parallel, not in isolation.
00:39:03.760 Look for asynchronous code or exhaustible shared resources.
00:39:10.640 To fix race conditions, isolate things from one another.
00:39:17.200 Like we did with StringIO, or use fibers instead of threads.
00:39:23.680 Seriously, these things are amazing; you’ve got to try them.
00:39:29.280 Finally, you can use parallel RSpec to reproduce the failures locally instead of on your build server.
00:39:37.200 Now, keep in mind that original point: all of these specs have a common problem.
00:39:43.760 They're making an invalid assumption about the environment in which they're running.
00:39:50.760 Sometimes just remembering that fact will help you identify and resolve the flakiness.
00:39:57.760 Ask yourself: how can I ensure that this spec has the environment that it expects?
00:40:05.040 And one more thing—I have a bit of a hot take here.
00:40:12.080 Debugging this stuff is incredibly hard, but it gets a thousand times harder if your specs are too DRY.
00:40:18.080 So avoid using these features of RSpec. They seem harmless at first, even useful.
00:40:25.840 But ultimately they're going to make debugging way too hard.
00:40:34.880 So avoid shared specs, avoid shared contexts, avoid nested contexts.
00:40:39.920 Your specs should be incredibly communicative.
00:40:47.160 After all, they are the executable documentation for your code.
00:40:54.080 If you have to scroll all over the place or open a ton of files to debug it later, that's a problem.
00:41:01.840 Keep your tests wet!
00:41:08.080 I'm not the only one who says this; the fine folks at Thoughtbot agree.
00:41:13.520 They've written several articles on this, and honestly, DRY might be the worst programming advice ever.
00:41:19.280 You can read more about it in my articles.
00:41:26.000 I told you it was a hot take, but it sounds like I have some agreement here. That's awesome!
00:41:31.960 If you still disagree, come find me at lunch. I want to change your mind!
00:41:38.600 Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness.
00:41:46.440 It took me over 20 years to get here. I hope this talk has short-circuited that for some of you.
00:41:54.440 I work for Cisco Meraki, so I also know a thing or two about connectivity.
00:42:02.960 Here’s how to connect with me. That last item there is the source code for this talk, including the fixed code.
00:42:11.200 It's tagged so you can walk through it one thing at a time.
00:42:18.960 You can look at the failure, you can look at the success, or you can practice fixing the failures.
00:42:26.040 Cisco Meraki is probably the largest Rails shop you've never heard of, and we are growing.
00:42:34.200 So if you're interested, there are limited roles open right now. Come chat with us at the job fair.
00:42:42.280 Find out what it's like to work at Meraki.
00:42:50.360 Finally, a tiny bit of shameless self-promotion. I love Ruby so much.
00:42:56.520 My friend Fito and I often write code on the weekends just to release something into the wild in the hopes that somebody finds it useful.
00:43:10.680 You can find links to our stuff at firsttry.software, including Ruist.
00:43:14.920 Ruist is an opinionated VS Code theme. It's the one you saw in the talk.