The Secret Ingredient: How To Understand and Resolve Just About Any Flaky Test
Summarized using AI


by Alan Ridlehoover

In the talk titled The Secret Ingredient: How To Understand and Resolve Just About Any Flaky Test, presented by Alan Ridlehoover at RubyConf 2023, the speaker addresses the challenges and frustrations associated with flaky tests in software development. Ridlehoover defines a flaky test as one that produces inconsistent results despite the test code and application code remaining unchanged. He emphasizes that the common culprit behind flaky tests is invalid assumptions about the test environment.

Key points discussed include:
- Types of Flaky Tests: Ridlehoover identifies three primary causes of flaky tests: non-determinism, order dependence, and race conditions.
- Non-Determinism: Tests that rely on non-deterministic features, such as random numbers or the system clock, may yield inconsistent results. Ridlehoover advocates for mocking these elements to create deterministic tests.
- Order Dependence: This occurs when tests pass in isolation but fail due to shared mutable state when run together in a sequence. To address this, developers should isolate the shared state or reset it between tests. Ridlehoover illustrates this with an example involving a cached value across different specs.
- Race Conditions: These happen when multiple processes interact with a shared resource, leading to unpredictable results. To handle these, Ridlehoover suggests strategies like using StringIO for file operations or implementing thread-safe code.
- Debugging Techniques: The speaker stresses the importance of tools such as RSpec's '--order random' and '--bisect' options to help identify flaky tests and their causes.
- Best Practices: Finally, Ridlehoover shares his perspective on maintaining clear and communicative specs. He encourages avoiding overly DRY (Don't Repeat Yourself) practices in tests to facilitate easier debugging.

In conclusion, Ridlehoover asserts that understanding and addressing assumptions about the test environment can significantly mitigate flakiness in tests, allowing for more reliable software development. He encourages attendees to remember that documentation through tests is crucial, and practicality should guide testing approaches.

00:00:18.480 All right, hi everyone! I'm Maple Ang, and I'm part of the program committee this year.
00:00:24.439 Um, who here loves flaky tests? Yes? I'm so glad. Thank you! You're in the right place.
00:00:32.360 I'm here to introduce Alan Ridlehoover.
00:00:37.440 I'm going to read a bio. I should have had this open. Alan is a passionate software engineer who loves Ruby.
00:00:45.120 He is an empathetic leader at Cisco Meraki, a relatable human, and a man of several hats.
00:00:50.680 Welcome, Alan!
00:01:16.880 It's after 4:00 PM, and that release was due an hour ago.
00:01:22.880 You've got less than an hour to leave, or you'll be late to that thing. You can feel the clock ticking.
00:01:31.720 Slack clicks to life. You check your messages.
00:01:37.280 The build failed again. You look at the build, you look at the clock, and you realize you do not have time for flakiness.
00:01:42.759 So, you rerun the build again: two builds, five failing specs.
00:01:51.560 None of them have anything to do with your commit. All you can think about is how you cannot be late to another thing.
00:01:56.840 If only you knew the secret ingredient that all flaky tests have in common, you might be on your way to that thing right now.
00:02:01.920 Hello! My name is Alan. I am an engineering manager at Cisco Meraki,
00:02:07.880 probably the largest Rails shop you've never heard of.
00:02:14.319 And though I'm not a baker, I do know a thing or two about flakiness.
00:02:19.920 In fact, sometimes it's all I can think about.
00:02:26.160 Seriously, since I started automating tests over 20 years ago, I have written my fair share of flaky specs.
00:02:31.760 Daylight Saving Time is my personal nemesis. I can't tell you how many times I've tripped over it.
00:02:39.519 Let's just say I'm well into the 'shame on me' part of that relationship.
00:02:43.480 Or, I was, but I'm getting ahead of myself. Let's start with a definition: What is a flaky test?
00:02:54.560 A flaky spec is one whose result changes without modification to either the test itself or the code being tested.
00:03:01.160 So if you write a spec and a method, and it passes, then you should expect that as long as they don't change, it should continue to pass.
00:03:11.640 If the method stays the same, but the outcome changes, then you know you have a flaky spec.
00:03:19.000 But how does this happen? Well, it happens because of the secret ingredient that I mentioned.
00:03:25.400 All flaky tests have this in common. But what is it? It's an assumption.
00:03:31.760 All flaky tests make invalid assumptions about their environment.
00:03:38.560 They assume their environment will be in a particular state when the test begins, but that assumption is rendered incorrect by some change.
00:03:45.879 This change can occur during test runs.
00:03:51.280 So what causes that change to the environment? There are three recipes.
00:03:57.760 The first is non-determinism, the second is order dependence, and the third is race conditions.
00:04:06.439 Let's take a look at each one of these along with some examples in code, starting with non-determinism.
00:04:12.480 So, what is non-determinism? In fact, for that matter, what is determinism?
00:04:19.760 A deterministic algorithm is one that, given the same input, will always produce the same output.
00:04:26.160 For example, if I take these parameters and pass them to a method called 'add', it should always return two.
00:04:32.800 No matter how many times I call that method.
00:04:39.160 But what if there were a method, 'foo', that always returned true until it didn't?
00:04:44.360 That's the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same outputs.
00:04:50.280 Well, how could that be? It might sound obvious, but if you're utilizing a non-deterministic feature of the environment or the system, then you are producing non-deterministic code.
00:04:56.800 So the trick is: how do you make these non-deterministic tests deterministic?
00:05:04.560 Here are some ways that non-determinism can sneak into your code. You can use random numbers, which are intended to be random.
00:05:12.560 You can also use the system clock, my nemesis.
00:05:19.800 You can use a network connection—sometimes they're up, sometimes they go down.
00:05:26.640 These tests might pass or fail. There are also floating point numbers.
00:05:33.280 The precision on floating point numbers isn’t guaranteed, so a test might pass once and then fail the next time.
00:05:39.840 These are just a few examples. I'm sure this list is not exhaustive.
00:05:46.160 But, what if our code relies on these things? How do we make the tests deterministic?
00:05:54.959 The trick is to try and remove the non-determinism by mocking it or stubbing it.
00:06:05.000 Or, to account for it by using some advanced RSpec matchers.
00:06:10.799 You can stub that random number generator to produce a specific number.
00:06:15.840 You can mock or freeze time. You can stub the network connection to return a specific response.
00:06:21.720 For floats, you can use advanced matchers like 'be_within' or 'be_between' instead of using an exact match.
00:06:26.800 And finally, please remember to document that undocumented thing that caused the test to fail.
00:06:30.560 In my case, I make sure to write the test after Daylight Saving Time to prove that it still works.
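The fixes above can be sketched in plain Ruby. This is an illustrative sketch rather than the speaker's code: it shows why exact float comparison flakes (which is what matchers like 'be_within' guard against) and how seeding a random number generator makes its output repeatable, the same effect as stubbing it in a spec.

```ruby
# Exact float equality is fragile: 0.1 + 0.2 is not exactly 0.3 in binary
# floating point, which is why tolerance-based matchers exist.
sum = 0.1 + 0.2
puts sum == 0.3                # false
puts (sum - 0.3).abs < 1e-9    # true: a tolerance check is stable

# Seeding Ruby's PRNG makes "random" output deterministic, the same idea as
# stubbing the random number generator in a spec.
rng_a = Random.new(42)
rng_b = Random.new(42)
puts rng_a.rand(100) == rng_b.rand(100)  # true: same seed, same sequence
```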
00:06:38.880 All right, while that build is running that we started at the beginning of the talk, let's see if we can fix some of those flaky specs.
00:06:48.399 We don't want to be late for the thing, after all.
00:06:55.080 So first, a bit of context: The code that I'm about to share is entirely made up.
00:07:02.960 Well, technically, I guess all code is made up, but what I mean is that this code was made up fresh just for this talk by me.
00:07:09.279 It's not production code, but it is inspired by code I've seen in production applications.
00:07:16.400 It's a bit of a hodgepodge; it's a class called 'RubyConf' that provides a few methods that the organizers might find helpful.
00:07:24.640 Here is a method to determine whether the conference is currently in progress.
00:07:33.040 I'll let you take a look at that for a second.
00:07:38.400 Notice that the method is using the system clock. I realized now that I'm using the wrong date.
00:07:44.800 It's simple enough, right? Let's look at the specs for this.
00:07:52.600 Oh, well, there's only one.
00:07:58.800 Okay, let's look at that spec.
00:08:05.000 It says that the 'in_progress' method returns false before the conference begins.
00:08:12.240 That seems like maybe the author forgot a couple of use cases. But this is something common that I see with date-based specs.
00:08:20.000 The author of the spec is living in the now; they're not thinking about the future.
00:08:27.480 Kind of like me with Daylight Saving Time.
00:08:34.680 We forget to write that spec that goes along with whatever happens after the clock changes.
00:08:40.800 So, I'll bet that this spec was passing before the conference, but now that the conference is happening, it's probably failing.
00:08:46.640 So let's play with the system clock and see if we can't figure that out.
00:08:51.400 All right, so as I predicted, it passed when I set the clock back in time to October, well before the conference.
00:08:58.000 Not at all the day that I wrote the slide.
00:09:05.600 So, we know this is a flaky test now because it was passing and now it's not.
00:09:11.600 If I set the clock forward to the beginning of the conference, it fails.
00:09:17.760 So how do we fix it? Remember, this is a non-deterministic issue.
00:09:23.600 We need to either mock or account for that non-determinism.
00:09:30.960 What I'm going to do in this case is freeze time.
00:09:35.480 Now, here's the code and the spec as they were before, and here's the difference.
00:09:41.760 This is the fixed spec, and you'll notice only the bottom half of the screen is changing.
00:09:47.440 The code is fine; there are no problems with it. This is only a problem in the spec.
00:09:53.200 What it’s doing is freezing time, setting the time to a specific date.
00:10:00.000 In this block, the code will only execute under that context.
00:10:07.000 As soon as the block ends, the system is sent back to its normal state.
00:10:13.920 Okay, let's see if that fixed it. All right, November 13th—the spec is now passing.
00:10:21.780 In fact, I even went back and added those other specs that were missing: the one for before and the one for during or after.
00:10:27.960 Those look like this. Notice they’re just doing the exact same thing.
00:10:34.640 They have a different date and a different outcome, but they’re basically the same test.
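As a sketch of the idea, here is a plain-Ruby version of an 'in progress' check where the current date is passed in explicitly. The method name and the conference dates are assumptions, and injecting 'today' is an alternative to freezing time that achieves the same determinism: each spec pins the date it cares about.

```ruby
require "date"

# Assumed conference dates, for illustration only.
START_DATE = Date.new(2023, 11, 13)
END_DATE   = Date.new(2023, 11, 15)

# Accepting `today` as a parameter (defaulting to the real clock) lets each
# spec pin the date, just as freezing time does.
def in_progress?(today: Date.today)
  (START_DATE..END_DATE).cover?(today)
end

puts in_progress?(today: Date.new(2023, 10, 1))   # false: before the conference
puts in_progress?(today: Date.new(2023, 11, 13))  # true: during
puts in_progress?(today: Date.new(2023, 12, 1))   # false: after
```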
00:10:39.720 All right, so we fixed one of those flaky tests. Let's take a look at another one.
00:10:46.000 This one is about a network connection that goes down.
00:10:53.480 It's not uncommon for your code to need to talk to another service across a network.
00:11:01.760 In this session description method, we're calling an API endpoint to fetch the conference schedule.
00:11:08.360 Then, we're parsing it, finding the session that matches the title parameter, and returning a description.
00:11:15.000 Here's the spec. I know that's a lot of code to take in, so I'm going to take a second.
00:11:21.840 All right, so with Wi-Fi enabled, this spec passes.
00:11:28.449 Note that the call to the network might not pass here at the conference, but it does pass in this scenario.
00:11:35.560 And it adds over a second to the runtime, as you can see there.
00:11:42.400 A lot of times, HTTP calls that fail take 60 seconds to time out, which makes it harder to make these tests useful.
00:11:50.680 Plus, when I turn off Wi-Fi, the spec fails. Fortunately, it fails quickly.
00:11:58.200 The network failures are on my end, not the server end.
00:12:05.040 These are particularly nasty tests to debug because that loss in connectivity is not logged or persistent.
00:12:11.600 By the time you get around to debugging the failure, it might not be possible to reproduce without turning off your Wi-Fi.
00:12:18.160 So pay attention to HTTP calls in your code or any other type of call that crosses a network, like gRPC.
00:12:24.320 Try running the specs with Wi-Fi turned off—you'd be surprised how many things you catch that way.
00:12:31.360 Okay, let's fix this thing. Here it is: same exact code, same test as before.
00:12:36.960 A smaller font size. I'm not going to put as much code on the screen as Jeremy just did.
00:12:43.600 But here we go; that's the fix.
00:12:49.040 Now, again, the code at the top half of the screen is not changing.
00:12:55.920 The changes to the test are just setting up that stub for the network response.
00:13:01.680 This allows the spec to validate that we're parsing the results correctly.
00:13:07.920 And that’s really the code we care about; we don't actually care whether this external service is up and running.
00:13:14.160 It shouldn't matter.
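A minimal plain-Ruby sketch of the same stubbing idea, with the network call made injectable; the method names, JSON shape, and session data here are all assumptions for illustration, not the speaker's code.

```ruby
require "json"

# Placeholder for the real HTTP call (e.g. via Net::HTTP); never hit in the spec.
def live_fetch
  raise "network call: not exercised here"
end

# The fetch step is injectable, so a spec can substitute a canned response and
# exercise only the parsing logic, which is the code we actually care about.
def session_description(title, fetch: method(:live_fetch))
  schedule = JSON.parse(fetch.call)
  session  = schedule["sessions"].find { |s| s["title"] == title }
  session && session["description"]
end

# A stub standing in for the network, returning fixed JSON:
canned = lambda do
  { "sessions" => [
      { "title" => "Flaky Tests", "description" => "Fixing flakiness" }
    ] }.to_json
end

puts session_description("Flaky Tests", fetch: canned)  # Fixing flakiness
```

With the stub in place the spec passes with Wi-Fi off, and it runs in a fraction of the time of a live call.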
00:13:20.080 On that point, I often get pushback. Folks will ask: 'What if the API changes?'
00:13:26.640 How are we going to know if the spec isn't going to fail because it's mocking the result?
00:13:31.960 My answer to that is that every spec should have one and only one reason to exist.
00:13:39.040 In this case, that reason to exist is to verify the code that we wrote works as we expect.
00:13:45.240 It's a unit test, and we want these to have as few dependencies as possible so they execute quickly.
00:13:52.160 Now, you may also require a spec to validate that an API schema hasn't changed.
00:14:01.240 That is a different reason for a spec to exist. It is not this spec.
00:14:08.760 In fact, it's not even a unit test; it's an integration test.
00:14:14.280 What's more, it's an integration test that's designed to fail.
00:14:20.160 We want it to fail when that schema changes.
00:14:26.760 So, it's not something we want to run alongside all of our unit tests, which are designed to pass every time.
00:14:34.000 Maybe the integration tests should be run separately on a schedule instead of among all of our unit tests.
00:14:39.760 All right, let's run the spec and see if it passes.
00:14:47.040 Sure enough, if I turn the Wi-Fi off, the spec still passes.
00:14:53.680 Now, because I'm mocking or stubbing the response to the network call.
00:14:59.560 It's also important to point out the difference in execution speed here. The live version took 1.3 seconds.
00:15:07.200 This one took one hundredth of a second, which can become incredibly important.
00:15:14.760 Especially as your test suite grows, particularly when it hits 100,000 specs, like we just did.
00:15:20.360 All right, it is 15 minutes in, and those specs took us about 10 minutes to fix.
00:15:27.760 That wasn't so bad. Let's see if we can solve some more of these before you have to head out.
00:15:34.160 All right, let's take a look at order dependence next, starting with a definition.
00:15:40.760 Order dependent specs are specs that pass in isolation but fail when run with other specs in a specific order.
00:15:46.960 For example, if both test A and test B pass when run in alphabetical order, but test A fails when run after test B, then that makes test A flaky and test B leaky.
00:15:54.960 What does that mean, leaky?
00:16:01.520 Remember, these specs are making an invalid assumption about the environment, and that environment includes all the shared state that they have access to.
00:16:09.919 It works kind of like this: let's pretend this blue square is the starting point for the shared environment.
00:16:17.440 Spec A runs and gets the blue square, and it passes. Spec A does not change the state, so Spec B runs in the same context, and it also passes.
00:16:25.040 Now let's imagine running them in the opposite order, where Spec B gets the blue square but adds or changes some state, turning it into a pink hexagon.
00:16:32.640 Now this state is what test A is going to run in. It didn't expect that, so it's going to fail.
00:16:39.760 So basically what happened is Spec B leaked some state, either deleting or adding or changing something that's in the environment and causing Spec A to fail.
00:16:46.440 Spec A was only failing because it was susceptible to that; it made that invalid assumption about its environment.
00:16:53.840 For this reason, we think of these tests as leaky tests.
00:17:01.600 And isn’t leakiness the problem here? Isn’t Spec B the problem?
00:17:08.600 Well, actually no; they are both to blame. The reason is that one is leaking while the other is susceptible to it.
00:17:16.160 Only one of them, though, is breaking your build, which means that you should focus on that one first.
00:17:23.160 Now, often you'll find that if you fix the broken spec, it will point to a broader solution that will solve the leakiness as well.
00:17:32.440 But how do you fix these dependent flaky tests?
00:17:40.160 First, let’s look at what causes this kind of failure.
00:17:46.760 Order dependent failures are caused by mutable state that is shared across specs.
00:17:54.640 This could be in the form of broadly scoped variables like a global or class variable.
00:18:02.560 It could be a database, a key-value store, a cache, or even the DOM if you're writing JavaScript specs.
00:18:09.720 So that's what causes it, but how do you reproduce these issues?
00:18:16.800 The first thing to do is eliminate non-determinism. Make sure it's not non-deterministic.
00:18:23.840 You can do that by repeatedly running the same spec in isolation.
00:18:30.560 Now, it might take a while, but eventually if it's non-determinism, it will fail.
00:18:37.600 Now, if you can't reproduce it that way, run all the specs that ran together with this spec.
00:18:44.440 Probably the list from a single process on the build server; run them locally and/or in random order.
00:18:52.880 Continue running them in random order until one of them fails.
00:19:00.960 Now, if the default order doesn't reproduce it, you can run RSpec with the '--order random' option.
00:19:08.080 Keep running them that way until you find a seed that consistently causes the failure.
00:19:14.080 Then the next thing you want to do is locate the leaky spec or specs.
00:19:20.360 I say specs plural, but it's possible that it takes multiple specs modifying the environment before the failure reoccurs.
00:19:26.840 Keep running it until you find that seed that causes the failure.
00:19:33.560 Once you've found it, you can use RSpec's 'bisect' feature to find the actual leaky specs.
00:19:41.560 I’ll show you how to do that in just a minute.
00:19:47.320 First, though, what do we need to do to fix these problems once we've found them?
00:19:54.680 You can remove the shared state. You can make it immutable or you can isolate it.
00:20:01.920 Don't use those broadly scoped variables—those are a recipe for flakiness.
00:20:08.840 Mock shared data stores with a layer of abstraction.
00:20:15.760 You can use database transactions; if you've used Database Cleaner, that's what it's doing for you.
00:20:22.600 Or you can reset the shared state in between specs.
00:20:30.320 All right, let's see if we can solve another one of those flaky specs that’s keeping you here.
00:20:37.640 Let's look at a shared state example. Here we have a simple getter and setter to store a favorite session.
00:20:44.000 Notice that both the getter and setter leverage an object called 'cache'.
00:20:51.600 They call a method on it called 'instance'. What is that? Let's take a look.
00:20:58.720 So, the cache is a simple in-memory key-value pair backed by a hash.
00:21:05.680 The instance method is effectively turning it into a singleton.
00:21:11.680 So that every reference to the cache's instance method is returning the same instance.
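A minimal reconstruction of the cache described here: an in-memory key-value store backed by a hash, with 'instance' memoizing a single shared object. The class and method names are assumptions based on the talk; the shared instance is exactly what makes state leak between specs.

```ruby
# A hash-backed key-value store; `instance` memoizes one shared object,
# effectively a singleton, so every caller sees the same underlying hash.
class Cache
  def self.instance
    @instance ||= new
  end

  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

Cache.instance.write(:favorite, "Keynote")
puts Cache.instance.read(:favorite)         # Keynote
puts Cache.instance.equal?(Cache.instance)  # true: one shared instance
```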
00:21:18.240 Here are the specs for the favorite session getter and setter.
00:21:27.520 Now these specs pass when run in this order: getter first, then setter.
00:21:34.440 But they fail in the opposite order because we’re storing the title of the session in the cache.
00:21:41.180 If we run the setter first, we're going to set the cache, and then the getter will return the value in the cache.
00:21:47.760 So it will not see the default value, whereas the spec for the getter is looking for that default value.
00:21:55.040 To prove that, let's run this with RSpec in random order mode and see if we can’t reproduce the failure.
00:22:01.920 Here you can see I ran the spec with order random, and RSpec produced a random seed.
00:22:08.960 The getter ran before the setter with that seed, so it passed.
00:22:15.040 Let's try again.
00:22:21.160 Okay, so here I'm still running with order random, but RSpec chose a different seed.
00:22:29.439 And lo and behold, the setter ran first, so the getter failed.
00:22:35.920 So how do we go about fixing it now that we know?
00:22:41.920 Well, we know only that one of these three specs caused the failure.
00:22:48.920 They all ran before the getter spec, so any of them could have been the problem.
00:22:55.320 Now, we have a suspect; we think it’s the setter. But how do we know for sure?
00:23:01.920 This is where RSpec's bisect comes in, so let's take a look.
00:23:08.640 Here, I'm running RSpec's '--bisect' with the same '--order random' option and the same seed that produced the failure.
00:23:14.600 This is important because bisect won't work if you hand it a command that won't fail.
00:23:20.320 So I had to find the seed that caused the failure first.
00:23:27.040 The first thing bisect does is it tries to reproduce the failure, because if it can't, it'll just exit and say, 'I can't help you.'
00:23:34.120 Next, it analyzes whether that failure appears to be order-dependent, and in this case, it does.
00:23:40.320 You can see the failure appears to be order-dependent.
00:23:46.480 So it performs a binary search looking for the spec or specs that need to be run first in order for the failure to happen.
00:23:52.080 Now, note that this can take a very long time if that list of candidate specs is long.
00:23:57.920 Finally, if it finds a reproducible order in which the test fails,
00:24:02.480 it will give you the command to run to reproduce that failure with just the necessary specs.
00:24:08.960 So if you run that command like this, you can see here I'm running exactly the command proposed.
00:24:15.120 Including 'order random', I added 'format documentation' here so we’d see the names of the tests.
00:24:21.680 And sure enough, the setter is the suspect.
00:24:28.640 We now know that’s the test we need to go address.
00:24:35.240 So how do we fix it?
00:24:40.480 Well, here we are back at the beginning again.
00:24:46.560 One way we could approach this would be to add a 'clear' method to the cache and call that method in between specs.
00:24:52.920 But because our specs are currently sharing state, if we ran them on the build server and cleared the cache in between specs,
00:25:01.280 running them in parallel, what would happen? One test would interfere with the execution of another.
00:25:07.920 So what I prefer in this situation is to use a technique called dependency injection.
00:25:14.720 Now, it's a simple technique with a weird sounding name.
00:25:21.760 All it does is we’re going to pass the cache object into the RubyConf object when we create it.
00:25:27.680 So each spec will be able to pass its own cache into the RubyConf object.
00:25:34.480 Here's what that looks like in code, starting with the implementation.
00:35:40.960 Notice that the cache parameter defaults to 'Cache.instance'.
00:25:47.120 This way, if we don't pass anything when we create the RubyConf object, it'll just use the singleton.
00:25:53.200 That's exactly what we want in production.
00:25:59.520 Now by doing this, though, we've created a seam in the software that allows the specs to use their own cache objects.
00:26:06.080 This prevents state from leaking between the specs without modifying the behavior of the production code.
00:26:13.920 All right, let's look at the specs. Here we are creating a new instance of the cache class and passing it to the RubyConf object.
00:26:21.600 That's it! That's all there is to dependency injection.
00:26:28.160 You create a parameter in the initializer and hand the object to the collaborator.
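In plain Ruby, the dependency-injection fix sketched here might look like the following; the class names, default value, and getter/setter shapes are assumptions reconstructed from the talk's description.

```ruby
# Reconstruction of the shared cache: a hash with a memoized shared instance.
class Cache
  def self.instance
    @instance ||= new
  end

  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

class RubyConf
  DEFAULT = "none yet"

  # The cache collaborator is a constructor parameter. Defaulting it to the
  # singleton preserves production behavior; specs pass in their own instance.
  def initialize(cache: Cache.instance)
    @cache = cache
  end

  def favorite_session
    @cache.read(:favorite) || DEFAULT
  end

  def favorite_session=(title)
    @cache.write(:favorite, title)
  end
end

# Each spec hands in a fresh cache, so no state leaks between them:
a = RubyConf.new(cache: Cache.new)
b = RubyConf.new(cache: Cache.new)
b.favorite_session = "Flaky Tests"
puts a.favorite_session  # none yet: b's write didn't leak into a
puts b.favorite_session  # Flaky Tests
```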
00:26:34.320 All right, let's run the specs again.
00:26:41.440 I'm running the specs with the same randomized seed that caused them to fail in the first place, and now they pass.
00:26:47.280 Voila! We’re making good progress—it’s only 4:30!
00:26:52.120 You might actually make that thing after all.
00:26:58.960 Let’s take a look at a couple more broken specs, but first, let's talk about race conditions.
00:27:05.360 So what is a race condition? A race condition can occur when two or more parallel processes are competing over the same scarce shared resource.
00:27:11.880 First, let’s look at two specs running in sequence to see how they work when writing to a file.
00:27:18.560 The first spec is going to write to the file, read from the file, check the result, and it’ll pass.
00:27:25.080 Now the second spec is going to write to the file two times, read from the file, and pass.
00:27:31.040 But if I run them in parallel, the first spec writes to the file, then the second spec writes to the file.
00:27:37.600 Then spec one reads, checks the result, and fails, because there are two records when it only expected one.
00:27:43.920 Next, the second spec writes again, reads, and fails, because there are now three records when it expected two.
00:27:50.080 In this case, both specs are susceptible to parallel flakiness due to a race condition.
00:27:56.880 Of course, the scheduling isn't guaranteed, so who knows: they might actually pass.
00:28:03.320 That’s why race conditions are notoriously hard to reproduce.
00:28:09.520 So how do we go about debugging them if we can't reproduce them?
00:28:16.080 You want to take a methodical approach like we did with the order-dependent specs.
00:28:24.960 The first thing is to try and eliminate non-determinism.
00:28:32.320 Run that failing spec in isolation repeatedly.
00:28:39.520 If it fails, then you know it’s non-determinism, and if not, try to eliminate order dependence.
00:28:46.320 Run it repeatedly with the specs that it failed with on the build server in different orders.
00:28:53.520 Now, if you can reproduce the failure that way, that’s an order-dependent problem.
00:29:01.680 You can use bisect to help you there.
00:29:08.720 Finally, if that doesn't work, run the specs repeatedly in parallel using the parallel RSpec gem.
00:29:14.800 I mention this gem specifically because the other gems that help with this,
00:29:20.880 like 'parallel_tests' or 'knapsack', seem targeted at running Rails apps on a CI server.
00:29:27.440 It's best if you can debug this stuff locally, and parallel RSpec will let you do that.
00:29:34.400 If you still can't reproduce it, try adding '--order random' to the Parallel RSpec run.
00:29:41.760 That will give you the same parallel system, but it'll randomize the order the specs are in.
00:29:48.400 So once you’ve reproduced it, or even if you can't, what are you going to look for to try and fix these things?
00:29:56.560 The main cause, remember, is that race conditions are caused on the build server, usually by asynchronous code competing for shared resources.
00:30:03.760 Those things might include the file we just looked at, it could be a socket, a thread pool, or a connection pool.
00:30:10.960 Or even memory!
00:30:18.120 Once you have a suspect—once you know one of the things that it could be—how do you fix it?
00:30:24.840 Well, there are a lot of different ways; it depends on the situation.
00:30:31.520 If it's an IO-based issue, you can use a class called StringIO as a stand-in for the other type of IO you're trying to perform.
00:30:38.480 If it's a thread issue, you can try writing thread-safe code.
00:30:45.360 Or you can test that the right messages are passed, rather than testing the results coming out of the object.
00:30:59.760 Of course, you could also extract that code out of the threaded class into a pool that you call from the thread.
00:31:06.800 Then test the pool synchronously, and trust that Ruby knows how to instantiate a thread and run code on a thread.
00:31:13.040 You can also switch to fibers instead of using threads.
00:31:20.720 Fibers are super cool! They let you test synchronously; you don't need to test on a thread.
00:31:26.600 They also make it incredibly hard to create a race condition.
00:31:32.640 Because one of the things they do is take control of the CPU and don’t hand it back until they’re done.
00:31:39.440 So any kind of atomic operation you need to perform that involves several commands can be done without having to account for thread safety.
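A small sketch of the cooperative scheduling the speaker is describing. A fiber keeps control of execution until it explicitly yields, so a multi-step update inside a fiber can't be interleaved mid-operation the way a preempted thread can.

```ruby
# Control only transfers between a fiber and its caller at explicit
# resume/yield points, never in the middle of a run of statements.
log = []

fiber = Fiber.new do
  log << :step_one
  Fiber.yield          # control returns to the caller only here
  log << :step_two
end

fiber.resume           # runs up to the yield
log << :caller_ran
fiber.resume           # runs the rest

p log  # [:step_one, :caller_ran, :step_two]
```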
00:31:46.240 Let’s take a look at how to fix one of these. Here are the last two flaky specs that are actually keeping you here.
00:31:52.960 This is a feature of the app that manages a list of reservations.
00:31:59.440 Now, there are two methods—one to reserve a seat, the other to get a list of attendees.
00:32:06.160 You can see that reserving a seat just writes to a file.
00:32:12.960 Getting the attendees back is just reading from that file.
00:32:18.560 So let's take a look at the specs.
00:32:25.760 The first spec ensures that writing Mickey Mouse to the file grows the number of attendees by one.
00:32:33.120 The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly.
00:32:40.160 So, when I run them in sequence, they pass.
00:32:47.680 In fact, they'll even run in sequence if I switch the order.
00:32:53.840 But if I run them with Parallel RSpec, they will fail. In fact, both of them fail.
00:33:01.440 The second spec actually failed first because of parallelism.
00:33:08.640 The attendee count actually grew by three, not two, like I showed a minute ago.
00:33:15.520 And the first spec actually finished second, and it failed because the record count grew by two, not one.
00:33:22.080 We've already seen how that can happen, but let's walk through it again a little slower.
00:33:29.040 Let's look at how this RSpec code works—it's not as simple as most RSpec tests.
00:33:36.920 Here's the expectation for the second test; it has two blocks of code.
00:33:43.840 The first is passed into an expect method, the second is passed into the change method.
00:33:49.200 Now, when our specs execute, they're going to run this block of code first.
00:33:56.160 That is, it's going to check the attendee count before it runs the actual test.
00:34:01.360 Next, it's going to run the expectation block, and then it's going to run this block of code again to get the final count of records.
00:34:08.000 And then it'll check the delta between the initial and the final and compare it to what was passed into 'by'—in this case, two.
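The mechanics just described can be simulated in plain Ruby; this is a simplified stand-in for what RSpec's change matcher does, not its actual implementation.

```ruby
# Sample the probe, run the action, sample again, and compare the delta to
# the expected change: a simplified model of `expect { }.to change { }.by(n)`.
def change_by?(expected_delta, probe, action)
  before = probe.call   # first run of the change block
  action.call           # the expect block
  after  = probe.call   # second run of the change block
  (after - before) == expected_delta
end

attendees = []
probe  = -> { attendees.size }
action = -> { attendees << "Donald"; attendees << "Goofy" }

puts change_by?(2, probe, action)  # true: the count grew by exactly two
```

When another process writes to the same shared resource between the two probe calls, the delta no longer matches, which is exactly the parallel failure shown above.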
00:34:15.560 Now that we know how RSpec is going to execute this, let's watch how this spec actually processes.
00:34:23.440 First, the second spec reads the file and finds that there are zero records.
00:34:29.120 Next, the first write in the second spec executes, so now Donald is in the file.
00:34:36.960 Next, the first spec checks its initial value.
00:34:42.560 It finds a value of one record in the file, and both writes happen.
00:34:48.720 It doesn't matter what order they happen in; we could check the file if we really cared.
00:34:55.040 But they both happen, and then both reads happen.
00:35:02.080 We happen to know here that the second one happens first because that's the test that was in the output first.
00:35:07.040 But each spec is going to look at the value in the file: the final record count.
00:35:14.360 Both of these tests are going to see three records.
00:35:20.920 Now the delta between zero and three is three, and the delta between one and three is two.
00:35:27.040 Neither of those match the 'by' clause, so both tests fail.
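The interleaving walked through above can be replayed deterministically. In this sketch an array stands in for the shared file, and "Mickey" is a hypothetical stand-in for whatever the first spec writes; the point is only the order of reads and writes.

```ruby
# Deterministic replay of the race: an array stands in for the shared file.
file = []

spec2_initial = file.count   # second spec reads its initial count: 0
file << "Donald"             # second spec's first write
spec1_initial = file.count   # first spec reads its initial count: 1
file << "Mickey"             # first spec's write (hypothetical record name)
file << "Goofy"              # second spec's second write
final = file.count           # both specs read the final count: 3

puts final - spec2_initial   # => 3, but the spec expected a delta of 2 -> failure
puts final - spec1_initial   # => 2, but the spec expected a delta of 1 -> failure
```

Each spec's assumption that it owns the file is invalid, so both deltas come out wrong even though every individual write succeeded.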
00:35:32.760 Now we know how it failed; let's go back to the beginning and show you the solution.
00:35:40.480 It turns out the Ruby Core team actually thought of this, bless their hearts.
00:35:46.560 They knew that testing asynchronous IO would be really hard, so they included a class called StringIO.
00:35:54.000 StringIO simulates other kinds of IO in specs. It’s a string but with the interface of a file.
00:36:01.160 What we want to do is allow 'File' to receive 'open' and yield a StringIO object.
00:36:08.480 That means when the code under test calls 'File.open', the block will receive a StringIO object instead of a real file.
00:36:15.840 One caveat with StringIO is that you need to rewind it before you can read it.
00:36:22.480 The reason for that is it behaves like a file: after writing, the position sits at the end.
00:36:29.760 The reason we didn't have to do that previously was because 'File.open' was a block.
00:36:36.320 As soon as that block ended, the file fell out of scope and was closed automatically.
00:36:43.000 Now that we're creating the StringIO object in the test and handing it to the code,
00:36:50.240 it doesn't fall out of scope until the test ends, so we need to rewind it before we can read.
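In the spec itself, the stubbing looks roughly like `allow(File).to receive(:open).and_yield(fake_file)`. The rewind caveat can be shown with the stdlib alone; this is a minimal sketch, and `fake_file` is just an illustrative name.

```ruby
require "stringio"

fake_file = StringIO.new        # a string with the interface of a file
fake_file.puts "Donald"
fake_file.puts "Goofy"

fake_file.rewind                # writing left the position at the end;
                                # move it back to the start before reading
puts fake_file.readlines.count  # => 2
```

Without the `rewind`, `readlines` would return an empty array, because the read would begin at the end of the "file".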
00:36:57.760 All right, proof is in the pudding. Let’s see if we fixed this race condition.
00:37:02.600 All right, we got it! Parallel RSpec proves they're both passing.
00:37:08.760 Okay, here we are 37 minutes into this talk and we’ve resolved all of the flakiness.
00:37:15.760 So it’s time to wrap things up real quick so we can get to that thing in the lunchroom.
00:37:24.160 Because I don't know about you, but this talk always makes me hungry!
00:37:30.720 Here’s a cheat sheet for the entire talk.
00:37:37.760 First, non-deterministic flakiness reproduces in isolation.
00:37:43.160 Look for interactions with non-deterministic elements of the environment.
00:37:50.760 To fix this kind of flakiness, mock the non-determinism to make it deterministic.
00:37:55.840 Don't forget about Timecop when working with date- and time-related specs.
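Timecop freezes `Time.now` for you, but since it's a gem, here is a stdlib-only sketch of the same idea: inject a clock instead of calling `Time.now` directly. `Greeter` and `FrozenClock` are hypothetical names invented for this example.

```ruby
# Hypothetical example: make time deterministic by injecting a clock.
class Greeter
  def initialize(clock: Time)
    @clock = clock  # defaults to the real clock in production
  end

  def greeting
    @clock.now.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# In a spec, a frozen clock makes the result deterministic:
FrozenClock = Struct.new(:now)
nine_am = Time.new(2023, 11, 13, 9, 0, 0)
puts Greeter.new(clock: FrozenClock.new(nine_am)).greeting  # => "Good morning"
```

A spec that relied on the real `Time.now` here would pass in the morning and fail in the afternoon, which is exactly the non-deterministic flakiness being described.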
00:38:03.680 There are tools like WebMock and VCR for handling specs that require network connections.
00:38:09.040 But I prefer to use plain RSpec mocks, like I did earlier. Still, plenty of folks find these tools useful.
00:38:15.120 Next, order dependent flakiness only reproduces with other specs when run in a certain order.
00:38:21.760 Because of that, they will not reproduce in isolation.
00:38:30.560 Look for state that is shared across tests.
00:38:36.480 To fix order dependency, remove the shared state.
00:38:43.920 Make it immutable or isolated. RSpec's 'order random' can help you here by reproducing the failures.
00:38:50.400 And RSpec's bisect can help you locate the leaky spec that's causing the failure.
00:38:56.600 Finally, race conditions only reproduce with other specs when run in parallel, not in isolation.
00:39:03.760 Look for asynchronous code or exhaustible shared resources.
00:39:10.640 To fix race conditions, isolate things from one another.
00:39:17.200 Like we did with StringIO, or use fibers instead of threads.
00:39:23.680 Seriously, these things are amazing; you’ve got to try them.
00:39:29.280 Finally, you can use parallel RSpec to reproduce the failures locally instead of on your build server.
00:39:37.200 Now, keep in mind that original point: all of these specs have a common problem.
00:39:43.760 They're making an invalid assumption about the environment in which they're running.
00:39:50.760 Sometimes just remembering that fact will help you identify and resolve the flakiness.
00:39:57.760 Ask yourself: how can I ensure that this spec has the environment that it expects?
00:40:05.040 And one more thing—I have a bit of a hot take here.
00:40:12.080 Debugging this stuff is incredibly hard, but it gets a thousand times harder if your specs are too DRY.
00:40:18.080 So avoid using these features of RSpec. They seem harmless at first, even useful.
00:40:25.840 But ultimately they're going to make debugging way too hard.
00:40:34.880 So avoid shared specs, avoid shared contexts, avoid nested contexts.
00:40:39.920 Your specs should be incredibly communicative.
00:40:47.160 After all, they are the executable documentation for your code.
00:40:54.080 If you have to scroll all over the place or open a ton of files to debug it later, that's a problem.
00:41:01.840 Keep your tests wet!
00:41:08.080 I'm not the only one who says this; the fine folks at Thoughtbot agree.
00:41:13.520 They've written several articles on this, and honestly, DRY might be the worst programming advice ever.
00:41:19.280 You can read more about it in my articles.
00:41:26.000 I told you it was a hot take, but it sounds like I have some agreement here. That's awesome!
00:41:31.960 If you still disagree, come find me at lunch. I want to change your mind!
00:41:38.600 Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness.
00:41:46.440 It took me over 20 years to get here. I hope this talk has short-circuited that for some of you.
00:41:54.440 I work for Cisco Meraki, so I also know a thing or two about connectivity.
00:42:02.960 Here’s how to connect with me. That last item there is the source code for this talk, including the fixed code.
00:42:11.200 It's tagged so you can walk through it one thing at a time.
00:42:18.960 You can look at the failure, you can look at the success, or you can practice fixing the failures.
00:42:26.040 Cisco Meraki is probably the largest Rails shop you've never heard of, and we are growing.
00:42:34.200 So if you're interested, there are limited roles open right now. Come chat with us at the job fair.
00:42:42.280 Find out what it's like to work at Meraki.
00:42:50.360 Finally, a tiny bit of shameless self-promotion. I love Ruby so much.
00:42:56.520 My friend Fito and I often write code on the weekends just to release something into the wild in the hopes that somebody finds it useful.
00:43:10.680 You can find links to our stuff at firsttry.software, including Ruist.
00:43:14.920 Ruist is an opinionated VS Code theme. It's the one you saw in the talk.