00:00:18.480
All right, hi everyone! I'm Maple Ang, and I'm part of the program committee this year.
00:00:24.439
Um, who here loves flaky tests? Yes? I'm so glad. Thank you! You're in the right place.
00:00:32.360
I'm here to introduce Alan Ridlehoover.
00:00:37.440
I'm going to read a bio. I should have had this open. Alan is a passionate software engineer who loves Ruby.
00:00:45.120
He is an empathetic leader at Cisco Meraki, a relatable human, and a man of several hats.
00:00:50.680
Welcome, Alan!
00:01:16.880
It's after 4:00 PM, and that release was due an hour ago.
00:01:22.880
You've got less than an hour before you have to leave, or you'll be late to that thing. You can feel the clock ticking.
00:01:31.720
Slack clicks to life. You check your messages.
00:01:37.280
The build failed again. You look at the build, you look at the clock, and you realize you do not have time for flakiness.
00:01:42.759
So, you rerun the build again: two builds, five failing specs.
00:01:51.560
None of them have anything to do with your commit. All you can think about is how you cannot be late to another thing.
00:01:56.840
If only you knew the secret ingredient that all flaky tests have in common, you might be on your way to that thing right now.
00:02:01.920
Hello! My name is Alan. I am an engineering manager at Cisco Meraki,
00:02:07.880
probably the largest Rails shop you've never heard of.
00:02:14.319
And though I'm not a baker, I do know a thing or two about flakiness.
00:02:19.920
In fact, sometimes it's all I can think about.
00:02:26.160
Seriously, since I started automating tests over 20 years ago, I have written my fair share of flaky specs.
00:02:31.760
Daylight Saving Time is my personal nemesis. I can't tell you how many times I've tripped over it.
00:02:39.519
Let's just say I'm well into the 'shame on me' part of that relationship.
00:02:43.480
Or, I was, but I'm getting ahead of myself. Let's start with a definition: What is a flaky test?
00:02:54.560
A flaky spec is one whose outcome changes without modification to either the test itself or the code being tested.
00:03:01.160
So if you write a spec and a method, and the spec passes, then you should expect that, as long as neither changes, it will continue to pass.
00:03:11.640
If the method stays the same, but the outcome changes, then you know you have a flaky spec.
00:03:19.000
But how does this happen? Well, it happens because of the secret ingredient that I mentioned.
00:03:25.400
All flaky tests have this in common. But what is it? It's an assumption.
00:03:31.760
All flaky tests make invalid assumptions about their environment.
00:03:38.560
They assume their environment will be in a particular state when the test begins, but that assumption is rendered incorrect by some change.
00:03:45.879
That change can occur during or between test runs.
00:03:51.280
So what causes that change to the environment? There are three recipes.
00:03:57.760
The first is non-determinism, the second is order dependence, and the third is race conditions.
00:04:06.439
Let's take a look at each one of these along with some examples in code, starting with non-determinism.
00:04:12.480
So, what is non-determinism? In fact, for that matter, what is determinism?
00:04:19.760
A deterministic algorithm is one that, given the same input, will always produce the same output.
00:04:26.160
For example, if I take these parameters and pass them to a method called 'add', it should always return two.
00:04:32.800
No matter how many times I call that method.
00:04:39.160
But what if there were a method, 'foo', that always returned true until it didn't?
00:04:44.360
That's the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same outputs.
00:04:50.280
Well, how could that be? It might sound obvious, but if you're utilizing a non-deterministic feature of the environment or the system, then you are producing non-deterministic code.
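In code, those two methods might look something like this (a sketch; the slide bodies aren't in the transcript, so the implementation of 'foo' is my illustration):

    # Deterministic: same inputs, same output, every time.
    def add(a, b)
      a + b
    end

    add(1, 1) # => 2, no matter how many times you call it

    # Non-deterministic: same inputs, but the output can change,
    # because it leans on a non-deterministic feature of the system.
    def foo
      rand(100) > 0 # true 99 times out of 100... until it isn't
    end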
00:04:56.800
So the question is: how does non-determinism sneak into your tests?
00:05:04.560
Here are some ways that non-determinism can sneak into your code. You can use random numbers, which are intended to be random.
00:05:12.560
You can also use the system clock, my nemesis.
00:05:19.800
You can use a network connection—sometimes they're up, sometimes they go down.
00:05:26.640
These tests might pass or fail. There are also floating point numbers.
00:05:33.280
The precision on floating point numbers isn’t guaranteed, so a test might pass once and then fail the next time.
00:05:39.840
These are just a few examples. I'm sure this list is not exhaustive.
00:05:46.160
But, what if our code relies on these things? How do we make the tests deterministic?
00:05:54.959
The trick is to try and remove the non-determinism by mocking it or stubbing it.
00:06:05.000
Or, to account for it by using some advanced RSpec matchers.
00:06:10.799
You can stub that random number generator to produce a specific number.
00:06:15.840
You can mock or freeze time. You can stub the network connection to return a specific response.
00:06:21.720
For floats, you can use advanced matchers like 'be_within' or 'be_between' instead of using an exact match.
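Here are sketches of each of those techniques in RSpec (illustrative examples of mine, not the slide code; Timecop is the time-freezing gem mentioned near the end of the talk):

    require "rspec"
    require "timecop"
    require "net/http"

    RSpec.describe "taming non-determinism" do
      it "stubs the random number generator to a specific value" do
        allow(Kernel).to receive(:rand).and_return(4)
        expect(Kernel.rand).to eq(4)
      end

      it "freezes time for the duration of a block" do
        Timecop.freeze(Time.utc(2022, 11, 13)) do
          expect(Time.now.utc.year).to eq(2022)
        end
      end

      it "stubs the network instead of hitting the wire" do
        allow(Net::HTTP).to receive(:get).and_return('{"status":"ok"}')
        expect(Net::HTTP.get(URI("https://example.com"))).to include("ok")
      end

      it "tolerates floating point imprecision with be_within" do
        expect(0.1 + 0.2).to be_within(0.0001).of(0.3)
      end
    end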
00:06:26.800
And finally, please remember to document that undocumented thing that caused the test to fail.
00:06:30.560
In my case, I make sure to run the test with a date after Daylight Saving Time to prove that it still works.
00:06:38.880
All right, while that build we started at the beginning of the talk is running, let's see if we can fix some of those flaky specs.
00:06:48.399
We don't want to be late for the thing, after all.
00:06:55.080
So first, a bit of context: The code that I'm about to share is entirely made up.
00:07:02.960
Well, technically, I guess all code is made up, but what I mean is that this code was made up fresh just for this talk by me.
00:07:09.279
It's not production code, but it is inspired by code I've seen in production applications.
00:07:16.400
It's a bit of a hodgepodge; it's a class called 'RubyConf' that provides a few methods that the organizers might find helpful.
00:07:24.640
Here is a method to determine whether the conference is currently in progress.
00:07:33.040
I'll let you take a look at that for a second.
00:07:38.400
Notice that the method is using the system clock. I realize now that I'm using the wrong date.
00:07:44.800
It's simple enough, right? Let's look at the specs for this.
00:07:52.600
Oh, well, there's only one.
00:07:58.800
Okay, let's look at that spec.
00:08:05.000
It says that the 'in_progress' method returns false before the conference begins.
00:08:12.240
That seems like maybe the author forgot a couple of use cases. But this is something common that I see with date-based specs.
00:08:20.000
The author of the spec is living in the now; they're not thinking about the future.
00:08:27.480
Kind of like me with Daylight Saving Time.
00:08:34.680
We forget to write that spec that goes along with whatever happens after the clock changes.
00:08:40.800
So, I'll bet that this spec was passing before the conference, but now that the conference is happening, it's probably failing.
00:08:46.640
So let's play with the system clock and see if we can't figure that out.
00:08:51.400
All right, so as I predicted, it passed when I set the clock back in time to October, well before the conference.
00:08:58.000
Not at all the day that I wrote the slide.
00:09:05.600
So, we know this is a flaky test now, because it was passing before and now it's not.
00:09:11.600
If I set the clock forward to the beginning of the conference, it fails.
00:09:17.760
So how do we fix it? Remember, this is a non-deterministic issue.
00:09:23.600
We need to either mock or account for that non-determinism.
00:09:30.960
What I'm going to do in this case is freeze time.
00:09:35.480
Now, here's the code and the spec as they were before, and here's the difference.
00:09:41.760
This is the fixed spec, and you'll notice only the bottom half of the screen is changing.
00:09:47.440
The code is fine; there are no problems with the code. This is a problem only in the spec.
00:09:53.200
What it’s doing is freezing time, setting the time to a specific date.
00:10:00.000
Inside this block, the code executes in that frozen context.
00:10:07.000
As soon as the block ends, the clock is sent back to its normal state.
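Reconstructed, the fix looks roughly like this (the class, method, and dates are my assumptions based on the slides; the time freezing uses the Timecop gem mentioned later in the talk):

    require "rspec"
    require "timecop"
    require "date"

    class RubyConf
      START_DATE = Date.new(2022, 11, 13) # assumed conference dates
      END_DATE   = Date.new(2022, 11, 15)

      def in_progress?
        Date.today.between?(START_DATE, END_DATE)
      end
    end

    RSpec.describe RubyConf do
      it "returns false before the conference begins" do
        # Inside this block, "today" is frozen to October 1st.
        Timecop.freeze(Date.new(2022, 10, 1)) do
          expect(RubyConf.new.in_progress?).to be(false)
        end
        # When the block ends, the clock returns to normal.
      end
    end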
00:10:13.920
Okay, let's see if that fixed it. All right, November 13th—the spec is now passing.
00:10:21.780
In fact, I even went back and added those other specs that were missing: the one for during and the one for after.
00:10:27.960
Those look like this. Notice they’re just doing the exact same thing.
00:10:34.640
They have a different date and a different outcome, but they’re basically the same test.
00:10:39.720
All right, so we fixed one of those flaky tests. Let's take a look at another one.
00:10:46.000
This one is about a network connection that goes down.
00:10:53.480
It's not uncommon for your code to need to talk to another service across a network.
00:11:01.760
In this session description method, we're calling an API endpoint to fetch the conference schedule.
00:11:08.360
Then, we're parsing it, finding the session that matches the title parameter, and returning a description.
00:11:15.000
Here's the spec. I know that's a lot of code to take in, so I'm going to take a second.
00:11:21.840
All right, so with Wi-Fi enabled, this spec passes.
00:11:28.449
Note that the call to the network might not pass here at the conference, but it does pass in this scenario.
00:11:35.560
And it adds over a second to the runtime, as you can see there.
00:11:42.400
A lot of times, HTTP calls that fail take 60 seconds to time out, which makes these tests even less practical.
00:11:50.680
Plus, when I turn off Wi-Fi, the spec fails. Fortunately, it fails quickly.
00:11:58.200
The network failures are on my end, not the server end.
00:12:05.040
These are particularly nasty tests to debug because that loss in connectivity is not logged or persistent.
00:12:11.600
By the time you get around to debugging the failure, it might not be possible to reproduce without turning off your Wi-Fi.
00:12:18.160
So pay attention to HTTP calls in your code or any other type of call that crosses a network, like gRPC.
00:12:24.320
Try running the specs with Wi-Fi turned off—you'd be surprised how many things you catch that way.
00:12:31.360
Okay, let's fix this thing. Here it is: same exact code, same test as before.
00:12:36.960
A smaller font size. I'm not going to put as much code on the screen as Jeremy just did.
00:12:43.600
But here we go; that's the fix.
00:12:49.040
Now, again, the code at the top half of the screen is not changing.
00:12:55.920
The changes to the test are just setting up that stub for the network response.
00:13:01.680
This allows the spec to validate that we're parsing the results correctly.
00:13:07.920
And that’s really the code we care about; we don't actually care whether this external service is up and running.
00:13:14.160
It shouldn't matter.
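Here's a sketch of that kind of fix (the endpoint, JSON shape, and method body are my assumptions; the point is that the stub isolates the parsing logic from the network):

    require "rspec"
    require "json"
    require "net/http"

    class RubyConf
      SCHEDULE_URL = "https://example.com/schedule.json" # assumed endpoint

      def session_description(title)
        json = Net::HTTP.get(URI(SCHEDULE_URL))
        session = JSON.parse(json)["sessions"].find { |s| s["title"] == title }
        session && session["description"]
      end
    end

    RSpec.describe RubyConf do
      it "finds the description for a session by title" do
        schedule = { "sessions" => [
          { "title" => "Flaky Tests", "description" => "A talk about flakiness" }
        ] }.to_json

        # Stub the network call: the spec now validates our parsing,
        # not the availability of the external service.
        allow(Net::HTTP).to receive(:get).and_return(schedule)

        expect(RubyConf.new.session_description("Flaky Tests"))
          .to eq("A talk about flakiness")
      end
    end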
00:13:20.080
On that point, I often get pushback. Folks will ask: 'What if the API changes?'
00:13:26.640
How are we going to know if the spec isn't going to fail because it's mocking the result?
00:13:31.960
My answer to that is that every spec should have one and only one reason to exist.
00:13:39.040
In this case, that reason to exist is to verify the code that we wrote works as we expect.
00:13:45.240
It's a unit test, and we want these to have as few dependencies as possible so they execute quickly.
00:13:52.160
Now, you may also require a spec to validate that an API schema hasn't changed.
00:14:01.240
That is a different reason for a spec to exist. It is not this spec.
00:14:08.760
In fact, it's not even a unit test; it's an integration test.
00:14:14.280
More than that, it's an integration test that's designed to fail.
00:14:20.160
We want it to fail when that schema changes.
00:14:26.760
So, it's not something we want to run alongside all of our unit tests, which are designed to pass every time.
00:14:34.000
Maybe the integration tests should be run separately on a schedule instead of among all of our unit tests.
00:14:39.760
All right, let's run the spec and see if it passes.
00:14:47.040
Sure enough, if I turn the Wi-Fi off, the spec still passes.
00:14:53.680
That's because I'm now mocking or stubbing the response to the network call.
00:14:59.560
It's also important to point out the difference in execution speed here. The live version took 1.3 seconds.
00:15:07.200
This one took one hundredth of a second, which can become incredibly important.
00:15:14.760
Especially as your test suite grows, particularly when it hits 100,000 specs, like we just did.
00:15:20.360
All right, it is 15 minutes in, and those specs took us about 10 minutes to fix.
00:15:27.760
That wasn't so bad. Let's see if we can solve some more of these before you have to head out.
00:15:34.160
All right, let's take a look at order dependence next, starting with a definition.
00:15:40.760
Order dependent specs are specs that pass in isolation but fail when run with other specs in a specific order.
00:15:46.960
For example, suppose test A and test B both pass when run in alphabetical order, but test A fails when run after test B. That makes test A flaky, and test B leaky.
00:15:54.960
What does that mean, leaky?
00:16:01.520
Remember, these specs are making an invalid assumption about the environment, and that environment includes all the shared state that they have access to.
00:16:09.919
It works kind of like this: let's pretend this blue square is the starting point for the shared environment.
00:16:17.440
Spec A runs and gets the blue square, and it passes. Spec A does not change the state, so Spec B runs in the same context, and it also passes.
00:16:25.040
Now let's imagine running them in the opposite order, where Spec B gets the blue square but adds or changes some state, turning it into a pink hexagon.
00:16:32.640
Now this state is what test A is going to run in. It didn't expect that, so it's going to fail.
00:16:39.760
So basically what happened is Spec B leaked some state, deleting, adding, or changing something in the environment, and that caused Spec A to fail.
00:16:46.440
Spec A was only failing because it was susceptible to that; it made that invalid assumption about its environment.
00:16:53.840
For this reason, we think of these tests as leaky tests.
00:17:01.600
And isn’t leakiness the problem here? Isn’t Spec B the problem?
00:17:08.600
Well, actually no; they are both to blame. The reason is that one is leaking while the other is susceptible to it.
00:17:16.160
Only one of them, though, is breaking your build, which means that you should focus on that one first.
00:17:23.160
Now, often you'll find that if you fix the broken spec, it will point to a broader solution that will solve the leakiness as well.
00:17:32.440
But how do you fix these order-dependent flaky tests?
00:17:40.160
First, let’s look at what causes this kind of failure.
00:17:46.760
Order dependent failures are caused by mutable state that is shared across specs.
00:17:54.640
This could be in the form of broadly scoped variables like a global or class variable.
00:18:02.560
It could be a database, a key-value store, a cache, or even the DOM if you're writing JavaScript specs.
00:18:09.720
So that's what causes it, but how do you reproduce these issues?
00:18:16.800
The first thing to do is eliminate non-determinism. Make sure it's not non-deterministic.
00:18:23.840
You can do that by repeatedly running the same spec in isolation.
00:18:30.560
Now, it might take a while, but if it's non-determinism, it will eventually fail.
00:18:37.600
Now, if you can't reproduce it that way, run all the specs that ran together with this spec.
00:18:44.440
Pull the list of specs from the failing process on the build server and run them locally, in the same order and/or in random order.
00:18:52.880
Continue running them in random order until one of them fails.
00:19:00.960
Now, if the original order doesn't reproduce it, you can run RSpec with the option 'order random'.
00:19:08.080
Keep running them that way until you find a seed that consistently causes the failure.
00:19:14.080
Then the next thing you want to do is locate the leaky spec or specs.
00:19:20.360
I say specs plural, but it's possible that it takes multiple specs modifying the environment before the failure reoccurs.
00:19:26.840
Keep running it until you find that seed that causes the failure.
00:19:33.560
Once you've found it, you can use RSpec's 'bisect' feature to find the actual leaky specs.
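In command-line terms, that loop looks roughly like this (1234 stands in for whatever failing seed you actually find):

    rspec --order random         # rerun until some seed consistently fails
    rspec --seed 1234 --bisect   # then bisect using that failing seed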
00:19:41.560
I’ll show you how to do that in just a minute.
00:19:47.320
First, though, what do we need to do to fix these problems once we've found them?
00:19:54.680
You can remove the shared state. You can make it immutable or you can isolate it.
00:20:01.920
Don't use those broadly scoped variables—those are a recipe for flakiness.
00:20:08.840
Mock shared data stores with a layer of abstraction.
00:20:15.760
You can use database transactions; if you've used Database Cleaner, that's what you're doing.
00:20:22.600
Or you can reset the shared state in between specs.
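For the database case, a typical setup looks something like this (a sketch using the Database Cleaner gem's transaction strategy):

    require "database_cleaner/active_record"

    RSpec.configure do |config|
      config.before(:suite) do
        DatabaseCleaner.strategy = :transaction
      end

      # Each example runs inside a transaction that rolls back afterwards,
      # so no database state leaks from one spec into the next.
      config.around(:each) do |example|
        DatabaseCleaner.cleaning { example.run }
      end
    end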
00:20:30.320
All right, let's see if we can solve another one of those flaky specs that’s keeping you here.
00:20:37.640
Let's look at a shared state example. Here we have a simple getter and setter to store a favorite session.
00:20:44.000
Notice that both the getter and setter leverage an object called 'cache'.
00:20:51.600
They call a method on it called 'instance'. What is that? Let's take a look.
00:20:58.720
So, the cache is a simple in-memory key-value pair backed by a hash.
00:21:05.680
The instance method is effectively turning it into a singleton.
00:21:11.680
So that every reference to the cache's instance method is returning the same instance.
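A minimal sketch of what that cache might look like, reconstructed from the description (not the actual slide code):

    class Cache
      def self.instance
        @instance ||= new # memoized, so every caller shares one instance
      end

      def read(key)
        store[key]
      end

      def write(key, value)
        store[key] = value
      end

      private

      # The shared, mutable state: a plain hash.
      def store
        @store ||= {}
      end
    end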
00:21:18.240
Here are the specs for the favorite session getter and setter.
00:21:27.520
Now these specs pass when run in this order: getter first, then setter.
00:21:34.440
But they fail in the opposite order because we’re storing the title of the session in the cache.
00:21:41.180
If we run the setter first, we're going to set the cache, and then the getter will return the value in the cache.
00:21:47.760
So it will not see the default value, whereas the spec for the getter is looking for that default value.
00:21:55.040
To prove that, let's run this with RSpec in random order mode and see if we can’t reproduce the failure.
00:22:01.920
Here you can see I ran the spec with order random, and RSpec produced a random seed.
00:22:08.960
The getter ran before the setter with that seed, so it passed.
00:22:15.040
Let's try again.
00:22:21.160
Okay, so here I'm still running with order random, but RSpec chose a different seed.
00:22:29.439
And lo and behold, the setter ran first, so the getter failed.
00:22:35.920
So how do we go about fixing it now that we know?
00:22:41.920
Well, we know only that one of these three specs caused the failure.
00:22:48.920
They all ran before the getter spec, so any of them could have been the problem.
00:22:55.320
Now, we have a suspect; we think it’s the setter. But how do we know for sure?
00:23:01.920
This is where RSpec's bisect comes in, so let's take a look.
00:23:08.640
Here, I'm running RSpec bisect with the same order random clause and the same seed that produced the failure.
00:23:14.600
This is important because bisect won't work if you hand it a command that won't fail.
00:23:20.320
So I had to find the seed that caused the failure first.
00:23:27.040
The first thing bisect does is it tries to reproduce the failure, because if it can't, it'll just exit and say, 'I can't help you.'
00:23:34.120
Next, it analyzes whether that failure appears to be order-dependent, and in this case, it does.
00:23:40.320
You can see the failure appears to be order-dependent.
00:23:46.480
So it performs a binary search looking for the spec or specs that need to be run first in order for the failure to happen.
00:23:52.080
Now, note that this can take a very long time if that list of candidate specs is long.
00:23:57.920
Finally, if it finds a reproducible order in which the test fails,
00:24:02.480
it will give you a command that reproduces the failure with just the necessary specs.
00:24:08.960
So if you run that command like this, you can see here I'm running exactly the command proposed.
00:24:15.120
That includes 'order random'. I added 'format documentation' here so we'd see the names of the tests.
00:24:21.680
And sure enough, the setter is the suspect.
00:24:28.640
We now know that’s the test we need to go address.
00:24:35.240
So how do we fix it?
00:24:40.480
Well, here we are back at the beginning again.
00:24:46.560
One way we could approach this would be to add a 'clear' method to the cache and call that method in between specs.
00:24:52.920
But because our specs are currently sharing state, if we ran them on the build server and cleared the cache in between specs,
00:25:01.280
running them in parallel, what would happen? One test would interfere with the execution of another.
00:25:07.920
So what I prefer in this situation is to use a technique called dependency injection.
00:25:14.720
Now, it's a simple technique with a weird sounding name.
00:25:21.760
All it means is that we pass the cache object into the RubyConf object when we create it.
00:25:27.680
So each spec will be able to pass its own cache into the RubyConf object.
00:25:34.480
Here's what that looks like in code, starting with the implementation.
00:25:40.960
Notice that the cache parameter defaults to cache.instance.
00:25:47.120
This way, if we don't pass anything when we create the RubyConf object, it'll just use the singleton.
00:25:53.200
That's exactly what we want in production.
00:25:59.520
Now by doing this, though, we've created a seam in the software that allows the specs to use their own cache objects.
00:26:06.080
This prevents state from leaking between the specs without modifying the behavior of the production code.
00:26:13.920
All right, let's look at the specs. Here we are creating a new instance of the cache class and passing it to the RubyConf object.
00:26:21.600
That's it! That's all there is to dependency injection.
00:26:28.160
You create a parameter in the initializer and hand the object to the collaborator.
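Put together, the injected version might look like this (reusing the Cache sketch from earlier; the method names and default value are my assumptions based on the description):

    require "rspec"

    class RubyConf
      # Defaults to the singleton, so production behavior is unchanged.
      def initialize(cache: Cache.instance)
        @cache = cache
      end

      def favorite_session
        @cache.read(:favorite_session) || "none"
      end

      def favorite_session=(title)
        @cache.write(:favorite_session, title)
      end
    end

    RSpec.describe RubyConf do
      it "returns the default before a favorite is set" do
        conf = RubyConf.new(cache: Cache.new) # this spec's own private cache
        expect(conf.favorite_session).to eq("none")
      end

      it "returns the favorite after it is set" do
        conf = RubyConf.new(cache: Cache.new)
        conf.favorite_session = "The Secret Ingredient"
        expect(conf.favorite_session).to eq("The Secret Ingredient")
      end
    end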
00:26:34.320
All right, let's run the specs again.
00:26:41.440
I'm running the specs with the same randomized seed that caused them to fail in the first place, and now they pass.
00:26:47.280
Voila! We’re making good progress—it’s only 4:30!
00:26:52.120
You might actually make that thing after all.
00:26:58.960
Let’s take a look at a couple more broken specs, but first, let's talk about race conditions.
00:27:05.360
So what is a race condition? A race condition can occur when two or more parallel processes are competing over the same scarce shared resource.
00:27:11.880
First, let’s look at two specs running in sequence to see how they work when writing to a file.
00:27:18.560
The first spec is going to write to the file, read from the file, check the result, and it’ll pass.
00:27:25.080
Now the second spec is going to write to the file two times, read from the file, and pass.
00:27:31.040
But if I run them in parallel, the first spec writes to the file, then the second spec writes to the file.
00:27:37.600
Then spec one reads, checks the result, and fails, because there are two records when it only expected one.
00:27:43.920
Next, the second spec writes again, reads, and fails, because there are now three records when it expected two.
00:27:50.080
In this case, both specs are susceptible to parallel flakiness due to a race condition.
00:27:56.880
Of course, this is asynchronous code, so who knows—it might actually pass.
00:28:03.320
That’s why race conditions are notoriously hard to reproduce.
00:28:09.520
So how do we go about debugging them if we can't reproduce them?
00:28:16.080
You want to take a methodical approach like we did with the order-dependent specs.
00:28:24.960
The first thing is to try and eliminate non-determinism.
00:28:32.320
Run that failing spec in isolation repeatedly.
00:28:39.520
If it fails, then you know it’s non-determinism, and if not, try to eliminate order dependence.
00:28:46.320
Run it repeatedly with the specs that it failed with on the build server in different orders.
00:28:53.520
Now, if you can reproduce the failure that way, that’s an order-dependent problem.
00:29:01.680
You can use bisect to help you there.
00:29:08.720
Finally, if that doesn't work, run the specs repeatedly in parallel using the parallel RSpec gem.
00:29:14.800
I mention this gem specifically because the other gems that help with this seem more narrowly targeted.
00:29:20.880
Gems like parallel_tests or Knapsack seem aimed at running Rails apps on a CI server.
00:29:27.440
It's best if you can debug this stuff locally, and parallel RSpec will let you do that.
00:29:34.400
If you still can't reproduce it, try adding 'order random' to the Parallel RSpec command.
00:29:41.760
That will give you the same parallelism, but it'll randomize the order the specs run in.
00:29:48.400
So once you’ve reproduced it, or even if you can't, what are you going to look for to try and fix these things?
00:29:56.560
Remember the main cause: race conditions usually happen on the build server, caused by asynchronous code competing for shared resources.
00:30:03.760
Those resources might include a file, like the one we just looked at, or a socket, a thread pool, or a connection pool.
00:30:10.960
Or even memory!
00:30:18.120
Once you have a suspect—once you know one of the things that it could be—how do you fix it?
00:30:24.840
Well, there are a lot of different ways; it depends on the situation.
00:30:31.520
If it's an IO-based issue, you can use a class called StringIO as a stand-in for the other type of IO you're trying to perform.
00:30:38.480
If it's a thread issue, there are a couple of options.
00:30:45.360
You can test that the messages are passed correctly, rather than testing the results coming out of the threaded object.
00:30:52.400
Or you can write thread-safe code.
00:30:59.760
Of course, you could also extract that code out of the threaded class into a PORO (a plain old Ruby object) that you call from the thread.
00:31:06.800
Then test the PORO synchronously, and trust that Ruby knows how to instantiate a thread and run code on it.
00:31:13.040
You can also switch to fibers instead of using threads.
00:31:20.720
Fibers are super cool! They let you test synchronously; you don't need to test on a thread.
00:31:26.600
They also make it incredibly hard to create a race condition.
00:31:32.640
Because one of the things they do is take control of the CPU and don’t hand it back until they’re done.
00:31:39.440
So any kind of atomic operation you need to perform that involves several commands can be done without having to account for thread safety.
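Here's a toy illustration of that atomicity (my example, not from the slides):

    counter = 0

    fibers = 2.times.map do
      Fiber.new do
        # A fiber keeps the CPU until it finishes or explicitly yields,
        # so this read-modify-write can't be interleaved with the other fiber.
        current = counter
        counter = current + 1
      end
    end

    fibers.each(&:resume)
    puts counter # => 2, every time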
00:31:46.240
Let’s take a look at how to fix one of these. Here are the last two flaky specs that are actually keeping you here.
00:31:52.960
This is a feature of the app that manages a list of reservations.
00:31:59.440
Now, there are two methods—one to reserve a seat, the other to get a list of attendees.
00:32:06.160
You can see that reserving a seat just writes to a file.
00:32:12.960
Getting the attendees back is just reading from that file.
00:32:18.560
So let's take a look at the specs.
00:32:25.760
The first spec ensures that writing Mickey Mouse to the file grows the number of attendees by one.
00:32:33.120
The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly.
00:32:40.160
So, when I run them in sequence, they pass.
00:32:47.680
In fact, they'll even pass in sequence if I switch the order.
00:32:53.840
But if I run them with Parallel RSpec, they will fail. In fact, both of them fail.
00:33:01.440
The second spec actually failed first because of parallelism.
00:33:08.640
The attendee count actually grew by three, not two, like I showed a minute ago.
00:33:15.520
And the first spec actually finished second, and it failed because the record count grew by two, not one.
00:33:22.080
We've already seen how that can happen, but let's walk through it again a little slower.
00:33:29.040
Let's look at how this RSpec code works—it's not as simple as most RSpec tests.
00:33:36.920
Here's the expectation for the second test; it has two blocks of code.
00:33:43.840
The first is passed into an expect method, the second is passed into the change method.
00:33:49.200
Now, when our specs execute, they're going to run this block of code first.
00:33:56.160
That is, it's going to check the attendee count before it runs the actual test.
00:34:01.360
Next, it's going to run the expectation block, and then it's going to run this block of code again to get the final count of records.
00:34:08.000
And then it'll check the delta between the initial and the final and compare it to what was passed into 'by'—in this case, two.
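In other words, the matcher works like this (a self-contained example of the mechanics, not the conference code):

    require "rspec"

    RSpec.describe "the change matcher" do
      it "compares the count before and after the expect block" do
        attendees = []

        # RSpec evaluates the `change` block once before and once after
        # running the `expect` block, then compares the delta to `by`.
        expect {
          attendees << "Donald"
          attendees << "Goofy"
        }.to change { attendees.count }.by(2)
      end
    end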
00:34:15.560
Now that we know how RSpec is going to execute this, let's watch how this spec actually processes.
00:34:23.440
First, the second spec reads the file and finds that there are zero records.
00:34:29.120
Next, the first write in the second spec executes, so now Donald is in the file.
00:34:36.960
Next, the first spec checks its initial value.
00:34:42.560
It finds a value of one record in the file. Then both writes happen.
00:34:48.720
It doesn't matter what order they happen in; we could check the file if we really cared.
00:34:55.040
But they both happen, and then both reads happen.
00:35:02.080
We happen to know here that the second one happens first because that's the test that was in the output first.
00:35:07.040
But it's going to look at the value that was in the file—how many records.
00:35:14.360
Both of these tests are going to see three records.
00:35:20.920
Now the delta between zero and three is three, and the delta between one and three is two.
00:35:27.040
Neither of those match the 'by' clause, so both tests fail.
00:35:32.760
Now we know how it failed; let's go back to the beginning and show you the solution.
00:35:40.480
It turns out the Ruby Core team actually thought of this, bless their hearts.
00:35:46.560
They knew that testing asynchronous IO would be really hard, so they included a class called StringIO.
00:35:54.000
StringIO simulates other kinds of IO in specs. It’s a string but with the interface of a file.
00:36:01.160
What we want to do is allow 'File' to receive 'open' and yield a StringIO object.
00:36:08.480
What that means is that now, when I call 'File.open', the actual object the block receives is a StringIO object.
00:36:15.840
One caveat with StringIO is that you need to rewind it before you can read it.
00:36:22.480
The reason for that is it behaves like a file, essentially: after you write to it, the read position is at the end.
00:36:29.760
The reason we didn't have to do that previously was because we were passing a block to 'File.open'.
00:36:36.320
As soon as that block ended, the file fell out of scope and was closed automatically.
00:36:43.000
Now that we are creating this StringIO object in the test and using it in the code,
00:36:50.240
it doesn’t fall out of scope until the test ends, so we need to rewind it before we can read.
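Here's a sketch of the whole fix (the filename and method name are my assumptions; the 'allow ... and_yield' and the rewind are the parts described on the slides):

    require "rspec"
    require "stringio"

    class RubyConf
      RESERVATIONS = "reservations.txt" # assumed filename

      def reserve(name)
        File.open(RESERVATIONS, "a") { |f| f.puts(name) }
      end
    end

    RSpec.describe RubyConf do
      it "grows the reservations by one" do
        buffer = StringIO.new
        # Every File.open in this example now yields this spec's own
        # in-memory buffer, so parallel specs can't collide on a real file.
        allow(File).to receive(:open).and_yield(buffer)

        expect {
          RubyConf.new.reserve("Mickey Mouse")
          buffer.rewind # a StringIO must be rewound before it can be read
        }.to change { buffer.readlines.count }.by(1)
      end
    end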
00:36:57.760
All right, proof is in the pudding. Let’s see if we fixed this race condition.
00:37:02.600
All right, we got it! Parallel RSpec proves they're both passing.
00:37:08.760
Okay, here we are 37 minutes into this talk and we’ve resolved all of the flakiness.
00:37:15.760
So it’s time to wrap things up real quick so we can get to that thing in the lunchroom.
00:37:24.160
Because I don't know about you, but this talk always makes me hungry!
00:37:30.720
Here’s a cheat sheet for the entire talk.
00:37:37.760
First, non-deterministic flakiness reproduces in isolation.
00:37:43.160
Look for interactions with non-deterministic elements of the environment.
00:37:50.760
To fix this kind of flakiness, mock the non-determinism to make it deterministic.
00:37:55.840
Don't forget about Timecop when working with date- and time-related specs.
00:38:03.680
There are tools like WebMock and VCR for handling specs that require network connections.
00:38:09.040
I prefer to use plain RSpec stubs, like I did earlier, but there are plenty of folks who find these tools useful.
00:38:15.120
Next, order dependent flakiness only reproduces with other specs when run in a certain order.
00:38:21.760
Because of that, they will not reproduce in isolation.
00:38:30.560
Look for state that is shared across tests.
00:38:36.480
To fix order dependency, remove the shared state.
00:38:43.920
Make it immutable or isolated. RSpec's 'order random' can help you here by reproducing the failures.
00:38:50.400
And RSpec's bisect can help you locate the leaky spec that's causing the failure.
00:38:56.600
Finally, race conditions only reproduce with other specs when run in parallel, not in isolation.
00:39:03.760
Look for asynchronous code or exhaustible shared resources.
00:39:10.640
To fix race conditions, isolate things from one another.
00:39:17.200
Like we did with StringIO, or use fibers instead of threads.
00:39:23.680
Seriously, these things are amazing; you’ve got to try them.
00:39:29.280
Finally, you can use parallel RSpec to reproduce the failures locally instead of on your build server.
00:39:37.200
Now, keep in mind that original point: all of these specs have a common problem.
00:39:43.760
They're making an invalid assumption about the environment in which they're running.
00:39:50.760
Sometimes just remembering that fact will help you identify and resolve the flakiness.
00:39:57.760
Ask yourself: how can I ensure that this spec has the environment that it expects?
00:40:05.040
And one more thing—I have a bit of a hot take here.
00:40:12.080
Debugging this stuff is incredibly hard, but it gets a thousand times harder if your specs are too DRY.
00:40:18.080
So avoid using these features of RSpec. They seem harmless at first, even useful.
00:40:25.840
But ultimately they're going to make debugging way too hard.
00:40:34.880
So avoid shared specs, avoid shared contexts, avoid nested contexts.
00:40:39.920
Your specs should be incredibly communicative.
00:40:47.160
After all, they are the executable documentation for your code.
00:40:54.080
If you have to scroll all over the place or open a ton of files to debug it later, that's a problem.
00:41:01.840
Keep your tests WET!
00:41:08.080
I'm not the only one who says this; the fine folks at Thoughtbot agree.
00:41:13.520
They've written several articles on this, and honestly, DRY might be the worst programming advice ever. You can read more about it in their articles.
00:41:19.280
You can read more about it in my articles.
00:41:26.000
I told you it was a hot take, but it sounds like I have some agreement here. That's awesome!
00:41:31.960
If you still disagree, come find me at lunch. I want to change your mind!
00:41:38.600
Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness.
00:41:46.440
It took me over 20 years to get here. I hope this talk has short-circuited that for some of you.
00:41:54.440
I work for Cisco Meraki, so I also know a thing or two about connectivity.
00:42:02.960
Here’s how to connect with me. That last item there is the source code for this talk, including the fixed code.
00:42:11.200
It's tagged so you can walk through it one thing at a time.
00:42:18.960
You can look at the failure, you can look at the success, or you can practice fixing the failures.
00:42:26.040
Cisco Meraki is probably the largest Rails shop you've never heard of, and we are growing.
00:42:34.200
So if you're interested, there are limited roles open right now. Come chat with us at the job fair.
00:42:42.280
Find out what it's like to work at Meraki.
00:42:50.360
Finally, a tiny bit of shameless self-promotion. I love Ruby so much.
00:42:56.520
My friend Fito and I often write code on the weekends just to release something into the wild in the hopes that somebody finds it useful.
00:43:10.680
You can find links to our stuff at firsttr.software, including Ruist.
00:43:14.920
Ruist is an opinionated VS code theme. It’s the one you saw in the talk.