wroc_love.rb 2023

Testing Randomness

00:00:04.259 Thank you. So from what I've heard, I'm replacing a guy who's supposed to be tall and good-looking.
00:00:10.920 Joke's on you, huh? All right, I think we can start.
00:00:18.539 Today, I'm going to tell you a story. I know some of you might be tired already.
00:00:23.760 So you can treat this as a bedtime story. This story will be about an adventure.
00:00:31.080 As in any good adventure, there should be a hero, and our story will have a hero as well.
00:00:38.880 Our hero will face a lot of challenges to answer one question: how to test something that's random.
00:00:46.620 So let's keep our fingers crossed for our hero. Hopefully, he will learn something.
00:00:52.739 And hopefully, some of you will learn something as well—or at least have some fun. Let's start.
00:00:59.100 Meet Fredo. Fredo might look familiar to some of you because, just like most of you, he is a developer.
00:01:07.080 So it's natural that you can see some parts of yourself in Fredo. He will go on an adventure.
00:01:13.080 Now, what's the adventure about? Fredo has to write a game.
00:01:23.220 A game that takes two players. Each player rolls a die, and if the first player rolls a better score, he wins. Otherwise, the second player wins.
00:01:29.460 Sounds simple enough, right? Now is the time when we can try to look through Fredo's eyes and implement this ourselves.
00:01:35.520 We started with a Game class and with tests. Where do we start? Of course, with tests.
00:01:41.960 Let's get started. Oh, by the way, this is RSpec, so you are probably all familiar with it.
00:01:49.560 Our game should return a winner, right? So we want to have a game, which we will call Game.
00:01:55.440 It should take two players. Let's call them P1 and maybe Miri.
00:02:03.379 Right. We want to have a play method. Now the fun part begins because we expect this game to have a winner.
00:02:10.520 But we can't just type in P1 or Miri every time. This thing puzzles Fredo further.
00:02:18.840 He's very pragmatic and doesn't let these tiny details stop him. He writes the code according to the business requirements.
00:02:24.540 If game P1's score is greater than game P2's score, P1 wins. Otherwise, it's Miri.
00:02:31.200 Sounds simple enough, looks simple enough. Let's see if our tests tell us something.
00:02:37.739 The main thing you have to look for here is: are the tests red or are they green? This time, it's red, which means it's good.
00:02:43.620 We wrote tests for code that doesn't exist yet. There's nothing here, so they fail. It's perfect!
00:02:50.640 Let's try to write some implementation now. Initialize, we take P1 and P2, of course.
00:02:56.880 We want them to have a play method.
00:03:04.860 This should involve a result of a dice roll, so let's say one to six.
00:03:10.620 We've all seen dice before. Six sides from one to six. Our main logic here is this:
00:03:18.540 If P1's score is greater, P1 wins; else, it’s P2.
00:03:24.540 All right, and let's see if our tests will work. Only one more thing: attribute readers.
00:03:30.180 All right, let’s run some tests now.
00:03:36.300 Look at that! Everything is green. Everything is perfect. Fredo is very happy with his code.
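The transcript does not include the code from the screen, so here is a minimal sketch of what Fredo might have at this point. The reader names `p1_score` and `p2_score`, and the plain-Ruby check standing in for the RSpec example, are assumptions made for illustration.

```ruby
# Sketch of the first version: two rolls with rand, the higher roll wins.
class Game
  attr_reader :p1_score, :p2_score

  def initialize(p1, p2)
    @p1 = p1
    @p2 = p2
  end

  def play
    @p1_score = rand(1..6)
    @p2_score = rand(1..6)
    # Business rule so far: P1 wins on a higher roll, otherwise P2 wins.
    @p1_score > @p2_score ? @p1 : @p2
  end
end

# The "test" mirrors the production logic with its own conditional,
# so only one of its branches is exercised on any given run.
game = Game.new("P1", "Miri")
winner = game.play
if game.p1_score > game.p2_score
  raise "expected P1 to win" unless winner == "P1"
else
  raise "expected Miri to win" unless winner == "Miri"
end
```

Because the check repeats the same comparison as `play`, it passes on every run, which is why everything looks green here.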
00:03:42.000 Not only did we solve the problem; we wrote tests first. This means it has to be TDD.
00:03:49.560 Only one thing left: we have to let the world know, so let’s go on LinkedIn.
00:03:55.440 TDD—it's there! The company earns a ton of money, and we want some improvements.
00:04:02.500 Right now, we have validated our idea; it’s great, fantastic! We need some more.
00:04:10.920 So we have a new quest here: let’s add a draw. Right now, it’s only one winner or the other.
00:04:21.539 But sometimes both players can roll the same number. It’s supposed to be a tie.
00:04:29.520 That doesn’t sound that bad, right?
00:04:36.600 So let’s go back to the code and see what we can do here.
00:04:43.260 First of all, where do we start? With tests, right? We already have those beautiful tests here.
00:04:52.380 So let’s try to modify them. We have our else branch; let’s change it to else if P1's score is lower than P2's score.
00:04:59.640 Perfect. And one more thing: else it’s supposed to be a tie. That’s fine. Now let’s run some tests.
00:05:10.260 The first thing that was unexpected happens: we just added a new requirement to our tests.
00:05:17.400 I think at least Fredo thinks he did, and the tests still pass. How is this possible?
00:05:26.480 Fredo has a group of friends, very close friends—one might even say a fellowship.
00:05:33.180 But he asked for their help. I don’t have Fredo’s friends here, but I have you guys.
00:05:40.780 So let’s start. What’s wrong with this test? It’s obvious.
00:05:47.320 Yes, they are random. Yeah, that’s true. But in this specific scenario, what do you think could be improved? Why did the test pass?
00:05:55.680 Come on, I know you know. It’s not that late. Of course: because we have those if statements here, only one of those test branches is run at a time.
00:06:07.259 This is not what we’re looking for because we either test if Pepin was a winner or Miri, but we never test all of those branches at once.
00:06:14.600 All right, now what do we do? Let’s go back to the state with the code that we thought was working.
00:06:22.440 Let’s remove this additional requirement here.
00:06:28.800 Okay, so what can we do to actually make those tests better? Any ideas? We want to get rid of if-else statements.
00:06:35.639 So what can we do? Come on. Monkey-patch rand? How would we achieve that?
00:06:42.480 Yeah, down the rabbit hole? No, no. Okay, any simpler ideas?
00:06:48.600 Oh yeah, I can hear something. I think Fredo would love that. That’s true—let’s begin with stubs or mocks.
00:06:55.620 Fredo, as an experienced developer, is quite familiar with that.
00:07:02.480 So we go here: rspec stub rand.
00:07:09.120 Stub rand. All right, it gives us hope because it looks like someone else was looking for that as well.
00:07:16.740 There’s a question that looks like maybe it might be about the same thing.
00:07:23.640 There’s like an answer, but it’s a bit too long. So whatever.
00:07:30.780 Oh, look at this one-liner! That’s nice. Six upvotes, that’s even nicer.
00:07:38.040 Okay, copy-paste. Like it has to be good, right?
00:07:44.460 All right, let’s type it here. I will show you the full line.
00:07:51.960 So it’s allow to receive rand and return one, two, three, four, five.
00:07:57.480 We actually want to update it a bit, right? Three, then one.
00:08:04.320 This means that the first time we call rand, it returns three; the second time, it returns one.
00:08:10.800 And this means that we should be able to get rid of those.
00:08:16.560 But to make sure that we didn’t break anything, let’s run tests. They should pass.
00:08:23.960 Perfect! Now we have a test that always tests the first branch here in line 13.
00:08:30.600 The only thing left is to write another one that returns Miri.
00:08:36.300 Yeah, thanks Copilot. Let’s run it.
00:08:43.260 Now we have two examples. Both of them pass and look much more reliable than what we used to have.
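The exact one-liner is not in the transcript. In RSpec it would presumably look something like `allow(game).to receive(:rand).and_return(3, 1)`, where consecutive values are returned on consecutive calls. The sketch below reproduces the same idea in plain Ruby so it runs without RSpec; the helper `game_with_rolls` is hypothetical.

```ruby
class Game
  def initialize(p1, p2)
    @p1 = p1
    @p2 = p2
  end

  def play
    p1_score = rand(1..6)
    p2_score = rand(1..6)
    p1_score > p2_score ? @p1 : @p2
  end
end

# Hypothetical helper: overrides rand on one game object so that
# successive calls return the given values in order.
def game_with_rolls(*rolls)
  game = Game.new("P1", "Miri")
  game.define_singleton_method(:rand) { |*_args| rolls.shift }
  game
end

# One example per branch, each with fixed rolls and no conditionals.
p1_result   = game_with_rolls(3, 1).play # P1 rolls 3, Miri rolls 1
miri_result = game_with_rolls(1, 3).play # P1 rolls 1, Miri rolls 3
```

Each example now pins both rolls, so each branch of `play` is checked deterministically.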
00:08:51.600 Okay, now time for the new requirement.
00:08:57.600 It returns a tie.
00:09:04.320 Okay, in case we have three and three, we want a tie. Let’s run some tests.
00:09:10.320 They are red, as we expected.
00:09:16.920 So the only thing left is to update the code itself, right?
00:09:24.480 Okay, let’s run some tests.
00:09:30.480 Outstanding! Fredo is very happy.
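A sketch of the tie requirement under the same assumptions as before: `rand` is overridden per object with a hypothetical `game_with_rolls` helper, and the return value `"tie"` is an assumption, since the talk does not say what the game returns for a draw.

```ruby
class Game
  def initialize(p1, p2)
    @p1 = p1
    @p2 = p2
  end

  def play
    p1_score = rand(1..6)
    p2_score = rand(1..6)
    if p1_score > p2_score
      @p1
    elsif p1_score < p2_score
      @p2
    else
      "tie" # assumed representation of a draw
    end
  end
end

# Hypothetical helper that fixes the sequence of rand results.
def game_with_rolls(*rolls)
  game = Game.new("P1", "Miri")
  game.define_singleton_method(:rand) { |*_args| rolls.shift }
  game
end

tie_result  = game_with_rolls(3, 3).play
p1_result   = game_with_rolls(3, 1).play
miri_result = game_with_rolls(1, 3).play
```

With fixed rolls, the tie example fails against the old two-branch code and passes only once the third branch exists.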
00:09:37.680 He has become like a company hero right now.
00:09:44.640 Of course, something we have to include in our CV.
00:09:50.940 Now we're starting to see other companies pursuing our success.
00:09:57.480 There are a lot of companies that want to make the same amount of money that we do.
00:10:05.760 But fear not! Our business is prepared for that.
00:10:11.760 We know how to be one step ahead of our competitors.
00:10:18.240 You might say this is a revolutionary idea.
00:10:23.520 We’re going to use a new die, but we don’t call it new; we call it the better die.
00:10:30.360 Previously, the old boring die only had scores from one to six. Our new one will have one, one, three, four, five, and six.
00:10:40.200 Well, it sounds simple, although Fredo is not convinced. It’s like a revolution.
00:10:47.280 But the ticket is already in Jira, so why not?
00:10:54.480 Back to code then. Fredo takes a quick look at what he already has here.
00:11:00.720 He notices one thing: we actually don’t change any of the logic.
00:11:06.120 It stays the same—basically, we shouldn’t change tests as well.
00:11:11.640 The only thing that’s left is to change the code that represents the dice roll.
00:11:19.440 This is what was changed, so maybe we could do something like one, one, three, four, five, and six.
00:11:25.920 And dot sample. It’s simple enough.
00:11:32.160 This is what we want. No two here; there’s a second one instead.
00:11:40.800 We should be good here. So you probably already know what will happen when I run these tests.
00:11:48.480 Yeah, Fredo is about to find out. Red everywhere!
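Why the tests go red can be sketched as follows, again in plain Ruby with an assumed per-object override of `rand`. Once the roll is implemented with `Array#sample`, the game never calls `rand` itself, so the stubbed values are simply never consumed.

```ruby
class Game
  SIDES = [1, 1, 3, 4, 5, 6].freeze # the "better die": no two, a second one

  def initialize(p1, p2)
    @p1 = p1
    @p2 = p2
  end

  def play
    p1_score = SIDES.sample
    p2_score = SIDES.sample
    if p1_score > p2_score
      @p1
    elsif p1_score < p2_score
      @p2
    else
      "tie" # assumed representation of a draw
    end
  end
end

stubbed_rolls = [3, 1]
game = Game.new("P1", "Miri")
# The old stub is still in place, but play no longer goes through rand.
game.define_singleton_method(:rand) { |*_args| stubbed_rolls.shift }

result = game.play
unused_stub_values = stubbed_rolls.length # still 2: the stub was never hit
```

The test was coupled to how the roll was implemented, not to what the game does, so an implementation change broke it even though the rules stayed the same.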
00:11:55.080 Fredo’s face is getting red as well because we were supposed to do TDD.
00:12:01.680 Sorry, it was all supposed to be beautiful, easy, and fun.
00:12:08.640 Whenever we get a new requirement, it goes down.
00:12:14.640 Tests are always red! It would be best to remove them, but we probably wouldn’t get approval on GitHub.
00:12:21.720 So, we have to do something about them.
00:12:29.520 First things first: let’s get back to the version that we know was working, the one with rand.
00:12:36.480 So, what can we do now? Fredo asks his friends about this, and they tell him one thing: SOLID.
00:12:43.440 Yeah, Fredo knows about SOLID a thing or two, especially about the O.
00:12:48.480 Because O was his favorite letter—the most round one. He remembers what it stands for: Open/Closed Principle.
00:12:56.640 The idea is that maybe this could somehow help Fredo as well.
00:13:02.880 The other thing is he notices that although the game appears to require rolling a die, we don't actually have a die anywhere in the code.
00:13:09.540 Lines 9 and 10 should represent this better. Therefore, let's introduce an abstraction we can call a die.
00:13:16.800 Let’s do that.
00:13:22.560 That’s called die. It should return the same range from one to six.
00:13:28.260 Okay, let’s try to include this in our class: die.new. Of course, attribute reader.
00:13:34.680 And what we need to have here is also a variable.
00:13:41.520 All right, before we do anything else, let’s make sure we didn’t break anything. Tests are still green.
00:13:48.480 Let’s try now to include this abstraction here.
00:13:55.920 Awesome! Tests are still green, which means we actually did a proper refactor here.
00:14:02.040 Nothing has changed, aside from the implementation. The behavior stayed the same; that’s good.
00:14:08.940 However, how did this help us? We still have a die that rolls randomly.
00:14:15.480 However, if we could somehow create a machine-dice that rolls in a very specific way, we could use it in our tests.
00:14:22.320 Because if we know it always rolls one, we could test for a tie.
00:14:29.280 So let’s see if Fredo is able to somehow machine this die here.
00:14:35.520 We can call it fake die. Now we want to be able to define those rolls on the go.
00:14:46.560 Of course, it has to have a method roll.
00:14:53.520 I think it’s called die typing, right? If it looks like a die and rolls like a die, it’s a die.
00:15:00.840 Okay, let’s see if we can somehow include this fake die in our tests.
00:15:06.960 Maybe we can create a die. It’s a fake die. Okay, let’s include it here.
00:15:13.680 Let’s run our tests. Nothing has changed, which is good.
00:15:19.800 Now, if our assumption is correct, we should be able to remove this line.
00:15:26.160 Let’s see what happens now. It still works! Great!
00:15:32.640 Now let’s just do this for the rest of the examples: one, three, and here we want one and one.
00:15:39.120 Let’s remove those lines; we don’t need stubs anymore.
00:15:45.600 I did, of course, forget about passing the die.
00:15:52.680 Okay.
00:15:59.520 Perfect! We’re still in the very beginning, meaning everything works kind of as it used to.
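The refactoring described above can be sketched like this. The class names `Die` and `FakeDie` come from the talk; the constructor signatures and the `"tie"` return value are assumptions.

```ruby
# The real die: still random.
class Die
  def roll
    rand(1..6)
  end
end

# Test-only die that returns a predefined sequence of rolls.
class FakeDie
  def initialize(*rolls)
    @rolls = rolls
  end

  def roll
    @rolls.shift
  end
end

class Game
  # The die is passed in; anything that responds to roll will do.
  def initialize(p1, p2, die = Die.new)
    @p1 = p1
    @p2 = p2
    @die = die
  end

  def play
    p1_score = @die.roll
    p2_score = @die.roll
    if p1_score > p2_score
      @p1
    elsif p1_score < p2_score
      @p2
    else
      "tie" # assumed representation of a draw
    end
  end
end

p1_result   = Game.new("P1", "Miri", FakeDie.new(3, 1)).play
miri_result = Game.new("P1", "Miri", FakeDie.new(1, 3)).play
tie_result  = Game.new("P1", "Miri", FakeDie.new(1, 1)).play
```

No stubbing is needed any more: the tests control the rolls through the object they pass in.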
00:16:06.600 But there is still one more important thing here.
00:16:13.920 Our class that we were testing no longer cares about the die and the type of die.
00:16:20.880 It only cares that it rolls. So to achieve our goal, we have to do one more thing.
00:16:26.520 Remember how we called it? It’s better die. Nothing nice.
00:16:32.700 Here we implement sample.
00:16:39.240 All right, now we replace this old boring die with better die.
00:16:46.860 Let’s run tests. Tests are still green!
00:16:50.520 Because we don’t really care about the die anymore; we only care that it works.
00:16:55.320 We test what we have here. Perfect.
00:17:01.680 You probably know what this is called. If you don't, you can follow Fredo on LinkedIn.
00:17:06.960 It's called dependency injection.
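A sketch of the final step, assuming the same constructor shape as in the refactoring above: the better die is just another object with a `roll` method, and the game accepts whichever die it is given.

```ruby
# The "better die": faces one, one, three, four, five and six.
class BetterDie
  SIDES = [1, 1, 3, 4, 5, 6].freeze

  def roll
    SIDES.sample
  end
end

class Game
  # Dependency injection: the game does not build or know its die type.
  def initialize(p1, p2, die = BetterDie.new)
    @p1 = p1
    @p2 = p2
    @die = die
  end

  def play
    p1_score = @die.roll
    p2_score = @die.roll
    if p1_score > p2_score
      @p1
    elsif p1_score < p2_score
      @p2
    else
      "tie" # assumed representation of a draw
    end
  end
end

production_result = Game.new("P1", "Miri").play
```

Swapping the die changed no line inside `play`, which is why tests that inject their own die stay green.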
00:17:13.920 All right, the company is very happy!
00:17:20.640 Yes, there’s the question: are the tests using the fake die?
00:17:30.480 Yes, that’s right.
00:17:36.960 In the real world, I would actually suggest testing end-to-end, or at least not unit tests with a fake die only.
00:17:42.120 But it’s not a real world; it’s a story.
00:17:50.280 That’s a good point, and we actually should include those in tests.
00:17:55.760 But if you only want to test this with unit tests, you can go ahead and use a fake die.
00:18:01.680 Okay, we did it! Our company is happy—a very rich company now.
00:18:09.240 We’re also very rich because look at our resume.
00:18:15.840 There’s one more problem here—one last tiny thing.
00:18:22.560 Our code looks like it works perfectly, but we still get a call from this one customer.
00:18:30.120 He tells us that he played with his friend: "I rolled a five, he rolled a one, and I still lost."
00:18:37.040 It’s not supposed to work like that.
00:18:42.480 Yeah, it shouldn’t, and Fredo is a bit angry because the tests show that everything works.
00:18:47.520 Tests are green. It has to work.
00:18:53.880 So he was about to grab a phone and kindly explain to the customer why it works.
00:19:01.200 But on the other hand, I think he briefly remembers the project.
00:19:07.920 Maybe you also remember such projects where tests were green and yet features didn’t work.
00:19:14.520 Have you? Anyone? Yeah, fewer than I hoped for.
00:19:20.520 All right, tests are green but can we actually prove that our code works?
00:19:29.760 Fredo talks with his group and has an idea.
00:19:35.880 Right now, when the game ends, we know who wins.
00:19:41.760 But it would be really nice if we could tell why that person won.
00:19:47.520 Because look at this: if player one wins, it means he could have rolled a six.
00:19:54.480 His opponent could have rolled a one, two, or three. They got it.
00:20:01.920 There are many events that lead to the same result.
00:20:09.120 It would be really nice if we could somehow track this.
00:20:15.360 So he gets an idea: it would be really nice if we had a game that actually returns some events.
00:20:22.560 Some logs, some audits. So let’s say our fake die rolls a two and a one.
00:20:30.720 We want to have an event that says that P1 rolled a two, Miri rolled a one.
00:20:36.480 And that the winner is P1.
00:20:43.260 Because then we would exactly know what happened and why P1 won.
00:20:49.560 Not only that he won.
00:20:55.560 All right, so to do that, we add events here.
00:21:02.280 Let’s initialize it. This is important: after each one of the state changes.
00:21:09.960 Every time we use the instance variable or assign a new value to it, we want to log it.
00:21:18.840 So player one rolls something, player two rolls something, and finally who the winner is.
00:21:25.920 Right now, you can actually take a look at those events.
00:21:33.600 Maybe you can store them in a database, maybe you can log them somewhere else.
00:21:39.720 Like Sumo Logic or Datadog, or whatever you use, and you can actually follow the full execution of your code.
00:21:48.600 See what was happening and assert the code actually behaves as we expected in tests.
00:21:54.120 So one final thing: tests are still green! Everyone is clapping.
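The event idea can be sketched as below. The event shape (hashes with `:type`, `:player`, `:score`, `:winner`) is an assumption; the talk only says that each roll and the final winner get recorded.

```ruby
# Test-only die that returns a predefined sequence of rolls.
class FakeDie
  def initialize(*rolls)
    @rolls = rolls
  end

  def roll
    @rolls.shift
  end
end

class Game
  attr_reader :events

  def initialize(p1, p2, die)
    @p1 = p1
    @p2 = p2
    @die = die
    @events = []
  end

  def play
    p1_score = @die.roll
    @events << { type: :rolled, player: @p1, score: p1_score }

    p2_score = @die.roll
    @events << { type: :rolled, player: @p2, score: p2_score }

    winner = if p1_score > p2_score
               @p1
             elsif p1_score < p2_score
               @p2
             else
               "tie" # assumed representation of a draw
             end
    @events << { type: :finished, winner: winner }
    winner
  end
end

game = Game.new("P1", "Miri", FakeDie.new(2, 1))
game.play
recorded_events = game.events
```

With the rolls recorded, a support call like the one above can be checked against what actually happened rather than against the assumption that green tests mean a working feature.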
00:22:01.920 Fredo learned a new skill.
00:22:07.560 With such an impressive resume, he becomes a president. His journey ends here.
00:22:15.600 However, ours does not—not yet, at least.
00:22:21.000 I'd like to go through some of those tests with you and talk a bit about them.
00:22:26.760 Let’s start with the ones with conditionals.
00:22:33.600 Those are very obvious examples of how not to write tests.
00:22:39.480 We have logic in tests—that's the main problem.
00:22:46.920 What’s bad about it? To be fair, I don’t think there is anything good about them.
00:22:53.760 Logic in tests always makes them harder to understand.
00:23:01.320 Logic in tests that mimics the logic in code actually means that you’re probably not testing anything.
00:23:07.320 It gives you a false sense of security, so don’t use it.
00:23:15.600 Another question: who has ever seen logic in tests?
00:23:21.480 Hands up—no one? Okay.
00:23:27.780 So there are different flavors of logic.
00:23:34.320 Have you ever seen a loop in tests?
00:23:41.280 Yeah, that’s logic, because why is it bad?
00:23:47.480 Not every logic is bad, but when you have a loop like that, you test 100 products.
00:23:55.560 If the test fails, it returns 101 instead of one. For which product did it happen? It’s hard to tell.
00:24:03.240 It should be understandable immediately.
00:24:10.560 Have you ever seen something like that? I've seen that.
00:24:17.520 We have this nice serializer that turns cents to dollars.
00:24:24.960 Our code already uses it, so let's use it in our tests.
00:24:31.920 Well, if cents to dollars breaks and starts returning nil, our code will still pass.
00:24:38.400 But that means our feature won’t work.
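The slide is not in the transcript, so this is a hypothetical reconstruction of the problem: `cents_to_dollars` and `price_label` are made-up names, and the helper is broken on purpose to show the effect.

```ruby
# Hypothetical production helper, deliberately broken: it returns nil.
def cents_to_dollars(_cents)
  nil
end

# Hypothetical code under test that relies on the helper.
def price_label(cents)
  cents_to_dollars(cents)
end

# A check that reuses the helper for its expected value compares nil to nil,
# so it keeps passing even though the feature is broken.
helper_based_check = price_label(1999) == cents_to_dollars(1999)

# A check with a literal expected value notices the breakage.
literal_check = price_label(1999) == "19.99"
```

Writing the expected value out by hand keeps the test independent of the code it is supposed to verify.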
00:24:44.960 Okay, the second thing is the one with stubbing.
00:24:52.560 Now we’re getting to the more controversial aspects.
00:24:59.760 Why it’s good and why it’s bad. The good thing is that we’re actually testing things.
00:25:06.480 And we’re testing all of them. Another good thing is that those tests are deterministic.
00:25:13.920 No matter how many times we run them, we expect the same thing to happen, which is good.
00:25:20.520 They are easy to write, so let’s acknowledge that.
00:25:26.640 What’s bad about them? They make your code coupled to implementation.
00:25:34.600 This can complicate refactoring, as we’ve seen in this example.
00:25:41.040 Okay, so hands up: who uses stubs or mocks in code?
00:25:48.720 Yeah! Have you ever seen something like that?
00:25:55.680 Like where you’ve changed the variable name, but the amount of those attribute values is accurate?
00:26:03.480 Good luck if you want to refactor it anytime soon.
00:26:09.720 Also, you know, results is a double, and it allows results to return something else.
00:26:15.480 This returns another double. It’s hard to understand.
00:26:22.320 And tests are supposed to be easy.
00:26:29.040 Client or any other action.
00:26:34.320 If you have documentation for the external API, injecting it into your class.
00:26:43.200 You will have the fake dice available. This kind of expects to receive that role would be useful.
00:26:48.840 This is the only thing that is called a different right way.
00:26:56.160 I fully agree it can be useful.
00:27:02.880 It just comes with a price, and as long as you’re aware of that, you’re good to go.
00:27:09.840 I’m not telling you to avoid using any of those things.
00:27:15.120 Just honestly, it comes with a price.
00:27:21.600 There are functions or methods that we don’t really care about what they return, only about side effects, like an API call.
00:27:30.960 We have to test it somehow, and I think this is a good way to do that.
00:27:39.120 Perhaps there are more benefits than the hassle of refactoring later.
00:27:46.760 Maybe we won’t refactor this—which is even better.
00:27:51.720 So yeah, if it works for you, work with it.
00:27:57.600 But if you start using stubs everywhere, ask yourself if it’s worth it.
00:28:05.760 If you see receive_message_chain, this should light up some heavy orange lights in your head.
00:28:13.680 Okay, let’s go to dependency injection.
00:28:19.680 Does anyone use something similar in tests?
00:28:27.120 All right, how does it work for you? Is it fun? Is it easy to implement?
00:28:34.320 You use a different class specifically for tests?
00:28:41.760 Miller is actually maybe talking about this tomorrow, and it’s called Substitute.
00:28:47.520 I don't know if you've arrived from there as well. I have problems convincing others to accept non-production code.
00:28:55.080 They seem to always say: hey, it’s not production code.
00:29:02.520 So what’s your reaction to testing? This is the product.
00:29:08.760 Yeah, I can feel that it’s hard.
00:29:14.840 If, instead of making a new class, you use an instance double, then it suddenly becomes really nice.
00:29:21.600 Okay, let’s talk about what’s good, in my opinion, about dependency injection in tests.
00:29:29.520 Dependency injection in tests lets the code be open, which is actually a big thing.
00:29:37.680 When your code changes a lot, those tests are still deterministic.
00:29:44.640 We gain some decoupling from implementation, which is actually very good because it makes refactoring much easier.
00:29:51.360 What’s bad about it? Well, first of all, as you mentioned, some people don’t want to do it.
00:29:59.760 But the fake die needs to be added, and it is code.
00:30:08.160 I mentioned that when we add code or logic to tests, it makes them less readable.
00:30:14.520 The second thing is we need to make sure that fake dice and dice stay aligned.
00:30:21.180 If methods on a die change, we should change them on fake dice.
00:30:27.300 It’s tricky.
00:30:34.020 Lastly, if we pick the wrong abstraction, there’s a big chance that we will regret it.
00:30:40.920 It will make the code more complicated than it needs to be.
00:30:47.520 There’s one thing here I want to point out because I can see a lot of instance double uses.
00:30:54.240 It guards us from using methods that don’t exist.
00:31:01.440 But it doesn’t help us in returning values that are impossible.
00:31:08.520 For instance, if our calculate method returns an integer and in our instance double we return a string.
00:31:15.720 If we do something like that in our tests, our tests will test something different than our code.
00:31:22.980 So keep this in mind.
00:31:29.520 All right, so what about the tests with monitoring?
00:31:35.760 What about logging, like whatever you call it—auditing?
00:31:41.640 It’s actually a very good idea. I highly recommend it, especially in important parts of your system.
00:31:48.180 It makes debugging a lot easier.
00:31:55.200 It gives you an audit of changes.
00:32:02.520 And the very last thing: it gives you the ability to manually test it.
00:32:08.520 If you have a complicated algorithm and you know all the inputs, you can use your calculator on your phone.
00:32:15.840 You can test if the value is matching the one that was calculated from your code.
00:32:23.640 Of course, if you use the technique that was already mentioned yesterday and a few times today called event sourcing.
00:32:31.320 You get this for free because you already have those events that alter the state of an object.
00:32:39.720 What’s tricky about logging is if you overuse it in your domain code.
00:32:47.760 Or in the code that cares about business logic, it becomes hard to read because you log everything.
00:32:54.720 Do something, log, do something, log, do something, log.
00:33:02.520 The second thing is if you don’t properly choose the tools, you might end up with a pricey bill.
00:33:09.600 If you log extensively under heavy traffic, your Sumo Logic bill might increase.
00:33:15.840 Think about it; maybe it’s better to store them in a database. Maybe a simple Datadog metric would be sufficient.
00:33:24.180 Pick the right tools for the job. I also recommend using this with legacy code.
00:33:30.600 It actually helped me a few times with understanding and debugging the code.
00:33:37.680 When you have a big ball of mud and you add logging to the mix, you will still have a big pile of mud.
00:33:45.600 But you might understand it a tiny bit better.
00:33:52.680 So I recommend you to try it. Okay.
00:33:58.560 Now, back to the original question: how to test randomness?
00:34:02.400 In my opinion, don’t test randomness. Try to understand it.
00:34:08.520 Try to extract it, and at the very end, try to control it.
00:34:12.720 Yeah, you might think there is no randomness in your systems.
00:34:19.680 And you might be right, because why would there be?
00:34:25.200 But there’s one thing that looks very similar to random things.
00:34:31.800 Maybe some of you have seen a piece of code like that somewhere in the system, like Time.current.
00:34:38.520 I bet if we all right now run rails console on our MacBooks, we would all get different results.
00:34:46.440 It kind of looks like it’s a random thing.
00:34:53.520 Although, let’s be real here: it’s not that complex.
00:34:58.560 This code looks pretty simple, so here is my favorite solution.
00:35:05.160 How should we test it? Time travel!
00:35:09.120 Yeah, that’s the only option to test this fair and straightforward piece of code.
00:35:15.840 We need to use time traveling.
00:35:21.720 Okay, that’s it from my side.
00:35:27.720 But I have one more question or maybe one more favor to ask you.
00:35:33.480 This QR code leads to a feedback form.
00:35:40.320 There’s a slight chance that I might give this talk sometime in the future.
00:35:46.680 So I would really appreciate it if you could give me feedback. It’s really quick.
00:35:54.720 You just open the form, type in, "I hated it" because this is the important stuff for me.
00:36:00.600 Okay, thanks. That’s all from my side.
00:36:07.080 If you have any questions, I will be happy to answer.
00:36:18.480 I just want to challenge your last statement, that time travel is the only way to test this.
00:36:24.600 To test Time.current or Time.now, I know at least two other ways.
00:36:30.240 One is effects like algebraic effects, and the other is dependency injection.
00:36:37.680 You can have like a single current time initialized per request or per whatever process you have and inject it down to the system.
00:36:44.520 Yeah, I mean, I 100% agree with you. This is probably much better.
00:36:51.600 You can use stubs or whatever, like probably even some logic I show here.
00:36:57.720 I agree, and yet I still sometimes see a piece of code like that in our system.
00:37:05.880 So, I agree you’re right; you can test it in many better ways.
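The dependency injection alternative raised in the question can be sketched like this, applying the same trick as the fake die to time. `Greeting` and `FixedClock` are hypothetical names; the only assumption about the clock is that it responds to `now`, as Ruby's `Time` class does.

```ruby
# Hypothetical class whose behavior depends on the current time.
class Greeting
  # The clock is injected; by default it is Ruby's Time class.
  def initialize(clock: Time)
    @clock = clock
  end

  def message
    @clock.now.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# Test-only clock that always reports the same moment.
FixedClock = Struct.new(:now)

morning_clock   = FixedClock.new(Time.new(2023, 1, 1, 9, 0, 0))
afternoon_clock = FixedClock.new(Time.new(2023, 1, 1, 15, 0, 0))

morning_message   = Greeting.new(clock: morning_clock).message
afternoon_message = Greeting.new(clock: afternoon_clock).message
```

As with the die, the current time is understood as an input, extracted behind a small interface, and then controlled in tests.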
00:37:12.480 So yeah.
00:37:18.000 Any more questions? All right, thank you very much.
00:37:30.840 Okay, so...