Crystalball: predicting test failures

Tests are often slow and are a bottleneck in development. We build complex CI systems with heavy parallelization to reduce the time it takes to run all the tests, but we still tend to run all of them even for the slightest change in code. What if we could instead run only the tests that might fail as a result of our changes? In the world of static languages this is not hard, but Ruby is a very flexible and dynamic language, so implementing this idea is tricky.

Meet Crystalball (https://toptal.github.io/crystalball/), a regression test selection library we built at Toptal. I'll demonstrate how to use it with RSpec, how it predicts which tests to run, and how it can be extended.

RubyKaigi 2019 https://rubykaigi.org/2019/presentations/p0deje.html#apr19

00:00:00.030 Hi everyone! Thank you very much for coming here today. It is a big honor for me to speak at RubyKaigi and be in front of you.
00:00:08.960 My name is Alex, and I am a software testing engineer. Although I program a lot, I am not primarily a programmer.
00:00:15.690 I have been using Ruby for the last eight years and have worked for the past seven years at a company called Toptal as a quality architect.
00:00:24.150 You can find me on Twitter, but I rarely tweet. If you have heard of me, it's likely because of my open-source work.
00:00:30.000 I am a committer on the Selenium project. Does anyone here know Selenium?
00:00:35.010 A few people know it, great! I'm a maintainer of the Selenium WebDriver Ruby gem, so if you use it or, for example, if you use Capybara, you are actually using some of my work.
00:00:40.620 I am also a core developer of the Watir project, which is a browser automation tool in Ruby. If you've heard of it, great! If not, and you are interested, please feel free to approach me and say hi.
00:00:53.070 An announcement is that yesterday we tagged a new major version of Selenium, version 4 alpha one. If you use it, that is some big news.
00:01:04.439 There is a parallel conference, RailsConf, happening today in Tokyo. However, today we are going to talk about something completely different—Crystalball.
00:01:16.799 Crystalball is a tool that predicts test failures. We will start by discussing the problems with regression testing that currently exist, then I will introduce you to Crystalball as a tool that might solve some of those problems.
00:01:30.960 Finally, we will have a demonstration of how it works. So, let's start: how many of you write tests?
00:01:44.430 Okay, that’s good! In many programming languages, we have tools like static analysis and type checkers that help us code faster and ensure we haven’t broken anything.
00:01:58.260 In Ruby, we don’t have those yet, but as we know, we will eventually.
00:02:06.450 Tests are essential for any successful and faster software development process. We write tests to ensure we haven’t broken anything after changing the code.
00:02:17.510 However, as we create more tests, they begin to slow down. If you have ever worked with a large Rails codebase, you probably have tests that run for over one minute, five minutes, or even longer.
00:02:30.060 Raise your hand if you have a test suite that runs longer than one minute. That’s even more people than those who write tests.
00:02:43.350 Tests are often slow, and what is even worse is that, quite often, they tend to be integrated.
00:02:51.690 By integrated, I mean that you have a specific component with tests, but then there are different tests that trigger that component indirectly.
00:03:02.010 We don’t always stub out the dependencies of various components, which leads to implicit usage across tests.
00:03:10.590 If you've ever encountered a situation where you changed a class and modified its tests, only to find other tests failing because they needed updates too, that’s exactly the problem of integrated tests.
00:03:19.440 As a result, we need to run all tests on every change. We change only a small part of our application but may have to validate all tests to ensure nothing is broken.
00:03:31.230 This is very painful — we change just a bit of code, yet we have to wait a long time to verify that nothing is broken.
00:03:43.110 We also know that tests can be quite frustrating. I don’t know if anyone actually enjoys writing tests, but, for good or ill, we need them in Ruby.
00:03:50.280 The good news, however, is that the Ruby community is an amazing place. A few years ago, Aaron Patterson wrote a blog post called "Predicting Test Failures".
00:04:03.349 In this post, he amusingly states, 'Running tests is the worst. Seriously, it takes forever and, by the time they're all done, I forget what I was doing.'
00:04:12.660 He proposed a sort of algorithm that would allow us to predict which tests might fail based on changes in the Git repository.
00:04:24.030 A couple of years later, Pavel Šuštin, an amazing engineer and my former colleague, created a proof of concept for Aaron's solution.
00:04:35.340 He developed a Ruby gem that understands the changes made in a Git repository and predicts which tests should run based on these changes. It worked exceptionally well, so we invested more time into it.
00:04:53.070 A year later, Crystalball 0.5 was released publicly and made available on RubyGems. The documentation prototype and everything were out, so we could start using it. This all started back with Aaron's blog post.
00:05:10.830 So, what is Crystalball? It is a regression test selection library, meaning it can select the tests that need to be run to make sure nothing has been broken.
00:05:23.770 Throughout this presentation, we will demonstrate examples using a simple Rails and RSpec application. We start by requiring Crystalball at the top of our spec helper.
00:05:35.240 If RSpec is run with the Crystalball environment variable set, we begin gathering information that will be used by Crystalball later.
00:05:44.400 We run all our specs with Crystalball set to true to create an 'execution map.' This execution map is a YAML file, and its most important part is just a hash.
00:05:54.110 Here, the key is a particular test, and the value is an array that lists all the Ruby files used by that test.
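To make the shape of the execution map concrete, here is a minimal sketch. The example IDs and file paths are invented for illustration, and the map file Crystalball actually writes may carry extra metadata or use a different layout.

```ruby
require 'yaml'

# Hypothetical execution map: each key is one test (written here as an RSpec
# example ID), each value lists the Ruby files that test used while running.
execution_map = {
  './spec/models/user_spec.rb[1:1]' => [
    'app/models/user.rb',
    'app/models/application_record.rb'
  ],
  './spec/mailers/user_mailer_spec.rb[1:1]' => [
    'app/mailers/user_mailer.rb',
    'app/models/user.rb'
  ]
}

# Serialize to YAML, as it would be stored on disk, and read it back.
yaml = YAML.dump(execution_map)
restored = YAML.load(yaml)

puts yaml
```

Because the map is a plain hash of strings and arrays, it survives the YAML round trip unchanged and can be looked up by test or scanned by file.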
00:06:05.860 Next, let's say we start changing our app. For instance, if we change a user model to alter some validation line, we can run Crystalball as if we were running RSpec.
00:06:19.090 It will perform its magic to build predictions and run some tests, and in this example, one of those tests failed, which prompts us to dive deeper.
00:06:35.390 Essentially, Crystalball has three parts. The first part is the map generator, which generates a map from test to code.
00:06:41.700 It identifies which tests use which code. Based on that map, Crystalball predicts which tests need to run based on changes made in the Git repository.
00:07:01.390 Finally, it runs the necessary tests. The first layer, the map generator, is the most complicated and essential part of Crystalball.
00:07:15.570 This layer helps understand which code is utilized by which tests and incorporates various strategies to provide great results.
00:07:29.440 One of the strategies used is called coverage, proposed by Aaron in his original blog post. It works by using the Coverage API provided by Ruby’s standard library.
00:07:39.320 We take the coverage before the tests are run, gather coverage after the tests, and then see if the coverage has changed.
00:07:53.590 In the code, it looks like this: we require coverage, start it, get results before running the tests, and then take the coverage after running the tests to identify any differences.
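The steps just described can be sketched with Ruby's `Coverage` module roughly as follows. `DemoUser` and the temporary file are hypothetical stand-ins for application code, and this is an illustration of the idea rather than Crystalball's own implementation.

```ruby
require 'coverage'
require 'tmpdir'

# Coverage only tracks files that are loaded after it has been started.
Coverage.start

# Hypothetical application file, written to a temp dir so the sketch is self-contained.
dir = File.realpath(Dir.mktmpdir)
user_path = File.join(dir, 'demo_user.rb')
File.write(user_path, <<~RUBY)
  class DemoUser
    def valid?
      true
    end
  end
RUBY
require user_path

# Snapshot before the "test", run it, then take a second snapshot.
before = Coverage.peek_result
DemoUser.new.valid?
after = Coverage.peek_result

# Files whose per-line hit counts changed were used by the test.
used_files = after.select { |file, lines| lines != before[file] }.keys
puts used_files
```

`Coverage.peek_result` is used instead of `Coverage.result` because it returns a snapshot without stopping measurement, so the same process can keep comparing snapshots test after test.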
00:08:02.510 The coverage map generator in Crystalball produces similar results. For example, we have a single user spec that uses four different files.
00:08:15.140 That spec file could involve the user model, user mailer, application record, etc. This coverage strategy is quite fast, reliable, and generally works well.
00:08:27.590 However, Crystalball also provides multiple other strategies for generating more accurate maps.
00:08:37.890 The next strategy is called Allocated Objects. It relies on the TracePoint API to gather constant definitions.
00:08:48.390 It collects all object allocations created during the test and maps them back to the Ruby files where these class constants are defined.
00:09:00.620 To give a brief overview, we first trace to gather constant definitions, then observe allocations, and finally map back to the files.
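A rough sketch of these three steps is shown here. The `TracePoint` `:class` event for collecting definitions follows the description above; observing allocations by comparing per-class instance counts is a simplification chosen for this example, not necessarily how Crystalball does it. `DemoRecord` and `DemoUser` are hypothetical classes.

```ruby
require 'tmpdir'

# Step 1: trace constant definitions and remember the file of each class body.
definitions = Hash.new { |hash, mod| hash[mod] = [] }
definition_trace = TracePoint.new(:class) { |tp| definitions[tp.self] << tp.path }

# Hypothetical model files, written to a temp dir to keep the sketch self-contained.
dir = File.realpath(Dir.mktmpdir)
record_path = File.join(dir, 'demo_record.rb')
user_path = File.join(dir, 'demo_user.rb')
File.write(record_path, "class DemoRecord; end\n")
File.write(user_path, "class DemoUser < DemoRecord; end\n")

definition_trace.enable do
  require record_path
  require user_path
end

# Step 2: observe allocations by comparing instance counts around a block.
# GC is paused so objects created by the block cannot be collected before the recount.
def classes_allocated_during(known_classes)
  gc_was_disabled = GC.disable
  count = ->(klass) { ObjectSpace.each_object(klass).count { |obj| obj.instance_of?(klass) } }
  before = known_classes.to_h { |klass| [klass, count.call(klass)] }
  yield
  known_classes.select { |klass| count.call(klass) > before[klass] }
ensure
  GC.enable unless gc_was_disabled
end

allocated = classes_allocated_during(definitions.keys) { DemoUser.new }

# Step 3: map allocated classes, and their ancestors, back to the defining files.
used_files = allocated.flat_map do |klass|
  klass.ancestors.flat_map { |mod| definitions.fetch(mod, []) }
end.uniq

puts used_files
```

Walking `ancestors` is what brings the parent class file into the result even though only the subclass was instantiated.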
00:09:10.600 By using this for a specific test, we can see which files were allocated. In this particular example, there were actually only two files involved.
00:09:21.410 Specifically, the user model and application record were utilized, as Crystalball takes into account any inheritances.
00:09:30.900 Allocated Objects may be slower than coverage because it relies on TracePoint, but it provides great results.
00:09:42.590 Another strategy, available on demand, uses the Parser gem. It is very useful because it picks up static method calls, such as class methods or module methods.
00:09:57.560 This functionality helps by parsing source code to search for constant definitions, running the tests, and subsequently retrieving the necessary files.
00:10:07.640 The process involves parsing the Ruby files for abstract syntax trees, recursively mapping through them and searching for constant calls.
00:10:20.080 This means we can determine what is defined and which methods are called, leading us to the correct results.
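The idea of walking a syntax tree to find constant references can be illustrated as follows. This sketch swaps in Ripper from Ruby's standard library instead of the Parser gem so that it runs without extra dependencies; the `UserJob` source and the helper name `constants_referenced_in` are assumptions made for the example.

```ruby
require 'ripper'

# Recursively walk the S-expression produced by Ripper and collect every
# constant name that appears in the source.
def constants_referenced_in(source)
  found = []
  walk = lambda do |node|
    next unless node.is_a?(Array)

    # Ripper represents a constant token as [:@const, "Name", [line, column]].
    found << node[1] if node[0] == :@const
    node.each { |child| walk.call(child) }
  end
  walk.call(Ripper.sexp(source))
  found.uniq
end

# Hypothetical job that calls a class method on User.
source = <<~RUBY
  class UserJob
    def perform(id)
      User.find(id)
    end
  end
RUBY

puts constants_referenced_in(source)
```

A strategy built on this would then look up the files where those constants are defined, which is how a file that only calls `User.find` still gets linked to the `User` model.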
00:10:34.070 Running the Parser strategy for the same test gives results similar to the previous strategies, but it also yields additional information we can leverage.
00:10:49.180 For instance, it also identifies the UserJob, which connects back to our User class method calls, ensuring if that file changes, we repeat relevant checks for the User model.
00:11:04.420 These various strategies can be adapted for any Ruby app or gem, and Crystalball offers even more tailored approaches.
00:11:14.540 One method pertains to Factory Bot, which we utilize heavily in our company. It tracks which tests to run when factories change.
00:11:30.730 Crystalball patches Factory Bot to gather information about updated factories, tracking their usage during tests.
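The general patching technique can be sketched with `Module#prepend`. `DemoFactory` below is a hypothetical stand-in and does not reflect Factory Bot's real API or the exact hooks Crystalball installs.

```ruby
# Hypothetical stand-in for a factory library; not Factory Bot's real API.
module DemoFactory
  def self.build(name)
    { factory: name }
  end
end

# Patch that records every factory name used, then defers to the original method.
module FactoryUsageTracker
  def self.used
    @used ||= []
  end

  def build(name)
    FactoryUsageTracker.used << name
    super
  end
end

DemoFactory.singleton_class.prepend(FactoryUsageTracker)

built = DemoFactory.build(:user)
puts FactoryUsageTracker.used.inspect
```

Since the patch calls `super`, the library behaves exactly as before while the tracker collects which factories each test touched.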
00:11:46.708 Another method revolves around ActionView information, allowing us to discern which tests are required when views and partials are modified.
00:12:06.680 The mechanism uses monkey patches to collect data about compiled views, so the relevant tests are triggered whenever those views change.
00:12:21.480 Internationalization features are also catered for, with strategies to track when translations are loaded and accessed during tests.
00:12:35.980 Moreover, there is a specialized Table Map generator designed specifically for Rails apps.
00:12:51.590 It helps ascertain which tables relate to which models, especially valuable when working with database migrations.
00:13:06.320 Again, it uses TracePoint to search for definitions and collect table names from the models interacted with.
00:13:21.830 The predictor component of Crystalball is responsible for understanding the changes made in the repository and determining what tests to run.
00:13:34.720 Similar to map generators, there are multiple predictors in Crystalball, and you can even create custom ones easily.
00:13:47.170 The first predictor focuses on modified execution paths, gathering all modified files from the Git repository.
00:14:02.100 With the pre-created map, it will identify which tests need to be run based on the modified files.
00:14:17.180 For example, if a user model change occurs, the predictor would inform you to run tests on the associated user model spec.
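In its simplest form, this prediction is an intersection between the files each test used and the files that changed. The sketch below assumes a hypothetical map and a hard-coded list of modified files; in a real setup the list would come from Git, for example from `git diff --name-only`.

```ruby
# Hypothetical execution map (test => files it used while running).
execution_map = {
  './spec/models/user_spec.rb' => [
    'app/models/user.rb',
    'app/models/application_record.rb'
  ],
  './spec/models/project_spec.rb' => [
    'app/models/project.rb',
    'app/models/application_record.rb'
  ]
}

# Select every test that used at least one of the modified files.
def predict_tests(execution_map, modified_files)
  execution_map.select { |_test, used| (used & modified_files).any? }.keys
end

puts predict_tests(execution_map, ['app/models/user.rb'])
```

Changing a file shared by many tests, such as `app/models/application_record.rb` here, selects all of them, which is why widely used files lead to larger predictions.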
00:14:30.800 The second predictor is straightforward: if a spec file itself has been altered, then it simply runs that specific file.
00:14:43.360 The modified support specs predictor identifies changes in support files, such as shared examples or contexts, and runs the relevant tests.
00:14:56.060 Finally, the modified schema predictor analyzes schema diffs, identifying changes to database tables and running the appropriate tests.
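One way to get from a schema diff to table names is sketched below. The talk does not describe how the diff is parsed, so the approach, the sample diff, and the helper name `changed_tables` are all assumptions for illustration.

```ruby
# Remember the most recent `create_table` line and attribute every added or
# removed line that follows to that table.
def changed_tables(schema_diff)
  current = nil
  schema_diff.each_line.with_object([]) do |line, tables|
    current = Regexp.last_match(1) if line =~ /create_table "(\w+)"/
    # Skip the `---` / `+++` file header lines of a unified diff.
    changed = line.start_with?('+', '-') && !line.start_with?('+++', '---')
    tables << current if changed && current && !tables.include?(current)
  end
end

# Hypothetical diff of db/schema.rb after adding a column to the tasks table.
schema_diff = <<~DIFF
  --- a/db/schema.rb
  +++ b/db/schema.rb
  @@ -40,6 +40,7 @@
     create_table "tasks", force: :cascade do |t|
       t.string "name"
  +    t.integer "priority"
       t.datetime "created_at", null: false
DIFF

puts changed_tables(schema_diff)
```

The resulting table names can then be looked up in the table map to find the affected models, and from there the tests that used those models.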
00:15:12.700 All these functionalities work together in an intuitive manner, especially beneficial when writing numerous migrations.
00:15:26.370 The runner component uses RSpec, executing the predictions and running only those specified tests.
00:15:39.600 It is not difficult to implement additional runners for tools like Minitest or Cucumber.
00:15:50.400 For the moment, however, Crystalball ships solely with RSpec, as that's predominantly what we use in our codebase.
00:16:02.170 Now it’s time for a demo! Let’s take a look at a simple Rails app I created based on the Everyday Rails book.
00:16:12.420 I modified my spec helper to require Crystalball and all the necessary strategies to gather information.
00:16:23.700 When I run RSpec with Crystalball set to true, it starts generating YAML files containing the maps.
00:16:38.700 It executes system tests quite rapidly; in this case, we have seventy different tests.
00:16:49.890 If we run Crystalball now, without any changes, it won’t run any tests, since nothing has been altered.
00:17:01.480 Let's modify the user model by removing a validation and then run Crystalball.
00:17:15.060 It will begin processing and will especially highlight tests that should be run following the changes.
00:17:29.610 Next, after running the tests, we find that 55 tests were executed and one test failed due to the change in the user model.
00:17:40.670 Let's change a 'show' view for projects by removing the owner field and rerun Crystalball.
00:17:51.880 As expected, this action leads to 5 tests being run, and one fails due to the missing owner.
00:18:01.890 Next, let’s alter the translation file related to sign-up and execute Crystalball to observe the impact.
00:18:14.760 By predicting relevant actions, it runs 12 tests, and one fails in relation to the updated string.
00:18:27.730 Lastly, we demonstrate how to create a migration that adds a new column to the tasks table.
00:18:37.560 After running the migration and regenerating the test database, we can run Crystalball once more.
00:18:49.480 It will detect the change in the tasks table and indicate that 61 tests need to be run, all of which pass.
00:19:02.010 Crystalball has been around for a while, and it's available at the Toptal GitHub page for you to explore further.
00:19:19.090 You will find documentation, examples, and thorough explanations of everything regarding Crystalball.
00:19:31.340 If you, like many others, are suffering from slow tests, I encourage you to start using Crystalball. Thank you very much! My name is Alex, and I'm happy to answer any questions.
00:19:43.970 [Audience applause]
00:20:10.580 Are there any questions? No? Yes? No? Then, thank you!