RailsConf 2022

Your test suite is making too many database calls!

by Joël Quenneville

This talk, presented by Joël Quenneville at RailsConf 2022, focuses on the common issue of test suites making excessive database calls, which leads to slower tests and longer feedback loops in the development process. To improve the performance of a test suite, it's crucial to identify and eliminate unnecessary database queries caused by test setup, support code, and production code. Key points discussed include:

  • Impact of Slow Tests: Slow tests create longer feedback loops and can severely delay deployments because developers wait for tests to execute. This results in lower confidence in tests as they are not run as frequently.

  • Common Contributors to Slow Tests: The speaker identifies three major sources that contribute to slow tests: HTTP requests, headless browser operations, and database queries.

    • Recommendations include disabling real network calls using tools like WebMock or VCR, utilizing a testing pyramid structure, and minimizing data persistence when unnecessary.

  • Focus on Database Queries: Specifically, the talk emphasizes reducing excessive database interactions during test setup. Even though individual queries may not be expensive, they accumulate over many tests and significantly slow down the entire suite.

  • Test Setup Issues: A primary problem is shared test setup, where tests unnecessarily create more data than required through inheritance of setups, leading to extra database calls. The importance of in-line setup is highlighted to avoid creating superfluous data.

  • Challenges in Factories: Factories can often introduce hidden complexity by creating associated records automatically, which can compound queries. The speaker recommends keeping factories minimal, only including necessary attributes, and using traits for extended setups where appropriate.

  • Active Record Callbacks: Callbacks can create unintentional data during object creation, complicating tests even further. It is crucial to manage these carefully to prevent excess database writes.

  • Profiling and Debugging Strategies: The speaker suggests tools such as SQL tracker and profiling gems to identify slow factories and the number of queries made. Diagrams and object graphs can help visualize the unnecessary data being created.

  • Case Study: Joël shares a case study where he improved a slow test suite by identifying and fixing overly complex factory relationships, achieving a 15% increase in test speed.

In conclusion, improving test performance involves careful attention to how data is created in tests, optimizing setups, minimizing unnecessary queries, and leveraging profiling tools to identify hot spots. This both enhances team productivity and fosters a more efficient development process.

00:00:00.900 Hello, everyone! I think this room must be the unofficial testing track.
00:00:03.800 Audrey gave a great talk just before this session about TDD as a treasure map. For those of you who didn't catch it, I recommend checking it out online once the talks are available. Today, we're going to be discussing why your test suite is making too many database calls.
00:00:15.839 The key problem we're trying to solve is slow tests. Slow tests are a problem because longer feedback loops while writing code slow down our development process as we wait for tests to run. Longer feedback loops also slow down deployment. If you're dependent on CI, that could mean 30 minutes to 60 minutes of waiting for a feature to go out or, even worse, a bug fix. During that time, the site is down and your customers aren't happy.
00:00:38.160 Overall, you get lower value from your tests because they are not running as frequently, and you may not feel as confident in their results. Three things that are particularly expensive in tests are HTTP requests, anything involving a headless browser, and database queries. If you've spent some time in the Rails testing community, you’ve probably heard some familiar pieces of performance advice.
00:00:58.739 Firstly, disable real network calls. This can be done using tools like WebMock or VCR. The idea is that the best way to prevent slow tests from HTTP requests is to not make them at all. Secondly, you may often hear recommendations to organize your tests using the testing pyramid. In this architecture, you have a few expensive end-to-end tests that use costly headless browser requests, a medium number of integration tests that test subsystems, and a lot of cheap unit tests that validate individual objects.
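For illustration, a minimal sketch of disabling real network calls with WebMock in an RSpec suite might look like this (the file location is just an assumption):

    # spec/support/webmock.rb (illustrative location)
    require "webmock/rspec"

    # Any test that tries to hit a real host now raises immediately,
    # rather than waiting on a slow or flaky network call.
    WebMock.disable_net_connect!(allow_localhost: true)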
00:01:26.420 Finally, avoid persisting data when you don't need it. For example, if you have a test that verifies the full name method on a user will concatenate the first and last name, you don’t need to go to the database for that. Instead of creating a user in the database, you might just create one in memory and assert its properties that way. This is common advice, and if you're not already doing this, you probably want to consider it to start seeing performance benefits.
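As a sketch of that idea (User, first_name, and last_name are illustrative names):

    # Nothing touches the database here: the user exists only in memory.
    user = User.new(first_name: "Ada", last_name: "Lovelace")
    # With FactoryBot, `build(:user)` gives the same in-memory behaviour.

    expect(user.full_name).to eq("Ada Lovelace")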
00:01:57.380 However, today, I want to go deeper into the problem of database queries in your tests. Specifically, we’ll look at the issue of setting up more data than you actually need, which leads to making unnecessary inserts or updates to the database.
00:02:09.840 It's worth noting that queries aren't inherently expensive, so if you have one extra query in your test suite, it's generally fine, and you may not notice it. The problem arises when you have tens of thousands of accumulated queries over the course of a whole suite. That’s when you start to see a significant difference in performance.
00:02:44.760 The problem can look something like this. Here, I have an object graph. This differs slightly from the class diagrams you might be familiar with, which show how different classes interact. An object graph illustrates each individual instance within a particular system. If there are two instances of the same class, both will appear here. This is important for the problem we are discussing today, as we want to identify when more than one instance is being written to the database.
00:03:09.780 In this example, we have a unit test for a single object in the system under test. However, we inadvertently create a couple of collaborators and even a secondary collaborator. On the edge, there’s a random extra object with another random collaborator. If all of these result in writes to the database, we end up with six writes when we only needed one. Multiply that across an entire test suite, and you start to see the impact.
00:03:35.640 You might say, 'I don't do unit tests; testing in isolation with mocks seems brittle. I only use integration tests, so I don't need to worry about this problem.' Unfortunately, I have bad news for you. You'll face the same issue but on a larger scale. Now you're testing an entire subsystem or even the whole system using end-to-end tests, leading to the same problem of unnecessary collaborators and additional records being created in the database.
00:04:03.960 So how did this happen? Nobody sets out to create unnecessary database load during their test runs. Today, we're going to delve into three locations in your code base where we tend to accidentally create more data than intended: in your test code, support code, and within the actual code being tested. Let’s start by discussing ways in which your test code can lead to excessive database calls.
00:04:34.200 A common culprit here is shared test setup. When you have test one and test two that require similar but perhaps not identical data setups, you might think you're being efficient by extracting shared setups into a series of lets or a before block. However, this often leads to creating all the data necessary for both tests every time they run, effectively creating more data than you originally intended.
00:05:05.640 Here's an example. We have a test at the top that only requires an organization and a test at the bottom that requires an organization with two users. Due to the way our lets are structured, when we run the test that only needs the organization, it will also create two users. As a result, we end up with three queries executed instead of just one.
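A sketch of the kind of setup being described, with illustrative names:

    let(:organization) { create(:organization) }
    let!(:user_one)    { create(:user, organization: organization) }
    let!(:user_two)    { create(:user, organization: organization) }

    it "has a name" do
      # Only needs the organization, but the eager let! blocks above
      # insert both users for this example as well.
      expect(organization.name).to be_present    # 3 inserts instead of 1
    end

    it "has two members" do
      expect(organization.users).to contain_exactly(user_one, user_two)
    end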
00:05:28.320 In an object graph, this might look like the following: I've shaded in orange the data we actually need for our test, which is the organization, and then there are the extra users we inadvertently created. Two additional objects isn't catastrophic on its own, but the pattern doesn't scale: as your tests grow more complex, the amount of superfluous data grows with them.
00:05:48.480 Consider a more complex scenario with four tests where each test requires two records. Although there’s some sharing between the tests, none of them utilize the same two records, and across all of them, there are only six unique records. We end up extracting all of them into shared setup, leading to the creation of all six records for every single test. That results in 24 insert queries where only eight were truly necessary.
00:06:15.840 It gets worse from there. Shared test setups can also lead us to create data that is not only superfluous but sometimes of the wrong type for our tests. This might happen when you want to create an average case that could work for most tests, but not all tests actually require that data. For example, in one test, we may need an admin user with a contact, while in another, we only need a regular user without needing any contact.
00:06:37.020 In our shared setup, the test that only needs a regular user still creates the admin user and its contact, and then has to update that user to set the admin flag back to false. That is two create queries and one update when a single create would have been enough. Anytime you see an update in a test setup, it's a significant code smell: it typically means the setup produced data of the wrong shape for the test.
00:07:07.680 The recommended solution is to move your setup inline. While this may seem more verbose, it allows each test to only receive the data it actually needs. Now, those of you well-versed in RSpec might be thinking that using let does not create all that extra data, thus negating the need for inline setup. However, in practice, lets might become entangled, causing all the data to be initialized.
00:07:38.820 It is indeed possible to write your lets in such a way that they remain decoupled, but achieving that requires significant discipline from your team, which is often difficult to maintain. Hence, my recommendation is to do your setup inline, preventing a whole class of errors and ensuring better scalability as your team grows.
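An inline version of those same two tests might look roughly like this:

    it "has a name" do
      organization = create(:organization)               # exactly 1 insert
      expect(organization.name).to be_present
    end

    it "has two members" do
      organization = create(:organization)
      users = create_list(:user, 2, organization: organization)
      expect(organization.users).to contain_exactly(*users)
    end

Each example pays only for the records it actually uses, and the setup is visible right where the assertion reads.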
00:08:07.920 Now that we've explored how to avoid making these mistakes in your test code, let's discuss how to identify existing issues within your code base. This is challenging with tests since each small test does not generate significant amounts of extra data; it’s the aggregate behavior that matters. Thus, it may require a gradual approach to refactoring your code base, rather than trying to fix all tests at once.
00:08:37.860 Rather than trying to fix every test, focus on finding the tests that are slow and profile those. That targets your effort where changing the setup will pay off the most. Some helpful tools include the Rails logs, which record every query to the test log while your tests execute. You can run a slow test while tailing log/test.log and watch the queries appear in real time.
00:09:07.800 You may be surprised to find that a seemingly slow test performs far more queries than you expected, or conversely, that a test you suspected makes fewer queries than you presumed. The SQL Tracker gem can also be run in a block around specific test executions, providing valuable insight into how many queries are executed.
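If you would rather not add a gem, Rails' built-in instrumentation can give a rough query count around any block of setup; a sketch:

    queries = []
    callback = ->(*, payload) do
      queries << payload[:sql] unless payload[:name] == "SCHEMA"
    end

    # Subscribe only for the duration of the block.
    ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
      create(:organization)   # or whatever setup you suspect
    end

    puts "#{queries.size} queries"
    puts queries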
00:09:43.920 Now let's delve into ways your support code might lead to excessive database calls. A major offender in this area is factories that perform too much work. For instance, consider an organization factory that generates a list of three members for every organization it creates. This can pose challenges, as you may think a simple creation of an organization only generates one insert when, in reality, it triggers four due to the additional users.
00:10:10.020 The compounding effect is even worse, as expensive factories tend to create additional records unexpectedly. For example, a service object test that only attempts to initialize a user might inadvertently call an organization factory, leading to the creation of three more users without your knowledge.
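A sketch of that kind of factory, with illustrative names:

    factory :organization do
      name { "Acme" }

      # Every organization silently brings three users with it,
      # so create(:organization) is really four inserts.
      after(:create) do |organization|
        create_list(:user, 3, organization: organization)
      end
    end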
00:10:31.020 To mitigate this, your base factory should be minimal. By minimal, I mean absolutely essential—if any line in your factory can be removed without causing issues with Rails validations or database constraints, that line should go. The only attributes present should be those necessary to successfully create the factory without errors.
00:10:57.780 For example, in a minimal organization factory, you would only ensure the presence of the name attribute required for validation. It's also beneficial to extend your base factory using sub-factories or traits, which allows you to include pre-packaged sets of attributes without cluttering your base structure or creating unwanted records.
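Restated as a minimal base factory with an opt-in trait, that might look like this:

    factory :organization do
      # Only what validations and database constraints actually require.
      name { "Acme" }

      trait :with_members do
        after(:create) do |organization|
          create_list(:user, 3, organization: organization)
        end
      end
    end

    # create(:organization)                 => 1 insert
    # create(:organization, :with_members)  => 4 inserts, explicitly requested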
00:11:23.880 So how might you find existing misuses within your factories? Factory definitions are a single point of change that can have a significant impact, but identifying which factories need modification can be tricky due to the sheer number of them. One useful technique is to leverage FactoryBot’s ActiveSupport notification event subscriptions.
00:11:56.220 By logging the invocations and durations of factories, you can discover which factories are consistently slow. Additionally, the test-prof gem comes with a factory profiling tool that can identify the slowest factories both individually and in aggregate, helping you pinpoint hotspots to scrutinize.
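A sketch of such a subscription, assuming a recent FactoryBot that publishes the factory_bot.run_factory event (the aggregation and output here are my own illustrative additions):

    # spec/support/factory_profiling.rb (illustrative location)
    factory_stats = Hash.new { |hash, key| hash[key] = { runs: 0, seconds: 0.0 } }

    ActiveSupport::Notifications.subscribe("factory_bot.run_factory") do |_name, started, finished, _id, payload|
      stats = factory_stats[payload[:name]]
      stats[:runs]    += 1
      stats[:seconds] += (finished - started)
    end

    # Print the factories with the largest aggregate cost when the suite exits.
    at_exit do
      factory_stats.sort_by { |_, stats| -stats[:seconds] }.first(10).each do |name, stats|
        puts format("%-30s %5d runs  %8.2fs total", name, stats[:runs], stats[:seconds])
      end
    end

The test-prof gem's factory profiler (typically enabled by setting the FPROF environment variable) produces a more polished version of the same report.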
00:12:17.640 If you come across a slow factory but can’t figure out why, a great technique is to load the Rails console in test mode. Set the log level to debug, then create the factory in the console to observe all the log messages generated, which will reveal the queries executed when the factory is created.
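Concretely, that might look something like this:

    # RAILS_ENV=test bin/rails console
    ActiveRecord::Base.logger = ActiveSupport::Logger.new($stdout)
    ActiveRecord::Base.logger.level = Logger::DEBUG

    FactoryBot.create(:organization)   # every INSERT scrolls past as it runs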
00:12:42.180 Often, the number of queries generated can be quite surprising. Now let's transition to discussing how your source code may also be leading to excessive database calls. Active Record callbacks can complicate things, particularly if they create additional records without your intention.
00:13:07.260 For example, imagine an organization is set up to generate a new admin user on a before_create callback. If you supply your own admin user when creating the organization, you may wind up with two admins—the one created by the callback and the one you intended to create—causing potential data integrity issues.
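A sketch of that kind of callback, with illustrative model and attribute names:

    class Organization < ApplicationRecord
      has_many :users

      before_create :build_default_admin

      private

      # Every new organization gets an admin, whether the caller wants one or not.
      def build_default_admin
        users.build(admin: true)
      end
    end

    # A test that supplies its own admin now ends up with two of them:
    organization = create(:organization, users: [build(:user, admin: true)])
    organization.users.where(admin: true).count   # => 2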
00:13:35.520 A callback that persists data you never asked for can let tests pass while hiding subtle failures that only surface later. It's common to work around it by creating a local admin and then updating it so that your test setup overrides the callback's output, but doing so is a classic code smell. It suggests an underlying problem within your implementation.
00:14:06.360 This situation can lead to unnecessary inserts and updates, resulting in a test setup that makes three inserts and one update when you only really needed to create two inserts. Finding existing misuses creating excessive data in callbacks can be particularly challenging due to their tight coupling with the surrounding code base.
00:14:38.760 However, when pain arises, it can signify hotspots in your code that need investigation. Many developers are aware of the pitfalls associated with certain callbacks, and addressing these issues will often lead to a significant impact across your code base.
00:15:01.080 In TDD, the underlying principle is that your tests should guide you towards necessary changes. If you encounter unit tests that require an excessive amount of setup, that may be a sign your objects are too coupled with others around them. Refactoring the architecture may help yield tests that require far less data to execute successfully.
00:15:27.060 Our final concern is to recognize places where we inadvertently create more data than we aim for. Let's walk through a case study where we analyzed a codebase to enhance test performance by eliminating extraneous creates.
00:15:58.560 On one project, I was faced with a very slow test suite, and I had a hunch that some factories were generating more data than we expected. The project had numerous factories, so it wasn't clear where the problematic ones were. I started by using ActiveSupport notifications and the FactoryBot run-factory event I mentioned earlier to build simple profiling that measured both how long each factory took and how often it was invoked.
00:16:29.640 I subsequently visualized this data in a graph, with the vertical axis indicating factory invocation frequency and the horizontal axis representing the duration of individual executions. The size and color of the bubbles denoted the aggregate impact each factory had on the test suite.
00:16:52.920 One factory stood out toward the lower right corner: it was called infrequently, but each invocation took roughly four seconds just for its setup, which added up to a significant impact. I tried it in the console and confirmed that creating this factory really did take about four seconds, an excessive duration for what should be a straightforward operation.
00:17:10.320 I executed the factory in debug mode within the console, which let me watch the log output generated as objects were created. The logs revealed an overwhelming amount of data being created, making it clear that something unusual was going on.
00:17:30.120 To understand the relationships within the schema, I referred back to the concept of the object graph. My simplified version of what was happening illustrated that I was crafting an organization with various associated entities, indicating a recurring and convoluted pattern.
00:17:51.720 This was concerning, so I turned to my active record models and created a class diagram to clarify the intended data model. The diagram illustrated an organization managing a venue, staff, and events, which are run at that venue—all of which share the same organization.
00:18:11.940 There is an implicit assumption that an event belongs to a single organization, whether you reach it through the venue or through the staff. If that assumption is ever broken, it could lead to unanticipated results. I then investigated the factory responsible for creating these relationships.
00:18:34.680 The factory created an organization along with its associated pieces: a venue, staff, and several events. The events were linked back to the organization through the venue, but that connection did not extend to the staff. As a result, the event factory generated a brand-new staff member, and these misaligned associations nested deeper and deeper, creating record after record.
00:19:00.540 The solution was simple—ensuring both the venue and the staff were associated through the same organization. After implementing this change, the object graph reflected only the relevant data our tests required, effectively eliminating any unnecessary creations.
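An illustrative reconstruction of the kind of fix described (not the project's actual code):

    factory :event do
      venue

      # Before the fix this was just `staff`, which built its own organization
      # and kicked off the whole graph again. Tying the staff member to the
      # venue's organization keeps every associated record under one organization.
      staff { association :staff, organization: venue.organization }
    end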
00:19:28.320 Once the implementation was complete, we executed the test suite to measure if our changes had a measurable impact. It is essential to establish baseline metrics before committing any performance enhancements. After running the tests, we discovered a remarkable 15% increase in speed—evidence that minimizing extraneous data creates meaningful performance gains.
00:19:58.380 We learned several debugging techniques during this process. We utilized log outputs, profiling tools, and object diagrams to gain insight into the underlying issues. The root of the problem resided within our factories, and a single-line modification yielded significant improvements, demonstrating how factories can unexpectedly lead to excess data creation.
00:20:21.480 In conclusion, we discussed various scenarios where we may unintentionally generate more data than anticipated. Furthermore, we reviewed methods to keep that data creation minimal, such as keeping your test setups local, maintaining minimal factory bases, avoiding unnecessary data creation within Active Record callbacks, and minimizing tight coupling within your code.
00:20:51.720 This advice holds true for new code or when working with a Greenfield application. However, many scenarios require us to identify existing inefficiencies. Tools like diagramming, particularly object diagrams, and studying the logs can help visualize the unintentional data being created across different code segments.
00:21:18.660 Lastly, profiling tools can help pinpoint the hotspots deserving of your focus, further enhancing your debugging. Thank you for attending this talk. My name is Joël Quenneville, and you can find me on Twitter at [insert handle]. I work at Thoughtbot, and we are hiring, so feel free to come find me with any questions about testing in general, or if you want to debate the more controversial points.
00:21:45.960 I am happy to engage in discussions about creating excessive data within factories and would love to hear your thoughts on the topic. Thank you once again for your attention!