00:00:10.160
Hello, my name is Nick Gauthier, and I'm here to talk about your test suite. To kick things off, I want to ask: how many of you wish your test suite ran faster? There we go, that's almost everybody. Some people didn't raise their hands, so let me rephrase: how many of you are very happy with how fast your suite runs and don't want it to go any faster? That is, it's already fast enough.
00:00:31.760
Okay, so some of you might be enjoying a divide-by-zero, infinitely fast test suite. Continuous integration is great: work, test, then commit tested code. That works really well. The problem occurs when we work for 30 minutes and then our tests take another 30 minutes; it's just not an efficient process. A lot of people think, "Hey, let's build a CI server to run our tests for us." CI servers are useful for replicating production environments, but they often end up out of sync with your actual testing process, which can leave your feedback loop 30 to 40 minutes behind. Some people have told me their feedback loop is a day behind, which really cuts into productivity.
00:01:13.439
So instead, let's work on making our tests faster. The project I'm discussing today is a real production project that has been running for a couple of months for a large company; it's real code doing real work, and you won't see any synthetic benchmarks here. It's running on Rails, and I know a couple of people may think, "Oh no, not another Rails guy." For the purposes of this talk, Rails is just a large Ruby application. I won't be showing anything specific to Rails or even Ruby; a lot of the concepts I discuss apply all the way down to the kernel and the file system.
00:01:19.439
We are using FactoryGirl, which is fantastic for building objects. We're also using Shoulda, which is a layer on top of Test::Unit, but the concepts I'll discuss apply equally well to Test::Unit, RSpec, Cucumber, and even JSLint. For handling images on our site, we're using Paperclip, which is a really great tool. Additionally, we do something called empty-database testing: when we load our test environment, there are no fixtures; the database is empty, and every test is responsible for creating its own scenario.
00:01:32.640
To start, I'll be presenting a lot of benchmarks, so I want to share the setup: I ran them on a 2.4 GHz quad-core machine with four gigabytes of RAM and a standard platter hard disk. I did all the benchmarks in one day, on one Git commit, consolidating all the tips and ideas I had, so these stats are solid; I didn't collect them over a couple of months while the code was changing. The project wasn't huge; we developed it in about two months, totaling four thousand lines of tests and three thousand lines of code. If you want to benchmark your own suite and your project is roughly twice that size, you can roughly double my numbers to see where you stand versus where I'm at.
00:02:39.920
Let’s talk about the vanilla test suite. Before I made any enhancements, it was taking 13 minutes and 15 seconds to run, and at this point, I want you to participate a bit more. We’re going to play The Price is Right! As I go through this presentation, we’ll be knocking that test suite time down, and I want you to think about how fast you think I can get this test suite to run. Write it down if you really want to set it in stone. Maybe tweet it, and then later on, you can see how good your prediction was. Each time we get a new number, we’ll see who’s still in the game.
00:04:11.480
So, regarding the total number of tests, I'm not sure exactly how many we had, but I think it was close to a thousand assertions. At this point, everyone should have guessed something lower than 13 minutes and 15 seconds. Did anyone get knocked out already? Alright, I figured I might as well go for the low-hanging fruit first.
00:04:29.199
Let me introduce you to fast_context. Normally, a context runs its setup block before each should block; in this case, we have two should blocks executing within one context, so the setup runs before both assertions. This is similar to before(:all) in RSpec or Before hooks in Cucumber: you set something up, run some tests against it, then set it all up again for the next tests. With fast_context, it's as simple as dropping "fast_" on the front so the block reads fast_context, and that's it. The setup no longer executes twice, which saves valuable time.
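To make the mechanic concrete, here is a pure-Ruby sketch of the difference (no shoulda or fast_context gems involved; all names here are invented for illustration): a plain context runs its setup once per should block, while a fast_context runs it once for the whole group.

```ruby
# Simulate the cost difference between context and fast_context.
# setup_runs counts how many times the (expensive) setup executes.
setup_runs = 0
setup = -> { setup_runs += 1; { user: "alice", admin: true } }

# Two side-effect-free "should" blocks: they only read the environment.
shoulds = [
  ->(env) { env[:user] == "alice" },  # "should know the user"
  ->(env) { env[:admin] },            # "should grant admin rights"
]

# Plain context: setup runs before every should block.
setup_runs = 0
shoulds.each { |s| raise "assertion failed" unless s.call(setup.call) }
plain_runs = setup_runs

# fast_context: setup runs once; the should blocks share its result.
setup_runs = 0
env = setup.call
shoulds.each { |s| raise "assertion failed" unless s.call(env) }
fast_runs = setup_runs

puts "plain context: #{plain_runs} setup runs"
puts "fast_context:  #{fast_runs} setup run"
```

With ten should blocks under one heavy setup, the saving is proportionally larger: ten setup executions collapse into one.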
00:05:19.760
It's important to note that your should blocks must be side-effect free, which is good practice in general: do the work in the setup and check it in your should blocks with assertions. If that holds, you can even alias context to fast_context and still be okay. This is the only major warning I've encountered so far, because fast_context does change how your tests execute, but if your should blocks are side-effect free you shouldn't see any change in coverage. As a quick overview, by implementing fast_context, I brought the test time down to 5 minutes and 32 seconds.
00:06:07.680
The reason this helped so much is that we had a lot of heavy functional tests, structured as one GET request followed by several assertions. By running each heavy GET once per context instead of once per assertion, we save significant time. So at this point, I'm curious: who's still in the game? I hope I haven't made it too easy! Let's keep going.
00:07:19.760
Next, let's talk about Paperclip. Paperclip relies on ImageMagick, which is an incredible tool but, like many powerful tools, can be slow. When Paperclip needs a file's geometry to determine its dimensions, it shells out to the identify command, so you're not only running ImageMagick, you're also paying the cost of shelling out to it. When you create a thumbnail, it shells out to the convert command too. Our tests have models that require images, so we were creating all these images over and over.
00:07:56.479
I discovered this issue in my suite thanks to a previous talk by Aman Gupta about debugging testing tools. I implemented a simple monkey patch I called QuickerClip: when Paperclip asks for a file's dimensions, it returns a mock geometry of 100 by 100, and when it creates a thumbnail, it copies my fixture file to where the thumbnail is expected, so a real file is present on the file system without ever running ImageMagick. After this change, our test time dropped to 3 minutes and 4 seconds.
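As a rough sketch of the QuickerClip idea (the real patch targets Paperclip's internals; the module and method names here are invented for illustration): return a canned geometry instead of shelling out to identify, and copy a fixture file into place instead of shelling out to convert.

```ruby
require "fileutils"
require "tmpdir"

# Stand-in for the real Paperclip calls, which shell out to ImageMagick's
# `identify` and `convert` commands on every image.
module QuickerClip
  DIMENSIONS = [100, 100] # canned geometry: never runs `identify`

  def self.geometry(_path)
    DIMENSIONS
  end

  # "Generate" a thumbnail by copying a known fixture into place, so a
  # real file exists on disk without ever running `convert`.
  def self.thumbnail(fixture_path, dest_path)
    FileUtils.cp(fixture_path, dest_path)
    dest_path
  end
end

reported = nil
copied = nil
Dir.mktmpdir do |dir|
  fixture = File.join(dir, "fixture.jpg")
  File.write(fixture, "fake image bytes")

  thumb = QuickerClip.thumbnail(fixture, File.join(dir, "thumb.jpg"))
  reported = QuickerClip.geometry(thumb)
  copied = File.read(thumb) == File.read(fixture)
end

puts "geometry: #{reported.join('x')}, thumbnail copied: #{copied}"
```

Any code path that only needs "an image of some size to exist" is satisfied, while the two ImageMagick shell-outs per image disappear from the test run.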
00:09:01.760
We can definitely do better! My benchmarks revealed that the quad-core machine wasn't fully utilized: one core was handling all the testing while the other three sat idle. So let's discuss multi-core testing. Existing solutions include parallel_specs, Testjour, and DeepTest. However, I found limitations with some of these tools. For example, if test files aren't grouped well, you end up with an unequal load across your cores; a better approach balances the workload across all of them.
00:13:36.399
I also looked at DeepTest, developed at ThoughtWorks. It's powerful, but the setup is difficult, requiring multiple databases and remote daemons; my goal was something usable for daily work without that overhead. So I wrote Hydra, which works with Test::Unit, Cucumber, RSpec, and JSLint and simplifies the process immensely. Hydra load-balances by handing work to each worker as it becomes free, running the slowest tests first.
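The balancing idea can be sketched in a few lines (the file names and timings below are invented): sort test files slowest-first, then always hand the next file to the least-loaded worker, so no core sits idle while another grinds through a long tail.

```ruby
# Greedy "slowest tests first" balancing across workers.
durations = {
  "post_test.rb" => 90, "user_test.rb" => 60,
  "feed_test.rb" => 30, "auth_test.rb" => 20, "tag_test.rb" => 10,
}
workers = Array.new(2) { { files: [], total: 0 } }

durations.sort_by { |_file, secs| -secs }.each do |file, secs|
  target = workers.min_by { |w| w[:total] } # least-loaded worker
  target[:files] << file
  target[:total] += secs
end

# Wall-clock time is the busiest worker's total, not the sum of all files.
puts workers.map { |w| w[:total] }.inspect
```

Compare that to naively splitting the file list in half in alphabetical order, which can easily put most of the slow files on one core.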
00:14:51.679
Configuring Hydra for my machine took only three lines of YAML, and with it everything runs efficiently across all cores. Repeatedly loading the environment also had a considerable impact: loading it over and over across the various frameworks caused unnecessary delays, and Hydra sharply reduces those loads. The test time dropped to an impressive 1 minute and 26 seconds on the quad-core machine.
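For reference, a local Hydra configuration really is only a few lines of YAML. This sketch assumes the quad-core setup from the talk; check the Hydra README for the exact keys your version expects.

```yaml
# hydra.yml -- run tests locally, one runner process per core
workers:
  - type: local
    runners: 4
```

Each runner loads the environment once and then processes test files as the master hands them out, rather than booting a fresh environment per file.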
00:16:33.520
After implementing all these strategies, my overall test time went from 13 minutes and 15 seconds down to 18 seconds: a 44.17x speed-up without any modifications to my application's core code. It's also crucial to note that I didn't alter my test coverage, assuming side-effect-free should blocks are maintained, and I'm still performing real database actions.
00:18:17.680
We all want to be more productive, and these methods deliver an incredible improvement, but I also want to emphasize that anyone can achieve these results: many of these enhancements are readily available off the shelf. Keeping a tight feedback loop benefits the quality of our software projects immensely, which makes the effort to optimize the testing process well worth the time.
00:20:06.639
If there are specific questions about the practices and adjustments discussed, feel free to ask! To those of you on single-core machines: I still recommend trying these approaches. Eliminating repeated environment loads with Hydra will get you improvements even without extra cores. For those looking to optimize further, work up through the ideas presented here and try the various tools to find the best fit for your process. We're all in this together; let's collaborate and make our test suites as efficient as possible.