
Testing the Untestable

Good tests are isolated, they're repeatable, they're deterministic. Good tests don't touch the network and are flexible when it comes to change. Bad tests are all of the above and more. Bad tests are no tests at all, which is where I found myself with a five-year-old legacy codebase running in production and touching millions of customers with minimal use-case documentation. We'll cover this experience and several like it while digging into how to go from zero to total test coverage as painlessly as possible. You will learn how to stay sane in the face of insane testing conditions, and how to use these tests to deconstruct a monolith app. When life gives you a big ball of mud, write a big ball of tests.

Ancient City Ruby 2014

00:00:00.120 Welcome everyone, my name is Richard Schneeman, or Schneems on the Internet. A fun fact about me: I am literally married to Ruby.
00:00:07.140 So I'm not allowed to see other programming languages.
00:00:13.139 It's interesting because my wife recently got a job as a Python programmer. I come from the great state, or rather the great nation, of Texas, specifically Austin.
00:00:24.539 Just a little bit about my background: I'm actually not a classically trained programmer. I graduated from Georgia Tech with a degree in mechanical engineering.
00:00:36.059 While studying mechanical engineering, I focused a lot on introspection, and one of my favorite subjects was thermodynamics.
00:00:47.160 Thermodynamics is the study of heat transfer and how heat moves and flows. I found that when I did a lot of work, at the end of the process, I would either have a clear correct or incorrect answer.
00:01:06.840 For instance, have you ever taken a test where you wrote negative a thousand, but it turns out the right answer is positive a thousand? You realize you're really, really far from being right.
00:01:20.700 In school, there’s always someone to tell you what is right or wrong. This brought me to question: what happens in the real world? With that in mind, I began one of my first co-op jobs designing and building refrigerators for General Electric.
00:01:37.259 It's worth noting that this is a multi-billion-dollar corporation, and to design these refrigerators, they used the most sophisticated software that money can buy: spreadsheets.
00:01:50.220 Everything was just spreadsheets upon spreadsheets. I thought, 'Okay, we're building refrigerators, and we use spreadsheets. What if the spreadsheet is actually wrong?'
00:02:01.920 This sparked a discussion with some colleagues, and that’s when it dawned on me: testing! It’s genius! We build this spreadsheet, predict how the refrigerator will behave, and then we can actually test it.
00:02:24.000 You put the refrigerator in a room with controlled temperature, use thermocouples, and it will tell you if the refrigerator behaves as you predicted.
00:02:35.459 I believe that compared to building and testing refrigerators, programmers are incredibly lucky. We have generally known inputs and outputs, and the best part is that our product is a program.
00:02:48.120 To test it, we write programs. That's like testing refrigerators with mini refrigerators. Not everyone, however, has such good fortune.
00:03:07.080 Back in the 1960s, we made a plan to go to the moon, which was a very ambitious undertaking.
00:03:12.400 If you don't know, outer space is a somewhat inhospitable environment. It's a lifeless vacuum, and we were sending people there. You definitely don't want to make an off-by-one mistake in that scenario.
00:03:26.580 In order to trust the calculations involved, NASA had to think carefully. They broke up the rocket into components to test each one and quickly identify any failures.
00:03:43.860 They used test fixtures, which are not the kind you find in your Rails app, but they let you isolate components and test them one at a time.
00:04:02.040 You could test the engine in isolation, but eventually, you will have to do an integration test when you put everything together. There was a famous interview with Alan Shepard where he noted that he was sitting on top of a rocket made by the lowest bidder.
00:04:15.239 At the end of the day, to determine whether a rocket works, you have to launch it. So they did launch rockets, which led to integration tests—if you attended a previous talk, you might refer to this as a safe test.
00:04:39.660 This brings me back to software. To give you more about my background, I work at a company called Heroku.
00:04:51.540 At Heroku, I am part of a Ruby task force, where we often have Ruby task force meetings.
00:05:05.820 Let's have some audience participation now. I want you to guess the year associated with these images I'm about to show you.
00:05:16.919 This here is from 2007. Originally, Heroku was designed with a single purpose: making it incredibly easy to deploy Rails apps. Over time, we've expanded to a variety of languages and frameworks, even allowing arbitrary Ruby code to run.
00:06:07.020 To accommodate different languages, we introduced the concept of a buildpack. If you're not familiar, a buildpack is a piece of code that determines the type of project being deployed, whether it's Ruby, Node, Python, Scala, or others.
00:06:34.620 The Ruby buildpack is my area of focus. It primarily runs bundle install, but it's so much more than that! Currently, it encompasses about 4,573 lines of code, accounting for various edge cases regarding different Ruby and Rails versions.
00:06:55.020 As of January 2013, it had exactly zero tests—not even plus or minus. This was an open-source project; very little history or knowledge came with it.
00:07:16.740 Before you freak out: it was tested, but only manually, with platform tests checking if Heroku was working as a whole.
00:07:36.180 Despite my lack of historical knowledge and without knowing if pushing my changes would hit any edge cases, I took several steps back and began to formulate a game plan.
00:07:54.600 Has anyone heard of an MVP? Does anyone know what this stands for?
00:08:04.320 Okay, I don’t think I heard anyone say "minimum viable patch". To explain, a minimum viable patch is a change meant to solve a problem as quickly and easily as possible without affecting the rest of the system.
00:08:53.880 The downside of this approach is that too many minimum viable patches lead to difficult-to-maintain code. Does anyone have guesses for a potential remedy for an MVP?
00:09:40.440 Refactoring is definitely one answer, but there’s also testing.
00:09:53.880 Has anyone ever read the book "Working Effectively with Legacy Code"? The author, Michael Feathers, discusses black-box testing, where you take inputs, feed them into your program, and analyze the output to determine if it was successful.
00:10:12.180 In this scenario, we ignore how everything works, focusing only on inputs and outputs.
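A black-box test of this kind can be sketched in a few lines of Ruby. This is only an illustration, not code from the talk: the helper name `run_black_box` is hypothetical, and the program under test is a one-line Ruby script standing in for a real legacy system.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a command as a black box and report only what an
# outside observer can see, its combined output and whether it succeeded.
def run_black_box(*command, stdin: "")
  output, status = Open3.capture2e(*command, stdin_data: stdin)
  { output: output, success: status.success? }
end

# Stand-in for the legacy program: we only know it should upcase its input.
result = run_black_box(RbConfig.ruby, "-e", "puts STDIN.read.upcase", stdin: "hello")

raise "unexpected output: #{result[:output]}" unless result[:output].strip == "HELLO"
raise "program exited with a failure status" unless result[:success]
puts "black-box check passed"
```

Nothing in the check depends on how the program produces its output, so the internals can be rewritten freely as long as the observed behavior stays the same.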
00:10:26.460 Traditionally, it was advised to never use network dependencies for testing, but in our situation, we don't have that option. We need real apps, real deployments, and to effectively exercise our system.
00:10:56.580 To address this, I developed a framework called Hatchet. It allows you to build and deploy Heroku apps. It clones a repository, creates a new Heroku app, and deploys it, producing a deploy log.
00:11:36.840 Has anyone used "heroku run"? Or "heroku run bash"? It's fantastic; it spins up a new dyno and places you in a shell where you can run commands securely.
00:12:03.540 When querying versions of Ruby, who here is using 2.1? Raise your hand. Hopefully, it’s not too many hands; I hope everyone is using 2.1.
00:12:30.360 I have something to share with PHP users as well, so anyone using PHP, feel free to reach out to me after the talk.
00:12:49.680 Returning to Hatchet, I created software that programmatically drives Rails console scenarios. It runs commands and provides predictable outputs, allowing validation of the deployed code.
00:13:18.720 The library manages session control, though I encountered some issues with process deadlock during development, leading to a lot of debugging and log checking.
00:13:56.460 Using this approach allows us to interactively test applications we just deployed, which is incredibly useful.
00:14:14.160 For a practical case scenario, in my previous example, we used a Rails 3 app and maintained a GitHub repository called Sharpstone that contains various edge cases.
00:14:45.840 Currently, we have about 47 different repositories cataloging edge cases related to Ruby, Rails, and other configurations.
00:15:11.760 On a side note, who here has ever written 'Thread.new'? It's quite fun! I once created a library over a weekend after reading Jesse Storimer's book.
00:15:35.100 The library is called 'Threaded' and primarily features a promise interface, enabling you to schedule tasks asynchronously.
00:16:06.300 Originally, Hatchet tested by cloning repositories one by one, which took a considerable amount of time. By integrating a threaded approach, I reduced this time to about 2 seconds.
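The speedup from running slow, independent steps in threads can be sketched with plain `Thread.new`. This is not the Threaded gem's API; `parallel_map` and `fake_clone` are hypothetical names, and the `sleep` stands in for the network wait of a real clone.

```ruby
# Hypothetical helper: run the block for every item in its own thread and
# collect the results in the original order. Thread#value joins the thread
# and returns the block's result (or re-raises its exception).
def parallel_map(items, &block)
  threads = items.map { |item| Thread.new { block.call(item) } }
  threads.map(&:value)
end

# Stand-in for a slow, I/O-bound step such as cloning a repository.
def fake_clone(name)
  sleep 0.2
  "cloned #{name}"
end

repos = %w[rails3_app sinatra_app rack_app]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = parallel_map(repos) { |repo| fake_clone(repo) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results
puts format("finished in %.2f seconds", elapsed)
```

Because the work is mostly waiting on I/O, the threads overlap and the total time is close to that of the slowest single step rather than the sum of all of them.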
00:16:44.520 Once the app is cloned, it's moved to a temporary directory, a new app is created through the API, and then we deploy it and run assertions against the deployment output.
00:17:07.260 Following the principles discussed in Michael Feathers' book, we were completing black box testing, but we had yet to address scenarios where components might fail.
00:17:39.060 What happens when services like S3 or RubyGems go down, or if the Heroku API glitches?
00:18:02.760 In those moments, your tests fail, leading to frustration because those failures are not indicative of your code being wrong.
00:18:29.580 This is unacceptable in software development. To combat this, I created a gem called ReRetry, which simply retries deployments if a failure occurs.
00:19:00.420 A deployment is retried when it fails, unless it's marked as a known failure, so the tests stay focused on the positive outcome.
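The retry idea can be sketched as a small wrapper. This is an assumption-laden illustration rather than the gem's actual interface: `with_retries`, `max_attempts`, and `error_class` are hypothetical names, and the flaky "deploy" is simulated.

```ruby
# Hypothetical helper: run the block, and if it raises error_class, try again
# until max_attempts is reached. The last failure is re-raised unchanged.
def with_retries(max_attempts: 3, error_class: StandardError)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue error_class
    retry if attempts < max_attempts
    raise
  end
end

# Simulated deploy that fails twice because of a network hiccup, then works.
outcome = with_retries(max_attempts: 3, error_class: IOError) do |attempt|
  raise IOError, "connection reset" if attempt < 3
  "deployed on attempt #{attempt}"
end

puts outcome
```

Restricting the rescue to one error class keeps genuine bugs from being retried and hidden; only the failures you expect from the network get a second chance.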
00:19:33.780 On a side note, everyone should upgrade to Bundler 1.5, which features parallel installation and automatic retries during gem installations.
00:19:54.060 We are exploring other potential network hiccups, especially during operations like 'Heroku run bash' where we may need to access the network.
00:20:18.420 For those scenarios, we use a library in our specs that automatically reruns a test when it fails; that's one way of dealing with non-deterministic conditions.
00:20:57.840 This doesn't remove the non-determinism. It uses probability to approximate determinism: a test that only fails because of an occasional hiccup is very unlikely to fail several times in a row.
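The arithmetic behind that claim is short. Assuming each attempt fails independently with the same probability (a simplification, since real outages tend to be correlated), the chance that every attempt fails is the failure rate raised to the number of attempts. The 5% figure below is an illustrative assumption, not a number from the talk.

```ruby
# Probability that all attempts fail, assuming independent attempts that each
# fail with probability failure_rate.
def chance_all_attempts_fail(failure_rate, attempts)
  failure_rate**attempts
end

failure_rate = 0.05 # assumed: 1 run in 20 hits a network hiccup

(1..3).each do |attempts|
  chance = chance_all_attempts_fail(failure_rate, attempts)
  puts format("%d attempt(s): %.4f%% chance of a spurious failure", attempts, chance * 100)
end
```

Under these assumptions, three attempts bring a 5% flake rate down to roughly 0.0125%, which is why a handful of retries is usually enough.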
00:21:17.460 All this testing enhances our iteration cycle and speeds up our workflow, even if the tests take some time to complete.
00:21:57.540 Initially, tests took about five minutes each, but we’ve managed to reduce the time to about 12 minutes for 44 test cases thanks to a parallel test runner.
00:22:32.100 Beyond the benefit of faster buildpacks, the more efficient test suite made major architecture changes possible.
00:23:00.840 These big changes resulted in significant speed improvements, enhancing our capability to refactor aggressively.
00:23:25.440 Tests often result in more tests. As we wrote these integration-style tests for previously untestable features, we could start breaking down the larger test suite.
00:23:50.520 Instead of running long tests, we modularized smaller components, enabling us to run tests much faster. This reduced the time needed significantly.
00:24:13.680 For instance, running a Ruby app takes about 30 seconds to deploy, but a unit test validating rake integration runs in roughly 1.63 seconds.
00:24:34.560 Faster tests translate to quicker integration, allowing teams to implement changes and deploy with greater efficiency.
00:25:04.560 I also work on an app called CodeTriage, which sends contributors one GitHub issue per day from the repository of their choice.
00:25:30.990 Even though CodeTriage interacts with GitHub’s API, we still test it effectively by interacting from a user perspective.
00:25:49.620 While there’s a lot of work involved, we can effectively test interactions without communicating with the external network.
00:26:39.780 For this, we've used WebMock and another gem called VCR, which allows you to record network interactions and replay them during tests.
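The record-and-replay idea can be illustrated without any gems. The sketch below is not VCR's API: `ReplayCassette` is a hypothetical class, and the "network call" is a lambda that counts how often it runs. The first request is recorded to a JSON file and later requests are answered from that file.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for a cassette: responses are stored in a JSON file
# keyed by a request description, and replayed when the key is seen again.
class ReplayCassette
  def initialize(path)
    @path = path
    @recordings = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Return the recorded response for key, or run the block (the real
  # request), record its result to disk, and return it.
  def fetch(key)
    return @recordings[key] if @recordings.key?(key)

    @recordings[key] = yield
    File.write(@path, JSON.generate(@recordings))
    @recordings[key]
  end
end

network_calls = 0
fake_request = lambda do
  network_calls += 1
  { "title" => "Example issue" }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "issues.json")
  first  = ReplayCassette.new(path).fetch("GET /issues") { fake_request.call }
  second = ReplayCassette.new(path).fetch("GET /issues") { fake_request.call }
  puts "same response: #{first == second}, network calls: #{network_calls}"
end
```

The second lookup never reaches the lambda, which is the property that keeps such tests fast and deterministic once a recording exists.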
00:27:23.520 Through this mechanism, we exercise our applications like actual users but never hit GitHub directly unless necessary.
00:27:57.540 While changing APIs is outside our control, testing must accommodate potential fluctuations as they arise.
00:28:21.240 I’ve found immense value in the transition to Puma, a web server that allows for multi-threading and multiple processes.
00:28:56.520 To tackle the question of how many workers we need, I created a gem called 'Puma Auto-Tune' to optimize the number of workers based on available RAM.
00:29:35.060 This gem needs thorough testing, and that can't be reduced to checking a single formula: its behavior depends on how RAM utilization changes at runtime, which adds significant complexity.
00:30:13.900 To address this challenge, I've developed a testing class that runs Puma, logging every output and verifying that our expected conditions align with actual outcomes.
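A test harness along those lines can be sketched with the standard library alone. This is an illustration of the technique, not the class from the talk: `LoggedProcess` is a hypothetical name, and a tiny Ruby script stands in for a real Puma server. The harness starts the process, sends its output to a log file, and waits until an expected line shows up.

```ruby
require "rbconfig"
require "tempfile"
require "timeout"

# Hypothetical harness: run a command in the background, capture everything
# it prints into a log file, and let the test wait for an expected message.
class LoggedProcess
  def initialize(*command)
    @command = command
    @log = Tempfile.new("process-log")
  end

  def start
    # Redirect stdout to the log file and send stderr to the same place.
    @pid = Process.spawn(*@command, out: @log.path, err: [:child, :out])
    self
  end

  # Poll the log until pattern appears; raises Timeout::Error otherwise.
  def wait_for(pattern, timeout: 10)
    Timeout.timeout(timeout) do
      sleep 0.05 until File.read(@log.path).match?(pattern)
    end
    true
  end

  def stop
    Process.kill("TERM", @pid)
    Process.wait(@pid)
  rescue Errno::ESRCH, Errno::ECHILD
    nil
  ensure
    @log.close!
  end
end

# Stand-in for a server: prints a boot message, then keeps running.
script = "$stdout.sync = true; puts 'worker booted'; sleep 30"
server = LoggedProcess.new(RbConfig.ruby, "-e", script).start
begin
  server.wait_for(/worker booted/)
  puts "saw the expected log line"
ensure
  server.stop
end
```

Asserting on log output keeps the test at the black-box level: it checks what the running process reports, not how it decided to report it.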
00:30:52.100 Each input and output scenario can be tested, making it possible to refine our approach adequately. If you’ve ever used Unicorn Worker Killer, you might see similarities with what I've developed.
00:31:17.820 At the end of the day, nothing is untestable, which kind of makes my talk's title total clickbait.
00:31:43.260 If you don’t know where to start with writing tests on a monolithic system, begin with integration tests.
00:32:06.300 Integration tests will guide you in identifying potential pain points within your code. Once, while working at Gowalla, our user sign-up process broke for three days.
00:32:43.620 This is crucial because we realized we must test what would lead to real-world pain if not functioning correctly.
00:33:03.480 Ultimately, be proactive in covering edge cases, avoid minimum viable patches, and strive for maintainability and flexibility in your code.
00:33:10.380 My name is Schneems. I have developed libraries like Sextant and Wicked, and I co-authored a book called 'Heroku: Up and Running.'
00:33:16.919 I have a copy here, and the best question will earn it. Does anyone have any questions?
00:33:28.860 Well, let’s clap first, then we can do the questions. Thank you!