
Testing The Untestable

Richard Schneeman • February 20, 2014 • Earth • Talk

In the talk Testing The Untestable, Richard Schneeman shares insights into testing practices, especially in the context of legacy codebases. He emphasizes that good tests are isolatable, repeatable, and deterministic. Key points include:

  • Testing Background: Richard shares his experience transitioning from mechanical engineering to software development, highlighting the parallels between verifying fridge designs and software testing.
  • Defining Good vs. Bad Tests: He asserts that good tests are isolatable, repeatable, and deterministic and do not depend on external systems; the worst tests are no tests at all, which is exactly what a legacy codebase often presents.
  • Legacy Challenges: He discusses the struggles with a five-year-old legacy codebase at Heroku that initially had zero tests. He outlines the chaos caused by minimum viable patches (MVPs), which accumulate into unmaintainable code over time.
  • Testing Framework Development: To tackle these challenges, he created 'Hatchet', a testing framework designed for deploying and testing Ruby applications on Heroku, which emphasizes real scenario testing over local simulations.
  • Integration Tests: Richard discusses the concept of testing components using integration tests, likening it to launching rockets into space to ensure that all individual parts work harmoniously in real-world conditions.
  • Retrying Tests: He shares insights into managing network-related failures during testing through automated retries, which helped alleviate issues stemming from external services.
  • Handling Non-Determinism: He explains how retries and idempotent deploys keep the testing process efficient while taming the non-deterministic outcomes caused by external dependencies.
  • Modular Testing: Schneeman emphasizes moving toward smaller, isolated tests as a means of enhancing speed and reliability. He advocates using tools like VCR to record and replay network requests, removing randomness from tests.

In conclusion, Richard asserts that nothing is untestable. With a structured approach to testing, starting from integration tests and working down to modular unit tests, one can confidently ensure the reliability of a system. He encourages prioritizing robust tests to keep code maintainable and warns against the pitfalls of relying on minimum viable patches. He wraps up the talk by highlighting that comprehensive testing creates confidence for future deployments, ultimately advocating for flexible, efficient, and consistent testing practices in software development.

Testing The Untestable
Richard Schneeman • February 20, 2014 • Earth • Talk

Good tests are isolatable, repeatable and deterministic. Good tests don't touch the network and are flexible when it comes to change. Bad tests are none of the above. The worst tests are no tests at all - which is where I found myself with a 5 year legacy codebase running in production and touching millions of customers with minimal use-case documentation. We'll cover this experience and several like it while digging into how to go from zero to total test coverage as painlessly as possible. You will learn how to stay sane in the face of insane testing conditions and how to use these tests to deconstruct a monolith app. When life gives you a big ball of mud, write a big ball of tests.

Help us caption & translate this video!

http://amara.org/v/FG3m/

Big Ruby 2014

00:00:20.720 Hello, hello, hello there! I'm going to test my mic, which is apparently testable.
00:00:27.000 Before I get started, it was mentioned earlier this morning about Jim. I wanted to take a moment to say that I was lucky enough to meet Jim a number of times. One of my favorite testing talks of all time comes from Jim, where he built Rake from scratch on stage using a test-driven approach.
00:00:40.760 Jim was a big fan of testing, and I think he would have enjoyed my talk on testing the untestable. If you don't know me, my name is Richard Schneeman, or as I'm known online, schneems. I am literally married to Ruby; her name is Ruby.
00:00:52.760 Something interesting happened recently: she became a Python developer, while I remain a Ruby developer. It's okay though; she still knows how to program in Ruby, and she often says, 'If only this thing were written in Rails.' As you can tell by the hat and sandals, I'm from Austin, just three hours down the road. It's a great city, and I highly recommend you visit.
00:01:19.080 There’s a bit of a rivalry between Austin and Dallas, mostly because Austin has the cool music scene and Dallas has the biggest airport known to mankind. I was looking for something that I could use to zing them, so I Googled and found... notice that I am actually in incognito mode; this is not a personalized result.
00:01:40.360 A little bit about my backstory: I graduated from Georgia Tech, where I studied mechanical engineering, 'ME' for short. I'm not saying that I was a very self-centered person who studied only myself. I don't have a computer science degree, which might mean I think a little differently from some of you in the room.
00:01:54.560 One of the things I loved studying in mechanical engineering was thermodynamics. Does anyone here like thermodynamics? It’s an amazing field! One thought I had while taking thermodynamics tests was, 'Okay, we have an answer key.' If I screw something up, it’s fine; I can just check the answer key. I can go back and make corrections. There’s always a way to check your work. Have you ever been doing homework and flipped the sign on something? There’s a significant difference between those two numbers.
00:02:14.560 In school, it seems trivial, but in the real world, there's no answer key; there are real consequences. With this in mind, I went into my first co-op job at a company you may have heard of called General Electric, where I helped build refrigerators. This one was before my time, but I'm sure it still runs.
00:02:41.320 Does anyone know how you would begin to design a refrigerator? Math! That sounds good! I’m here to tell you that this multi-billion-dollar corporation decided the best tool for the job was spreadsheets. Some of the smartest mechanical engineers in the world come together and input all these dimensions, and then it spits out efficiency numbers and predictions on how the refrigerator will behave.
00:03:05.680 But what if these spreadsheets are wrong? What if you entered something incorrectly, or what if the calculations are off? This thought led me to consider testing. Believe it or not, I realized that they would calculate all these factors and then actually wire up the refrigerators, putting them in a room with a known temperature to see how they performed. Did it match up?
00:03:39.920 In many ways, programmers are lucky. We are working with programs, which means we have known inputs and outputs. The product is a program, and we can directly test it. Imagine if we could test a refrigerator with another refrigerator by putting mini refrigerators inside of the larger one. I don’t even know how that would work!
00:04:02.240 Unfortunately, other products aren't so lucky. A prime example is the commitment we made in the 1960s to go to the Moon. As many of you may know, outer space is a lifeless vacuum. In this scenario, you’re not worrying if your bacon will spoil or if your butter will melt; you have real people's lives on the line. How do we ensure that all of our calculations are correct? In this kind of scenario, you really have to take care, and it’s almost an untestable situation.
00:04:47.200 They came up with the idea of taking all of these individual components, connecting them to test stands, and testing them in real time. For example, when they had a rocket engine slated for the vehicle, they strapped it onto a real test stand, lit it up, and verified that it behaved as expected. This was a good start, but at the end of the day, we still didn't know if everything was assembled correctly.
00:05:11.760 Ultimately, you have to launch it; you have to send something into space. This act is similar to an integration test. Bringing this back to software, I work for a company called Heroku. Is anyone familiar with Heroku? I mention this because sometimes when I talk to Java programmers, they aren’t aware of it.
00:05:30.919 At Heroku, I am part of the Ruby task force. A quick quiz: can anyone name the year 'No Country for Old Men' was released? Superbad, one of my favorites, also came out that year, 2007, the same year that Heroku officially started operation.
00:05:50.320 The software I now work on first got its roots during this time. It was introduced in 2007 and is now known as a buildpack. You might wonder, what exactly is a buildpack? Whenever you use Heroku, you might do something like 'git push heroku master', which sends your code to our system and triggers the Ruby buildpack if you have a Gemfile.
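As a rough sketch of how that trigger works: the classic buildpack interface centers on a 'detect' script that inspects the pushed app and decides whether the buildpack applies. This is an illustrative reconstruction, not Heroku's actual code:

```ruby
#!/usr/bin/env ruby
# Illustrative "detect" step for a buildpack: print the language name and
# exit 0 if this buildpack should handle the app; exit non-zero otherwise.
build_dir = ARGV[0]

if build_dir && File.exist?(File.join(build_dir, "Gemfile"))
  puts "Ruby"
  exit 0
else
  exit 1
end
```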
00:06:07.679 All the processes that fly across your screen—from bundle install to asset compilation—are part of that process. It sounds simple on the surface, but the reality is very different. There are edge cases on top of edge cases that we support. As of January 2013, there were exactly zero tests. Not like plus or minus—I’m pretty sure this is an accurate number.
00:06:29.600 Before everyone freaks out, the code was manually tested to some degree. I got involved in writing tests a couple of years ago primarily because we had a lot of MVPs—minimum viable patches. And in this context, MVP refers to the smallest piece of code that will allow your program to work.
00:06:50.850 When you have a bug, issue, or feature request, you want to implement the smallest amount of code necessary to fix the issue. The goal is to introduce the least amount of code to minimize the chances of breaking something else. However, this leads to code that is not maintainable or flexible, and while MVPs seem great at first, they eventually degrade into something that's more difficult to manage.
00:07:10.440 The cure for the MVP problem is actually testing and refactoring. Has anybody come across the book 'Working Effectively with Legacy Code' by Michael Feathers? It's a fantastic read, although it might make you drowsy! The book discusses black-box testing, an idea that shows up well beyond programming: you have a set of known inputs and outputs, and you define, 'If I put this in, I should receive this output.' As long as the input produces the correct output, you don't really care how it works internally.
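In test form, that idea is tiny: feed a known input to the unit under test and assert on the output, without peeking at internals. Here is a minimal illustration (ruby_version_from is a hypothetical helper invented for this example, not buildpack code):

```ruby
# Hypothetical helper: extract the version from a Gemfile-style declaration.
def ruby_version_from(line)
  line[/ruby ['"]([^'"]+)['"]/, 1]
end

RSpec.describe "black-box testing" do
  it "maps a known input to an expected output" do
    # We only care that this input yields this output, not how.
    expect(ruby_version_from(%q{ruby '2.1.0'})).to eq("2.1.0")
  end
end
```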
00:07:34.520 So, how do we actually go about testing this buildpack? It's not as straightforward as with a Rails app, where built-in test helpers and fixtures are already available.
00:08:05.340 We decided to take real Ruby apps and deploy them to Heroku rather than faking everything and running them locally. This act of deploying a real application is akin to launching it into space; it allows us to verify our program and gain confidence.
00:08:23.600 To facilitate this, I created a small framework called Hatchet because it 'hacks' together tests for Heroku applications. The process involves cloning a repository, creating a Heroku app, and deploying it. Our Git repository serves as our known input, while we can also set configuration variables and other features. Our output will be the deploy log.
00:08:43.680 This log is what you see when you push to Heroku. When you run 'Heroku run,' you actually enter a shell session inside of a new dyno, and you can check if a generated file exists or if a command runs correctly.
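Put together, a Hatchet test reads roughly like this (patterned after the project's README; the fixture name and assertions here are illustrative):

```ruby
require "hatchet"

describe "the Ruby buildpack" do
  it "deploys a real app and checks the deploy log" do
    Hatchet::Runner.new("rails3_mri_193").deploy do |app|
      # The deploy log is the output we assert against.
      expect(app.output).to match("Installing dependencies")

      # `heroku run` drops us into a dyno so we can inspect the result.
      expect(app.run("ruby -v")).to match("ruby")
    end
  end
end
```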
00:09:06.920 In the previous example, I had a Rails 3 app. We maintain an account on GitHub called Sharpstone, with 47 different repositories cataloging various edge cases; the name was a play on 'sharpening a hatchet.' Each repository lists all the dependencies the application requires.
00:09:28.880 Each fixture must be a complete, standalone Git repository; you can't intermingle dependencies between them, which ensures that each repository's configuration is respected. When we run a command called 'hatchet install', it installs everything in parallel, using a gem I created called 'threaded' that's useful for handling threads.
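Conceptually, the parallel install is straightforward; here is a plain-Ruby sketch of the idea (this is not the threaded gem's actual API):

```ruby
require "fileutils"

# Clone each fixture repo in its own thread, then wait for all of them.
repos = ["sharpstone/rails3_mri_193", "sharpstone/default_ruby"]
FileUtils.mkdir_p("tmp/repos")

threads = repos.map do |repo|
  Thread.new do
    system("git", "clone", "https://github.com/#{repo}.git",
           "tmp/repos/#{File.basename(repo)}")
  end
end
threads.each(&:join)
```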
00:09:53.260 This process retrieves the repositories from GitHub, which must be publicly accessible so we can run our tests on Travis. It clones them into a temporary directory, creates a new app via the Heroku API, and deploys it in real-time. We conduct assertions in the deploy block and gather our input and output for testing.
00:10:11.480 Some might think this approach doesn't seem riveting, and that's okay; perhaps you're even dozing off. However, it's important to note that if you were just writing regular tests, this is where the discussion would wrap up. But because we're testing the 'untestable', we communicate with S3, RubyGems, and the Heroku API.
00:10:34.480 We also interact with a local network and GitHub. When any of these services go down, even if your code is correctly written, your tests may fail, which is incredibly frustrating. You feel confident that you’ve done everything right, yet it doesn’t work due to an external factor.
00:10:55.240 In light of this, the natural move is to have your code retry. I created a utility that allows us to retry our deployments; it's very simple, just a few lines of code. You can set an environment variable called HATCHET_RETRIES, and it will automatically attempt each deploy that many times.
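A minimal sketch of such a retry wrapper, assuming the HATCHET_RETRIES convention described above (names are illustrative, not Hatchet's exact code):

```ruby
# Retry the block up to HATCHET_RETRIES times before giving up.
def with_retries(max = Integer(ENV.fetch("HATCHET_RETRIES", "1")))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    retry if attempts < max
    raise
  end
end

# with_retries { deploy_app }  # deploy_app stands in for the real deploy call
```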
00:11:14.720 This experience led me to collaborate with Bundler, particularly after encountering long-running bundle installs that would fail due to network glitches. Upgrading to Bundler version 1.5 automatically retries network requests, which minimizes the headache.
00:11:38.720 As part of our development, we ensure all deploys are idempotent. If a deploy is executed and works, you could run it again, and it would tell you that everything is already up to date. It won’t fail a build unless there’s an actual issue.
00:12:02.040 But what if our local network hiccups while trying to interact with Heroku? You can still run into similar issues, which is why it's great that the RSpec ecosystem has a gem, rspec-retry, that allows you to retry test failures. Before working on this project, I never understood why you might want to do that.
00:12:22.960 When a failure occurs, the entire set of tests reruns, tearing down the app and deploying a new one. If you see a few failures in a row, there's a strong likelihood that your actual code is causing the issue.
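With the rspec-retry gem, enabling this behavior is a couple of configuration lines, per the gem's documentation:

```ruby
# spec_helper.rb
require "rspec/retry"

RSpec.configure do |config|
  config.verbose_retry = true     # log when an example is retried
  config.default_retry_count = 3  # re-run a failing example up to 3 times
end
```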
00:12:43.760 Test failures typically take much longer to run than passing tests. When your tests are green, it’s super fast. But when your CI server hasn’t responded in 30 minutes, it’s usually because every single test has rerun several times.
00:13:04.960 In this way, we have created a non-deterministic situation: you could run the same exact code multiple times, and it may either pass or fail. To combat this, we approximate determinism through probability, understanding that a single failure may not signify a real issue, but repeated failures should still be examined.
00:13:27.480 You may notice I've been using RSpec throughout. Just a heads-up: while I have my opinions, they aren't necessarily those of the management. RSpec has amazing functionality, including the ability to focus and run a single test, though there's a drawback: if a focus tag is accidentally checked in, your CI will run only that one test.
00:13:49.200 In RSpec, we can nest contexts. Moreover, the plugin system in RSpec is rich, allowing you to incorporate features like rspec-retry. I feel that RSpec has improved testing readability dramatically; the syntax is cleaner than in many traditional test frameworks.
00:14:09.440 To illustrate, instead of traditional assertion methods, RSpec lets you use the 'expect' syntax, which reads much more clearly to new programmers than older methods like assert_match.
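For comparison, here is the same check in both styles; the expect form reads almost like a sentence:

```ruby
RSpec.describe "deploy output" do
  it "mentions the Ruby buildpack" do
    output = "-----> Ruby app detected"

    # Older assertion style: assert_match(/Ruby app detected/, output)
    # RSpec's expect syntax:
    expect(output).to match(/Ruby app detected/)
  end
end
```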
00:14:30.040 With this understanding, let's recap the numbers for our testing process. A single deploy test takes about five minutes, yet on Travis we are able to execute around 44 test cases in about 12 minutes.
00:14:57.480 This represents an incredible achievement for my team—far more efficient than my previous experiences where Rails test suites would take upwards of 30 or 45 minutes. While tests take longer when they fail, it’s crucial to identify and address issues when they arise.
00:15:19.080 A significant key to this success was parallelism. In our workflow, we simply tell Heroku to deploy and then wait for the process to finish, so most of a test's time is spent waiting on the network. Running tests in parallel lets those waits overlap, which keeps the whole suite fast.
00:15:40.960 After multiple rounds of tests, our buildpack itself improved significantly: we recorded a 40% gain in speed. This was not mere happenstance; we meticulously engineered our processes to deliver these enhancements in efficiency.
00:16:03.840 Now that we have our big black-box network tests in place, we can create smaller, modular tests. If there are aspects of the code that don't require network calls, we can unit-test those individual components.
00:16:25.760 This leads to significantly faster tests; rather than a five-minute process, we now complete unit tests in approximately 1.63 seconds. Speedier tests result in quicker iterations and a smoother development experience.
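Once a piece of logic is pulled out from behind the network boundary, it can be exercised directly. A sketch under that assumption (GemfileParser is a hypothetical extracted component, not actual buildpack code):

```ruby
# Hypothetical component extracted from deploy logic: no network required.
class GemfileParser
  def initialize(contents)
    @contents = contents
  end

  # True if the Gemfile declares a ruby version.
  def ruby_version?
    @contents.match?(/^\s*ruby /)
  end
end

RSpec.describe GemfileParser do
  it "detects a declared ruby version" do
    parser = GemfileParser.new(%Q{ruby "2.1.0"\ngem "rails"})
    expect(parser.ruby_version?).to be(true)
  end
end
```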
00:16:49.960 In addition to working on the buildpack, I contribute to another project called CodeTriage.com, a Rails application that was just upgraded to version 4.0.3 due to some vulnerabilities. It allows users to register their interest in contributing to open-source projects, specifically for Rails.
00:17:15.840 Crucially, our tests run on Travis, ensuring that if the Travis tests are passing, our application is deemed good to go. Following successful tests, it auto-deploys to Heroku—no manual effort necessary.
00:17:40.080 As a point of interest, the app depends on network calls to GitHub, which necessitated some black-box testing. That meant interacting with the user interface, essentially clicking buttons and serving pages; we utilized a tool called Capybara to facilitate this.
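A Capybara feature spec drives the app the way a user would; something like the following, where the page and field names are illustrative:

```ruby
require "capybara/rspec"

RSpec.describe "signing up", type: :feature do
  it "lets a visitor register interest in a project" do
    visit "/"
    click_link "Sign up"                      # illustrative link name
    fill_in "Email", with: "dev@example.com"  # illustrative field name
    click_button "Submit"
    expect(page).to have_content("Welcome")
  end
end
```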
00:18:03.560 When dealing with network dependencies, it's essential to mock or stub the calls to eliminate the unpredictability that comes with network interactions. This is especially critical when a request is non-deterministic.
00:18:27.480 While you can use a small tool like WebMock for minor cases, I recommend the more robust tool called VCR. VCR allows you to record your network interactions and play back those recordings during your tests; each recording is stored in a YAML file referred to as a cassette.
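A typical VCR setup, following the gem's documentation: record once, then replay the cassette on every subsequent run.

```ruby
require "vcr"
require "net/http"

VCR.configure do |c|
  c.cassette_library_dir = "spec/cassettes"  # YAML cassettes live here
  c.hook_into :webmock                       # intercept HTTP via WebMock
end

# The first run records the real HTTP interaction to a cassette;
# later runs replay it, making the test fast and deterministic.
VCR.use_cassette("github_repos") do
  Net::HTTP.get(URI("https://api.github.com/repositories"))
end
```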
00:18:50.340 That was a lot of information to process, and I acknowledge that I’ve crossed the time limit. The key takeaway is that nothing is truly untestable. If something seems too vast or overwhelming, start with integration tests.
00:19:10.960 Even if you can only test specific components, assess the areas that could cause significant damage. I once worked for a company called Gowalla, an enormous social network that faced a critical failure for three days due to a broken sign-up process.
00:19:29.160 After that incident, I can assure you that a test for the sign-up process was implemented. Things happen; it’s unfortunate but manageable. These tests enable you to refactor your code, write smaller and faster tests, and even adopt a modular approach.
00:19:48.420 Feel free to distribute your tests into Ruby gems or adopt a service-oriented architecture model—whatever best suits your requirements. The nice thing is that with robust testing, you can be confident that if your tests are passing, your overall system will likely perform well.
00:20:10.560 If not, you will receive a notice of failure, enabling you to refine further tests and enhance confidence before your next deployment. Avoid implementing minimum viable patches; focus on maintaining flexibility, speed, and efficiency in your tests.
00:20:37.840 My name is schneems. I've developed some gems, including Wicked, and I've co-authored a book titled 'Heroku: Up and Running' with Neil Middleton. You should definitely purchase it. Does anyone have any questions? I'll be in the lobby during the next break.