Sensible Testing

Most Ruby programmers spend significant time writing, maintaining and troubleshooting automated tests. While recent discussions in the Ruby community have focused on whether we're writing too few or too many tests, this talk looks at how we can write "sensible" tests that allow our applications to deliver the most possible value with the least amount of development time and effort.

GoRuCo 2012

00:00:16.640 Thank you very much! I'm really lucky to be going after a lot of great talks today. I wanted to take a moment to reflect on those talks and say some positive things about them. Not only because there were great talks and I don't really have much to criticize, but also because I don't want to get tossed off of a yacht later in the afterparty. I did want to mention some of the discussions where Dr. Nick started off this morning, talking about how the tools that we're using often shape the concepts we have about our applications. Matt also spoke about hexagonal architecture and how that can be a pattern to help us structure our applications in a better and more sustainable way.

00:00:42.360 Frances did a great job discussing values in our community in the context of frontend and backend applications, and how we have a lot of core community values that hold us together. One of those values is related to testing. In the Ruby community, we have a commitment to spending a significant amount of time testing our applications. This is so different from a decade ago when I was working in a Perl programming shop; I’ll admit to doing that. At that time, I was at a company pushing millions of dollars in transactions through a system, and I actually got strange looks when I asked to see the test suite. It was just not something people were doing. In Ruby, we've made great strides in our testing practices. However, I feel like there's a lot more we can do, and that's really what this talk is about.

00:01:30.040 Today’s discussions have focused on our values and the way we have built a lot of great tools, but also on how we can adapt those tools and how much farther we can go. We've talked a lot about testing recently, and specifically about concepts related to test coverage. Questions arise about how much we should test and whether we should cover every aspect or not. In recent years, we've discussed concepts like TATFT, which stands for 'Test All the Effing Time.' DHH, the creator of Ruby on Rails, has been vocal about a certain level of test coverage, and people like Kent Beck have also expressed their comfort levels regarding testing.

00:02:24.680 TATFT has merit, especially when coming from a context where testing is not a prevalent practice. However, it has limitations. For instance, some developers at Stack Builders, where I'm a consultant, mention they have a code-to-test ratio of 1 to 2.2, yet they still lack confidence in their test coverage. This is disappointing, as we spend considerable resources, yet we’re not getting the development tools we need. DHH has shared his perspective in a blog post titled 'Testing Like the TSA': don't aim for 100% test coverage. He states that code-to-test ratios above 1 to 2 indicate there might be a problem, and over 1 to 3 is a bigger concern.
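As a small illustration that is not part of the talk itself, the ratio arithmetic above can be sketched in Ruby. The helper names `test_lines_per_code_line` and `ratio_signal` are hypothetical, and the thresholds simply restate the 1 to 2 and 1 to 3 figures quoted above; how lines are counted is left as an assumption.

```ruby
# Hypothetical helper (not from the talk): test lines per line of production code.
def test_lines_per_code_line(code_lines, test_lines)
  raise ArgumentError, "code_lines must be positive" unless code_lines.positive?

  (test_lines.to_f / code_lines).round(1)
end

# Thresholds restate the ratios quoted above:
# above 1 to 2 may indicate a problem, above 1 to 3 is a bigger concern.
def ratio_signal(tests_per_code_line)
  if tests_per_code_line > 3.0
    :bigger_concern
  elsif tests_per_code_line > 2.0
    :possible_problem
  else
    :ok
  end
end
```

For example, a codebase with 1,000 lines of production code and 2,200 lines of tests has the 1 to 2.2 ratio mentioned above, which this sketch would flag as a possible problem. The number alone says nothing about how confident the team feels in those tests.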

00:03:28.239 Kent Beck, a major advocate of test-driven development (TDD), shares that he gets paid for code that works, not for tests. His philosophy is to test as little as possible to achieve an acceptable level of confidence, which he believes is significantly higher compared to industry standards—though he acknowledges this could also be hubris. This shifts us away from a quantitative approach and brings in subjectivity because what’s appropriate for one developer might not be for another. I'm still convinced there's more we can achieve than just discussing coverage. Thus, I want to focus on different patterns and techniques we can leverage to mold our test suite into something more manageable.

00:05:02.360 We have plenty of principles for discussing production code shapes: DRY (Don't Repeat Yourself), SOLID principles, and various design patterns. Designers like DHH argue that design patterns can lead to pollution in an application if overused, but they're still valuable in delineating beneficial shapes for applications or identifying flaws in hard-to-manage ones. When it comes to testing, however, much of our conversation remains at a quantitative level. I want to look beyond simple coverage.

00:05:44.520 We need to shift our thinking and embrace concepts that enhance our applications' habitability. Habitability, a term described in Richard Gabriel's book 'Patterns of Software,' indicates the ease of living with and modifying existing code. It applies not just to production code but also to our tests. I often observe scenarios where a codebase may have high test coverage but low comfort, or very sparse coverage with little confidence as well. I aim to develop concepts that improve how we organize tests, making them easier to live with.

00:06:59.485 It is essential to recognize that the concepts we have around test coverage shape how our tests are written. Earlier, Dr. Nick mentioned that the tools we use shape our thinking, but we must remember that our concepts also influence the tools we create. Each tool reflects certain design concepts that shape its practical use and how developers conceptualize their codebase. Within software development, we often prioritize test coverage so much that it dictates our testing approach. We can extend this philosophy, presenting an idea surrounding testing that I call 'Cupid,' suggesting we should show our tests some love.

00:08:01.360 Following the tradition of principles like SOLID, I aim to create an acronym with concepts not entirely of my own, but useful for establishing sensible tests. I'll introduce the acronym now, followed by illustrations. The first part is C, which stands for 'Consistent Distance.' In our test suites, we have acceptance tests that are often end-to-end and integrative, where we want to ensure our system behaves as intended. We should aim to interact with the system as a user. Conversely, unit tests strive to isolate a specific module or class. C, in this case, means we should avoid stubbing in acceptance tests and instead focus on real interactions.

00:09:35.040 This notion of maintaining a consistent distance means, for acceptance tests, you shouldn't stub the underlying components; interactions should happen just as a user would. Acceptance tests shouldn't be stubbing out methods you don’t own. Adopting mock objects instead of stubs can clarify relationships between classes. This is linked to the idea that our tests should be pyramidal in form, similar to the test pyramid we often talk about, which illustrates the proportionate relationship between different levels of tests.
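To make the stub versus mock distinction concrete at the unit-test level, here is a minimal sketch that is not taken from the talk. The `Checkout` class, its `gateway` collaborator, and both test doubles are hypothetical; the doubles are hand-rolled so the example needs no testing library.

```ruby
# Hypothetical class under test: it delegates charging to a collaborator.
class Checkout
  def initialize(gateway)
    @gateway = gateway
  end

  def complete(amount_cents)
    @gateway.charge(amount_cents)
    :completed
  end
end

# Stub: returns a canned answer and records nothing,
# so a test using it cannot say how Checkout talked to the gateway.
class StubGateway
  def charge(_amount_cents)
    true
  end
end

# Hand-rolled mock: records every call so a test can verify the interaction.
class MockGateway
  attr_reader :calls

  def initialize
    @calls = []
  end

  def charge(amount_cents)
    @calls << [:charge, amount_cents]
    true
  end
end
```

In a unit test, `Checkout.new(MockGateway.new)` lets the test assert on the recorded `calls`, which documents the relationship between the two classes. An acceptance test, by contrast, would exercise the real gateway boundary rather than either double.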

00:11:37.260 The foundation of our pyramid comprises unit tests, supporting a smaller number of integration tests and even fewer acceptance tests at the top. Many project teams are getting this wrong and end up with a higher proportion of acceptance tests, which can run slowly. I've encountered people who report their test suites take two hours or more to execute. This extensive duration often obscures the accountability required when a test fails, making it difficult to troubleshoot the root cause.

00:12:32.360 It's usually better to write unit tests than to rely on acceptance tests, because unit tests run faster and provide clearer feedback for development. Additionally, tests need to be idempotent, which means they should yield the same outcome irrespective of execution order. I’ve noticed problems with this in many projects, especially older ones where developers may start to overwrite shared state inadvertently.
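The order independence described above can be sketched as follows. This is an illustration under assumptions, not code from the talk: the `fresh_cart`, `checks`, and `run_in_order` names are hypothetical, and plain lambdas stand in for real test cases. The point is that each check builds its own state instead of sharing it.

```ruby
# Each check gets a brand-new cart, so no check can see another's changes.
def fresh_cart
  []
end

# Hypothetical checks, keyed by description; each returns true when it passes.
def checks
  {
    "starts empty" => -> { fresh_cart.empty? },
    "adds an item" => lambda {
      cart = fresh_cart
      cart << :book
      cart == [:book]
    }
  }
end

# Runs the named checks in the given order and returns { name => passed? }.
def run_in_order(names)
  names.map { |name| [name, checks.fetch(name).call] }.to_h
end
```

Running `run_in_order(checks.keys)` and `run_in_order(checks.keys.reverse)` gives the same results. If both checks instead appended to one shared cart, "starts empty" would pass or fail depending on whether it ran first.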

00:13:56.120 Immutability should be prioritized within test suites to reduce mutation side effects. Other programming languages emphasize immutability and require heightened clarity on variable state changes, but Ruby allows more flexibility, placing additional responsibility on developers. Another common issue is overlapping coverage, where a single bug might cause multiple tests to fail, resulting in frustration and confusion. This ambiguity creates a hostile development environment rather than a supportive one, which can deter developers from modifying code.
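One way to lean on immutability in Ruby tests, offered here as a sketch rather than as the approach from the talk, is to freeze shared fixture data so accidental mutation fails loudly. The `Fixtures` module and `with_role` helper are hypothetical names.

```ruby
# Hypothetical shared fixture. freeze is shallow, so the nested string
# and array are frozen explicitly as well.
module Fixtures
  DEFAULT_USER = { name: "Ada".freeze, roles: [:admin].freeze }.freeze
end

# Builds a modified copy instead of mutating the shared fixture.
def with_role(user, role)
  user.merge(roles: user[:roles] + [role])
end
```

A test that needs an extra role calls `with_role(Fixtures::DEFAULT_USER, :owner)` and gets its own hash, while an attempt such as `Fixtures::DEFAULT_USER[:name] = "Eve"` raises `FrozenError` instead of silently changing the data other tests depend on.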

00:16:32.880 Learning from past experiences is crucial, and I endorse the idea of reflecting regularly on how our testing practices work and adjusting strategies accordingly. The Ruby community is vocal about the importance of strong testing practices, so we can be proud of our progress but can still find areas to improve. A key suggestion is to explore different concepts rather than be strictly bound to coverage.

00:17:58.520 In closing, I believe sensible testing is a reflexive process. In software, we build tools for clients, web applications for full-time employers, and also tools for our development practices. We craft concepts as frameworks to better our techniques. The goal is not to set these as inflexible rules but to adapt them for problem-solving. I invite you to consider what ideas or concepts you employ in your testing. Are they effective systems that promote development, or can they be improved? Thank you!