It's not your test framework, it's you

by Robbie Clutton and Matt Parker

In the talk "It's not your test framework, it's you" presented at LA RubyConf 2013 by Robbie Clutton and Matt Parker, the focus is on sustainable practices in Behavior Driven Development (BDD). The speakers highlight the backlash against popular BDD frameworks like Cucumber, attributing issues with brittle test suites and slow build times to the misuse of these frameworks rather than to the tools or methodologies themselves. They advocate for a deeper understanding of BDD's intentions and emphasize that poor practices lead to complications in testing. Key points discussed include:

Hype Cycle of BDD: The initial excitement around BDD saw aggressive adoption, followed by criticism and a decline in practice as developers encountered challenges.
Common Issues: Problems such as brittle tests, insufficient acceptance criteria, and slow performance plague many implementations.
Cargo Cult: Developers often imitate methodologies without fully understanding them, leading to ineffective testing.
Importance of Conversations: Engaging product managers in detailed discussions about feature specifications is critical to success. This approach aligns development efforts with user needs.
Tools Evolution: The transition from early BDD frameworks to more sophisticated tools like Cucumber and features like Gherkin illustrates the need for a shared understanding and language between developers and product owners.
Reducing Brittle Tests: Tests should focus on what the application intends to do rather than how it operates, which reduces dependencies on the UI and enhances stability.
Living Documentation: BDD aims to create clear documentation that's meaningful and helpful throughout development.
Performance Optimization: To ensure efficient testing, teams should regularly review and optimize their test suites to prevent long execution times and improve CI workflows.

The presenters conclude by stressing the importance of treating tests as essential components of development and inviting developers to approach BDD with a commitment to continuous improvement and active engagement with product owners. By focusing on how tests reveal application intent and by employing best practices, teams can navigate the complexities and challenges of BDD more effectively, leading to better software quality and user satisfaction.

00:00:23.670 Hi, I'm Matt. Hi, I'm Robbie. We want to talk to you today about sustainable BDD practices.

00:00:30.340 This talk was originally titled, 'It's not your test framework, it's you.' However, we thought that made us sound like jerks, implying that we were blaming you for something. And while we are, in fact, blaming you, we didn't want to come off as jerks right from the start.

00:00:41.739 We've noticed a distinct hype cycle with BDD. Initially, there was an aggressive adoption curve. Many people were enthusiastic and excited, and numerous tools emerged. However, we began to see a decline as people started to criticize it, questioning the necessity of automated acceptance testing.

00:01:06.189 This leads us to a journey of reflection, where we want to discuss some points raised by those who have stopped practicing BDD. The real question is, what is going wrong, and why are we feeling this pain?

00:01:16.869 We see several common issues: brittle tests, poorly written acceptance criteria, unreadable tests, flickering tests, and slow tests. However, before diving into these problems, we need to ensure we are all on the same page about what we're discussing.

00:01:36.700 When I first started doing BDD, I had no idea what I was doing. I was merely reading blog posts, learning about tools, and trying things out without fully grasping the deeper intentions behind BDD. This brings to mind the concept of 'cargo cult.' If you're unfamiliar with that term, I encourage you to check it out on Wikipedia—it's quite an interesting read. Essentially, we refer to 'cargo culting' when someone adopts or mimics a methodology, process, or technology without truly understanding it.

00:02:06.399 You have introduced it into your development environment without asking the right questions or doing it the right way. Before we delve deeper, let’s take a step back to reflect on how we got into this mess in the first place.

00:02:24.970 The journey begins with unit testing, which took shape in the Smalltalk community. The original framework there was SUnit, whose design was carried over to JUnit, NUnit, and many other testing frameworks, collectively known as xUnit. These frameworks were used to support various layers of testing, including real unit tests, integration tests, and some very basic automated tests.

00:02:50.170 It was Dan North who coined the term 'Behavior Driven Development' (BDD). He proposed, 'What if we flipped this on its head?' At that time, we were employing an inside-out approach to test-driven development, yet we had practices—such as Specification by Example—that were developing outside that orbit.

00:03:18.740 Our tooling wasn't aligning with this evolving process. Let's walk through an example. Imagine I'm a product manager and I say, 'We're going to build this web app, and we need the sign-up feature. Just make it happen.' Instead of simply building it, we should respond with, 'Please, give me an example.' This might require some back and forth with your product manager, fostering a clearer understanding of the feature.

00:03:54.410 The example could be: 'A user opens the app, inputs their username, password, and email address, and they’re signed in immediately.' But we should aim to expand this scenario further. What happens if the input is invalid? What responses do we want to provide? This includes scenarios such as blank fields, mismatching password confirmations, and many more.

00:04:24.550 When discussing password security, we could ask questions like, 'How strong should the password be?' or 'Are we dealing with sensitive information?' These inquiries touch on the balance between security and convenience. Unless you initiate these discussions, assumptions regarding the product owner’s priorities will remain unchallenged.

00:04:47.739 Similarly, we need to consider how we handle invalid email addresses. Does the system bounce back undeliverable emails? Each of these conversations is crucial for understanding and refining the actual needs of the product.

00:05:21.270 What if the username is unavailable? It can be a reserved word or someone might forget that they have already signed up. In this case, we should discuss how a user might recover their account if they forget their login details.

00:05:48.270 After this extensive discussion, what started as a seemingly simple feature request (just integrate 'devise' for authentication) could expand into four different features and 17 detailed scenarios. By asking for examples, agile practitioners like Dan North uncovered this hidden complexity, helping software developers better understand what product owners want.
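The back-and-forth described above is essentially specification by example: each question to the product manager yields one concrete case. A minimal sketch of how such cases might be captured as a table, in plain Ruby. The `validate_sign_up` function, its rules (the eight-character minimum, the reserved names) and the error symbols are all hypothetical stand-ins for answers the product owner would supply; none of them come from the talk.

```ruby
# Hypothetical rules; in practice each one comes out of a conversation
# with the product owner, not from the developer's assumptions.
RESERVED_USERNAMES = %w[admin root].freeze
MIN_PASSWORD_LENGTH = 8

# Returns a list of error symbols; an empty list means the sign-up is valid.
def validate_sign_up(username:, email:, password:, confirmation:, taken: [])
  errors = []
  errors << :username_blank if username.strip.empty?
  errors << :username_reserved if RESERVED_USERNAMES.include?(username)
  errors << :username_taken if taken.include?(username)
  errors << :email_invalid unless email.match?(/\A[^@\s]+@[^@\s]+\z/)
  errors << :password_too_short if password.length < MIN_PASSWORD_LENGTH
  errors << :confirmation_mismatch if password != confirmation
  errors
end

# Baseline input that satisfies every rule above.
VALID = { username: "robbie", email: "robbie@example.com",
          password: "correct horse", confirmation: "correct horse" }.freeze

# Each row is one agreed example:
# [description, overrides applied to VALID, expected errors]
EXAMPLES = [
  ["valid details",           {},                                 []],
  ["blank username",          { username: "" },                   [:username_blank]],
  ["reserved username",       { username: "admin" },              [:username_reserved]],
  ["username already taken",  { taken: ["robbie"] },              [:username_taken]],
  ["malformed email",         { email: "not-an-email" },          [:email_invalid]],
  ["short password",          { password: "short", confirmation: "short" },
                              [:password_too_short]],
  ["mismatched confirmation", { confirmation: "something else" }, [:confirmation_mismatch]],
].freeze

# Runs every example and reports whether the actual errors match.
def run_examples
  EXAMPLES.map do |description, overrides, expected|
    actual = validate_sign_up(**VALID.merge(overrides))
    [description, actual == expected]
  end
end

run_examples.each do |description, passed|
  puts "#{passed ? 'PASS' : 'FAIL'}: #{description}"
end
```

Adding a row to the table is cheap, which is the point: every new question ("what if the username is reserved?") becomes a visible, checkable example rather than an unstated assumption.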

00:06:29.590 Dan North developed 'JBehave,' the first BDD tool that facilitated discussions about business value using terms like 'features' and 'scenarios,' allowing developers and product owners to share a common language. This helped bridge the gap between intended functionalities and actual implementations.

00:06:56.400 In a similar vein, 'Cucumber' followed, introducing 'Gherkin,' a language specification for articulating how features of an application should work. Gherkin helps anyone—especially product owners—understand the state transitions of an application without requiring advanced technical knowledge.

00:07:43.720 Gherkin also encourages everyone involved to grasp the complexity of the application. If you're trying to hide that complexity from the product owner and your users, it will only come back to haunt you. A shared language like Gherkin aids in productive conversations around simplifying and refining application logic.

00:08:14.079 With a unified language established, integrating project management tools and code editors becomes seamless. Developers can directly translate business discussions into actionable tests, thus aligning their work with the overall product vision. The ultimate goal of BDD and TDD is to produce living documentation that serves as both a guide and a guarantee of functionality at any given moment.

00:08:50.360 As we moved from Cucumber to other frameworks like Spinach, we began to realize that we were often not asking the right questions about the applications. Instead of blaming the tools when test suites became frustrating, we need to dig deeper and address the underlying issues.

00:09:37.660 Addressing the root of the pain we felt while doing BDD was imperative. For instance, one common pain point is brittleness in tests. When changes are made at the product level, it often leads to multiple tests breaking, indicating a brittle suite that fails to reflect the application's intent.

00:09:54.400 Brittle tests often focus on how things work rather than what they are intended to do. Let's consider an example where we're building a feature for Twitter at the request of a product owner: 'I want the tweet feature.' You might think about scenarios where a tweet is valid, too long, or a duplicate.

00:10:19.250 However, simply testing that the UI performs correctly by walking through every interaction can lead to problems. These tests are verbose, susceptible to changes in the UI, and instead of focusing on business objectives, they recount the steps taken rather than what the application achieves.

00:11:08.052 You might find that logging in is duplicated across various scenarios, scattering your assumptions and knowledge across your codebase. In doing so, you may develop a test suite that is tightly coupled to the UI, creating a headache when changes arise.

00:11:46.220 If you realize that the issue isn't with the framework, but with how you're writing your tests, it's critical to adjust. Consider the cautionary tale of 'big rats leave big patches,' where the way we are utilizing tools like Capybara can introduce unnecessary complexity.

00:12:05.940 Imagine building a Twitter feature that includes clicking a tweet button. If you write raw Capybara steps directly in your specs, the tests become brittle, particularly when different developers implement the same interaction differently.

00:12:44.840 If one developer writes tests that target UI elements directly, while another uses abstract layers, when it's time to modify something significant, you’ll end up patching numerous tests one by one. Instead, it’s more effective to abstract the repeated logic into modules.

00:12:59.450 By creating a module that handles the tweet functionality and makes use of helper methods, you can encapsulate that knowledge in one place, ensuring consistency in the tests while also reducing brittle dependencies on the UI.
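A minimal sketch of such a module, assuming a tweet form with a field labelled "Compose tweet" and a button labelled "Tweet" (both hypothetical, as are the path and the helper names). The helper bodies use only Capybara DSL calls that exist (`visit`, `fill_in ... with:`, `click_button`, `has_content?`); the `FakeSession` class is a stand-in so the example runs without a browser and is not part of Capybara.

```ruby
# Intent-level helpers: scenarios call these instead of raw Capybara steps.
# The path, field label and button text are hypothetical.
module TweetHelpers
  def post_tweet(text)
    visit "/"
    fill_in "Compose tweet", with: text
    click_button "Tweet"
  end

  def tweet_posted?(text)
    has_content?(text)
  end
end

# Stand-in for a Capybara session so the sketch runs on its own.
# It records calls and mimics only the four methods used above.
class FakeSession
  include TweetHelpers
  attr_reader :calls

  def initialize
    @calls = []
    @fields = {}
    @page_text = +""
  end

  def visit(path)
    @calls << [:visit, path]
  end

  def fill_in(locator, with:)
    @calls << [:fill_in, locator, with]
    @fields[locator] = with
  end

  def click_button(locator)
    @calls << [:click_button, locator]
    # Pretend the submitted tweet is rendered on the page.
    @page_text << @fields.fetch("Compose tweet", "") if locator == "Tweet"
  end

  def has_content?(text)
    @page_text.include?(text)
  end
end

session = FakeSession.new
session.post_tweet("Hello, LA RubyConf")
puts session.tweet_posted?("Hello, LA RubyConf")
```

If the tweet button is renamed or the form moves, only `post_tweet` changes; every scenario that calls it stays untouched. In a real suite the module would typically be mixed in with `World(TweetHelpers)` in Cucumber or `config.include TweetHelpers, type: :feature` in RSpec.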

00:13:40.320 At this stage, you're beginning to foster the development of a domain-specific language (DSL) focused on your application's core functionality, rather than on its interface. Both Cucumber and RSpec provide the flexibility to use this pattern, allowing developers to create abstractions that funnel into cleaner and more understandable tests.

00:14:40.970 Inadequate collaboration with product owners while defining acceptance criteria is another aspect where BDD can falter. Often, teams may have cargo-culted BDD, missing the collaborative intent it was founded on.

00:15:15.580 It can be tricky to convince product owners to adapt their perspective on testing. If you start imposing a new test framework that requires them to change their story-writing behavior, they may feel overwhelmed and annoyed.

00:15:41.729 It's crucial to foster an environment of collaboration where the focus is on building helper methods to make it easier for the product owner to express their vision clearly instead of shifting the burden onto them.

00:16:03.780 A common theme we've observed with Cucumber and BDD is failing to deliver on the promise of executable documentation. Many tests become unreadable due to poor design or because they're not readily accessible.

00:16:44.009 If tests are hard to read, you're not likely to return to them, even when you need clarity on product requirements. Tidy documentation is essential whether or not it's consulted directly, and tools like Cucumber can produce HTML output for easy sharing.

00:17:09.490 For those of you using RSpec, you may have noticed the recent emergence of tools like Relish, which help present this kind of documentation in a browsable form. Alternatively, the BBC offers Wally, an open-source variant.

00:17:52.290 Another reason tests can become meaningless as documentation is if they’re simply unnecessary to read. If you’re practicing BDD with a product owner, writing stories together, you’ll still benefit even if you’re not directly leveraging a BDD tool.

00:18:35.240 However, using tools that support Gherkin generally maximizes effectiveness. The expectation is that living documentation continuously evolves, but if nobody actively consults the documentation, its readability is not a pressing issue.

00:19:01.390 Up until now, we've reviewed painful challenges that surface while navigating BDD, including illegible documentation and unclear test logic. The real, persistent issue is test performance: slow suites slow everything else down.

00:19:53.960 For instance, have you encountered test suites that take an excessive amount of time to execute? The inefficiencies could stem from a poorly configured suite burdened with unnecessary tests.

00:20:37.190 In a project I worked on, when we inherited the legacy codebase, we eventually got the tests running, only to discover that they took 33 hours to complete with over a thousand tests failing. This cautionary tale illustrates that excessive test execution time diminishes the value of those tests.

00:21:49.860 It’s critical to identify tests that inherently produce long feedback loops, hindering your confidence in the framework. Slow tests can lead to a disconnect, whereby you lose touch with your continuous integration environment and ultimately your development process.

00:22:42.060 One common mistake developers make is inserting sleeps into their tests. This amounts to surrendering to the complexities of timing and leads to unreliable test results.

00:23:06.220 Aligning with the practices within Capybara helps to mitigate these issues. By implementing effective waiting conditions rather than arbitrary pauses, you’re fostering better test reliability.
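Capybara's own finders and matchers already wait and retry up to a configurable maximum, so in most cases the fix is simply to assert on the expected content instead of sleeping. To show the idea behind a waiting condition, here is a plain-Ruby sketch. The `wait_until` helper and `WaitTimeout` error are hypothetical illustrations, not Capybara's implementation or API.

```ruby
# Raised when the condition never becomes true within the timeout.
class WaitTimeout < StandardError; end

# Polls the block until it returns a truthy value or the timeout elapses.
# Unlike a fixed sleep, this returns as soon as the condition holds.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Simulated asynchronous work: the "page" becomes ready after a short delay.
ready_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.05
page_ready = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) >= ready_at }

wait_until { page_ready.call }
puts "page ready"
```

A fixed `sleep 2` always costs two seconds and still fails when the page takes three. A polling wait costs only as long as the page actually takes, and fails with a clear timeout error when the condition is never met.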

00:23:43.730 Should you still experience flickering tests, consider quarantining them with designated tags. This allows your CI to run critical tests while separating those that are less stable, giving you insight into areas needing attention.
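The quarantine split can be modelled in a few lines of plain Ruby. This is only a sketch of the selection logic: in a real suite the tag filters built into Cucumber (its `--tags` option) or RSpec (metadata filtering) do this work, with syntax that varies by version. The `@quarantine` tag name, the `TaggedScenario` struct and the scenario names are hypothetical.

```ruby
# A scenario reduced to the two things the split cares about.
TaggedScenario = Struct.new(:name, :tags)

QUARANTINE_TAG = "@quarantine" # hypothetical tag name

# Splits a suite into the scenarios that gate the build and the
# flaky ones that run separately without blocking it.
def partition_by_quarantine(scenarios)
  quarantined, stable = scenarios.partition { |s| s.tags.include?(QUARANTINE_TAG) }
  { stable: stable, quarantined: quarantined }
end

suite = [
  TaggedScenario.new("Posting a valid tweet", []),
  TaggedScenario.new("Timeline updates live", ["@javascript", "@quarantine"]),
  TaggedScenario.new("Rejecting a duplicate tweet", ["@javascript"]),
]

groups = partition_by_quarantine(suite)
puts "Blocking build:     #{groups[:stable].map(&:name).join(', ')}"
puts "Non-blocking build: #{groups[:quarantined].map(&:name).join(', ')}"
```

The quarantined group is still run and reported, just not allowed to fail the main build, so the list of flaky scenarios stays visible as work to be done.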

00:24:20.340 Designate a role on your team, aptly termed 'build nanny', to keep track of these flaky tests. This person should investigate and propose re-writes or simply suggest deleting tests that no longer serve a purpose.

00:25:12.610 Adam Milligan, a senior engineer at Pivotal, eloquently reminds us that we shouldn’t fear deleting tests that no longer deliver value. We sometimes treat tests as sacred entities—permanent without regard for their relevance or utility.

00:25:47.890 As we remove inefficient tests, our test suite's reliability and speed should increase. Ultimately, we’re striving towards a maintainable suite that provides confidence rather than a burden.

00:26:26.960 At this point, reevaluate your suite. Ask yourself why authentication is being tested repetitively. In many cases, you might not need such extensive checks.

00:26:50.220 Consider combining journey testing with functional testing: focus your acceptance tests on high-level user journeys, while keeping the underlying core code directly accessible to tests.

00:27:36.540 As your implementation improves, utilize tags effectively. Distinguish between tests that need to be run locally versus those that can remain solely within the CI environment. Strive for a balance where some tests run for confidence while others may be tagged to indicate lesser relevance.

00:28:26.590 Finally, don’t mistake BDD for acceptance tests that must always drive through the UI. Many features can be validated without exhausting the interface. Using domain helpers to build tests outside of the UI provides quicker insights with greater reliability.
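As a sketch of what a domain helper outside the UI might look like, the example below exercises tweet rules directly against a model object. The `Timeline` class, its 140-character limit, the result symbols and the helper names are all hypothetical stand-ins for an application's real domain code.

```ruby
# Hypothetical domain model standing in for the application's core code.
class Timeline
  MAX_LENGTH = 140 # assumed limit, for illustration only
  attr_reader :tweets

  def initialize
    @tweets = []
  end

  # Returns :ok, :too_long or :duplicate.
  def post(text)
    return :too_long if text.length > MAX_LENGTH
    return :duplicate if @tweets.include?(text)
    @tweets << text
    :ok
  end
end

# Intent-level helpers that bypass the browser entirely.
module DirectTweetHelpers
  def timeline
    @timeline ||= Timeline.new
  end

  def post_tweet(text)
    @last_result = timeline.post(text)
  end

  def last_tweet_rejected_as?(reason)
    @last_result == reason
  end
end

# Minimal host object for the helpers, in place of a test framework's world.
class TestContext
  include DirectTweetHelpers
end

context = TestContext.new
context.post_tweet("Hello, LA RubyConf")
context.post_tweet("Hello, LA RubyConf")
puts context.last_tweet_rejected_as?(:duplicate)
```

Because scenarios only call `post_tweet`, the same scenario text can be backed by a UI-driving helper for a handful of journey tests and by a direct helper like this one for the many rule-level cases.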

00:28:55.260 As you refine this process, your tests should gradually become faster and easier to execute. When your suite is efficiently running in under a minute, you’re in a great position to keep pushing your work forward.

00:29:39.200 To conclude, remember these key takeaways: Identify where the pain points arise, treat your tests as equally important as your production code, and approach BDD with the commitment it deserves. Do it wholeheartedly, and you’ll likely find yourself much happier with the results.

LA RubyConf 2013