Cucumbers Have Layers: A Love Story

by Sam Livingston-Gray

In the talk 'Cucumbers Have Layers: A Love Story' at RubyConf 2015, Sam Livingston-Gray presents a new perspective on using Cucumber, a tool for behavior-driven development, primarily through the Gherkin language. This presentation is designed for individuals who have previously used Cucumber but may have unresolved issues with it, and offers insights into how to utilize Cucumber effectively.

Key Points:
Introduction to Gherkin and Cucumber:
- Gherkin is a domain-specific language (DSL) that allows users to describe software features in a human-readable format. Gherkin files contain scenarios that consist of steps defined in natural language, facilitating both documentation and automation.
- Cucumber processes these Gherkin files to automate testing, helping to bridge communication between developers and non-technical stakeholders.

Cucumber's Role in Testing Workflows:
- Livingston-Gray advocates for using Cucumber as a guide rail around test-driven development (TDD). He explains that starting with Cucumber scenarios aids in focusing on what to do rather than getting lost in implementation details.
- A workflow is presented where failing Cucumber tests lead to necessary adaptations in unit tests, promoting an iterative development process.

Common Mistakes with Cucumber:
- He shares personal anecdotes regarding common pitfalls, such as excessive reliance on Cucumber for regression tests, lengthy test suites that dilute effectiveness, and cluttering scenarios with implementation details rather than focusing on user intent.
- The talk emphasizes the importance of keeping Gherkin focused on describing the domain rather than the user interface.

Real-World Application Example:
- He describes a complex project involving sales commission calculations where Gherkin was used to document and clarify business rules, facilitating better communication between developers and the sales department. This project highlighted the value of proper Gherkin documentation.

Gherkin as a Communication Tool:
- Livingston-Gray stresses that Gherkin can foster discussions among team members, serving as a collaboration tool rather than merely a test automation framework.
- He emphasizes the importance of using clear and contextual language in Gherkin files to support understanding across different stakeholders.

Conclusions:
- Gherkin and Cucumber can be powerful tools when utilized correctly. Developers are encouraged to focus on crafting meaningful feature files that capture user intent, while maintaining simplicity in step definitions to prevent complex and fragile tests. The talk advocates giving Cucumber another chance, urging developers to rethink their approach and usage in behavior-driven development contexts.

00:00:11.990 I have two items of business to take care of before we talk about Cucumber. First, these slides and this talk are open source.

00:00:17.310 The slides I am using right now are available at tinyurl.com/cucumbershavelayers. If you have any trouble seeing the screen or following my train of thought, or if you like spoilers, you can download a PDF with both my slides and my script on your own device.

00:00:28.770 You can even download the slides now and walk out to see a different talk without me being offended. It's been really hard to choose from this program, which has been really great.

00:00:52.260 Secondly, I work for a company called Real Geeks, which provides web-based tools for real estate agents. I’m sure you will be shocked to learn that we are hiring.

00:01:11.670 I also want to mention that our office is in Hawaii, while I live in Portland, Oregon, and work from home. Here’s a photo from my last office visit—this is my daughter and me looking at the moon rising. My partner took this photo, and it’s her lock screen, which I consider the best perk of the job.

00:01:36.960 We have work available in Ruby, Python, and JavaScript, and we especially need some help with React.js. So if you're interested in writing software in Hawaii, come say hello! I also have stickers.

00:01:46.350 With that out of the way, welcome! I’d like to get a quick show of hands: How many of you have used Cucumber on any project, large or small? Wow, almost everybody! Cool. How many of you have used Cucumber more than once? Okay, cool.

00:02:15.000 Regardless of whether you’ve used it once or multiple times, how many of you would use it again? More than I thought! Okay, let me see some ‘maybes’ as well. This talk is aimed at people who might have used Cucumber in the past and decided not to use it again.

00:02:38.400 I’m glad to see that fewer of you are in that camp than I thought, but I hope to offer a different perspective that might convince you to give Cucumber another look. For those of you who haven't used Cucumber, you should definitely get the Cucumber book; it provides a much better introduction than I could give in a full 45 minutes of Cucumber 101.

00:03:01.230 Just so you're not completely lost, Cucumber lets you describe features of your software in a language called Gherkin. Gherkin is a domain-specific language (DSL) for writing acceptance tests.

00:03:36.270 Since this is a Ruby conference, and we tend to say DSL when we mean API, I need to clarify that when I say Gherkin is a DSL, I mean it is an actual domain-specific language with its own grammar and semantics. Gherkin is not Turing complete, but it can be used to instruct a Turing-equivalent machine on what to do.

00:04:13.560 As I was saying, Gherkin is a DSL for describing software. Each separate Gherkin file describes a feature, which has one or more scenarios. A scenario consists of one or more steps, such as 'given', 'when', 'then', which you see at the bottom left of the screen.

00:04:50.979 Aside from a few keywords highlighted here in green, everything else is written in whatever natural language works for you. Gherkin's grammar is quite simple; everything from a keyword to the end of the line is treated as a single token by the Gherkin parser.
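The "keyword, then everything to the end of the line" idea can be sketched in a few lines of Ruby. This is only an illustration and not the real Gherkin parser: the keyword list is abbreviated, `tokenize_line` is a hypothetical helper, and the feature text is invented wording based on the bonus example that comes up later in the talk.

```ruby
# Hypothetical sketch: NOT the real Gherkin parser, only the
# "keyword + rest of the line" idea. The keyword list is abbreviated.
KEYWORDS = ["Feature:", "Scenario:", "Given", "When", "Then", "And", "But"].freeze

# Split one line into [keyword, text]. Lines that do not start with a
# known keyword come back as free-form text with a nil keyword.
def tokenize_line(line)
  stripped = line.strip
  keyword = KEYWORDS.find { |k| stripped.start_with?("#{k} ") }
  return [nil, stripped] unless keyword

  [keyword.delete(":"), stripped[keyword.length..-1].strip]
end

# Invented feature text, used only to exercise the helper above.
FEATURE_TEXT = <<~GHERKIN
  Feature: Monthly sales bonus
    Scenario: Salesperson hits the target
      Given a monthly target of $100,000
      When the salesperson sells $100,000 of widgets
      Then the bonus is $100
GHERKIN

TOKENS = FEATURE_TEXT.lines.map { |line| tokenize_line(line) }
TOKENS.each { |keyword, text| puts "#{keyword.inspect} => #{text.inspect}" }
```

Each line yields one keyword and one opaque blob of text, which is why the wording after the keyword can be whatever makes sense to the team.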

00:05:01.659 Gherkin is quite useful just as documentation for your project, but of course, Cucumber also lets you use these feature files to automate tests. This is why the creators of Cucumber like to talk about executable specifications. To go from human-readable documents to running tests, you do need to write a bunch of step definitions.

00:05:45.000 A step definition is simply a regular expression plus a block of code. This is how you translate those human-friendly blobs of text into something that Ruby can actually execute.

00:06:07.370 When Cucumber wants to run a step, it tests that step against each of the regular expressions you’ve provided. When it finds a match, it executes the corresponding block. There’s also a mechanism for capturing from the regex and passing those captures as arguments to the blocks.
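A toy model of that matching process, assuming nothing about Cucumber's internals: `define_step` and `run_step` are hypothetical stand-ins for what Cucumber does when you register steps with its `Given`, `When`, and `Then` methods, and the step wording is invented.

```ruby
# Toy model of step matching; the real Cucumber internals differ.
STEP_DEFINITIONS = []

# Register a regular expression plus the block to run when a step matches it.
def define_step(pattern, &block)
  STEP_DEFINITIONS << [pattern, block]
end

# Try each registered pattern in turn. On the first match, call its block
# with the regex captures as arguments.
def run_step(text)
  STEP_DEFINITIONS.each do |pattern, block|
    match = pattern.match(text)
    return block.call(*match.captures) if match
  end
  raise ArgumentError, "Undefined step: #{text}"
end

# Captures always arrive as strings, so the block converts them itself.
define_step(/^the salesperson sells \$(\d+) of widgets$/) do |amount|
  "recorded sale of #{amount.to_i}"
end

puts run_step("the salesperson sells $100000 of widgets")
```

The sketch stops at the first match for simplicity; it is only meant to show how a blob of step text turns into a block call with arguments.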

00:06:20.489 In my mind, Gherkin and Cucumber are almost two separate things. Gherkin gives you a human-friendly way to describe software, while Cucumber processes your Gherkin files and uses them as a script for automating tests.

00:07:00.600 I basically put up with Cucumber because I really like Gherkin. Gherkin allows you to describe your software domain using whatever language makes sense to you and your team. It has just enough structure that it can drive a lot of machinery for automating tests.

00:07:37.160 However, it’s important to understand that Gherkin is not a programming language. This distinction is crucial, as programming languages require us to get all details correct upfront, causing us to focus on how to do something. Gherkin, on the other hand, helps us think about what to do, why, and for whom.

00:08:11.840 I also want to point out that Cucumber is not strictly a tool for test-driven development (TDD). Cucumber and TDD complement each other nicely, but from my experience, they work on different rhythms.

00:08:49.000 If Cucumber serves as a set of guide rails for TDD, then my workflow looks like this: I start with a Cucumber scenario, run it, watch it fail, and check error messages to identify the cause.

00:09:06.000 I use that information to write a unit test, watch the unit test fail, make it pass, and refactor. At this point, I have the option to write whatever I might need to, going back through the red-green-refactor cycle multiple times.

00:09:29.000 Whenever I get stuck, I revisit the Cucumber test, which is likely still failing but for a different reason, helping me figure out what to do next. This leads me back into the TDD cycle.

00:09:49.000 Eventually, I run the Cucumber test again, and when it passes, I feel accomplished and then move on to the next scenario.

00:10:02.480 This workflow allows me to spend most of my time in that tight inner loop of red-green-refactor, doing TDD. Having tests and code in the same language minimizes context-switching, enabling quick iterations in that loop.

00:10:25.300 This means that sometimes I can write a new test every minute when I am performing optimally. It’s a good, satisfying, detail-oriented workflow. However, if I start losing sight of the bigger picture, I shift back to Cucumber.

00:10:47.240 This shift from writing Ruby back to writing Gherkin reminds me to think about the what, why, and who, helping me determine my next steps.

00:11:14.670 Tom Stuart, who I wish were here this year, wrote something about Cucumber that resonated with me: he described it as more of a mind hack than a testing tool, because it encourages thinking about the overall concept rather than the details.

00:11:28.520 When preparing this talk, I asked Matt Wynne, co-author of the Cucumber book, if there was anything he wanted people to know about Cucumber. He replied that he wished more people understood it as a thinking and collaboration tool, not just for test automation.

00:12:00.560 Both quotes lead back to something I mentioned earlier: Gherkin is not code, and Cucumber is not for TDD. However, negative definitions are not very useful; positive definitions are much more beneficial. So what are these things for?

00:12:26.900 I've talked about how I use Cucumber as guide rails around TDD, but let’s explore Gherkin a little more.

00:12:50.310 Everything I say in this talk, unless I’m quoting someone else, is based on my own experience. I definitely do not speak for the Cucumber team; this is just my perspective.

00:13:16.050 I believe Gherkin is for describing software at the level of user intent. You might choose to use Cucumber to turn your Gherkin artifacts into automated tests, but you aren't required to.

00:13:42.550 By describing software, I mean that Gherkin helps you capture acceptance criteria. In terms of user intent, Gherkin illustrates concepts broadly without getting bogged down in the fine details, which are the focus of TDD.

00:14:05.280 I've encountered scenarios that look like this, and every time I've tweaked my user interface, those steps tended to break, leading to hours of editing, which is not the best use of my time.

00:14:27.029 Lastly, just because Cucumber is pitched as a tool for automating tests doesn't mean you are obligated to use it that way. Personally, I think Cucumber's greatest value comes from Gherkin: using it as a tool for facilitating conversations between developers and stakeholders.

00:14:51.660 I’ve written Gherkin files, discarded them, and felt I spent my time well because it helped clarify the tasks that needed to be done.

00:15:15.320 This realization about how Cucumber should be used took me years to figure out, and I made a lot of mistakes along the way, some of which were painful.

00:15:33.030 In the hope that you can learn from my experiences, I’d like to share some of those mistakes with you. While this is by no means an exhaustive account, I will highlight some of the more interesting, entertaining, or educational ones.

00:16:15.260 For instance, I once helped write a Cucumber scenario in a real live codebase. I want to reiterate here that it is okay for you to laugh at my mistakes.

00:16:30.350 When we automated this, it would visit a route that was only defined in the Rails test environment, which rendered a static view that required a JavaScript file.

00:16:53.950 That JavaScript file contained all of the unit tests for our frontend helpers, and the number of tests grew from about 40 to 50 as we continually added more.

00:17:18.380 Every time we added tests, we had to go back and update the Cucumber scenario with a new number, and we realized we were likely headed for trouble.

00:17:41.540 In a separate project, we found ourselves creating a Cucumber scenario that filled out a form with randomly generated data. After submitting, we checked whether the submitted data appeared on the show page.

00:18:04.300 After that, we’d change each value on the edit form and make sure the updates were visible before deleting the record and confirming it was gone from the list.

00:18:29.070 In this project, we wrote functionality that mutated values in the form fields, requiring us to add CSS classes to our markup to indicate the type of fields.

00:18:52.890 While we were developing our Cucumber suite, we avoided writing actual features that our customers cared about and, ultimately, we were fired.

00:19:19.370 When you receive a bug report, it's beneficial to create an automated test that reproduces the issue to prevent regressions.

00:19:39.320 However, it's generally not advisable to rely on Cucumber for these tests. Gherkin serves as an excellent way to tell your application's story; ideally, you should be able to hand off your feature files to someone new, and they should quickly grasp your software’s purpose.

00:20:05.760 Cluttering up your narrative with regression tests can make your documentation overwhelming, sometimes turning clear writing into a convoluted mess.

00:20:26.000 Another common blunder is having too many scenarios, which can lead to long test suites. Personally, I’m comfortable waiting five minutes for a test suite, but anything longer dilutes its usefulness.

00:20:59.720 If you find yourself in this situation, consider tagging a critical subset of your tests to run before each commit, while allowing your CI server to run the whole suite once you push your code.

00:21:26.560 Another mistake to avoid is trying to automate every feature you write. It’s completely acceptable to use Gherkin for discussions, possibly even just with yourself.

00:21:53.210 You can discard the feature file once you’ve extracted what you needed. If you do want to keep the file for record-keeping, tag it as FYI or TBD, and adjust the Cucumber configuration so that scenarios with that tag are never run.
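Both tagging ideas (a critical subset before each commit, and tags that are never run) come down to filtering scenarios by tag. In a real project this is done through Cucumber's `--tags` option rather than hand-written code; the plain Ruby below is only a sketch of the selection logic, and every scenario name and tag in it is made up.

```ruby
# Hypothetical scenario list; names and tags are invented for illustration.
Scenario = Struct.new(:name, :tags)

SCENARIOS = [
  Scenario.new("Bonus at target", ["@critical"]),
  Scenario.new("Bonus above target", []),
  Scenario.new("Idea we only talked through", ["@fyi"])
].freeze

# Tags that mark a scenario as documentation only.
NEVER_RUN = ["@fyi", "@tbd"].freeze

# Everything the CI server would run: all scenarios without a never-run tag.
def runnable(scenarios)
  scenarios.reject { |s| (s.tags & NEVER_RUN).any? }
end

# The quick pre-commit subset: runnable scenarios tagged @critical.
def pre_commit(scenarios)
  runnable(scenarios).select { |s| s.tags.include?("@critical") }
end

puts "pre-commit: #{pre_commit(SCENARIOS).map(&:name).inspect}"
puts "full suite: #{runnable(SCENARIOS).map(&:name).inspect}"
```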

00:22:13.390 There are many pitfalls regarding step definitions. A common piece of advice from the Cucumber book is to create helper functions and call them from your step definitions.

00:22:51.260 However, I started using Cucumber years before the Cucumber book was published, so I made mistakes, such as creating too many step definitions.

00:23:09.340 Some of my step definitions were excessively long, while others called other step definitions or incorporated logic—none of which is recommended.

00:23:37.840 After making all those mistakes—and more—I felt conflicted about Cucumber. On one hand, I loved Gherkin's expressiveness, and I believed programmers and managers could collaboratively write acceptance tests.

00:24:06.580 However, I struggled to reconcile that with my experience on a project filled with hundreds of scenarios and around 750 step definitions amounting to about 5,000 lines of code, taking about 90 minutes to run.

00:24:39.460 During my struggles, I began wondering how I would write scenarios if I was unsure what the user interface would be.

00:25:02.260 If a reader can tell from your Cucumber features whether they’re dealing with a web application, a desktop application, or a command-line tool, you might be including too many details.

00:25:27.300 This question lingered as I transitioned between projects. Eventually, I was brought in to address one part of a large monolithic Rails app responsible for calculating salesperson commissions.

00:25:55.860 Now, while you might think sales commissions are straightforward—adding up sales and multiplying—reality was more complex with various changing compensation schemes.

00:26:16.310 These schemes could drastically change a couple of times a year and were outlined in dense, confusing documents provided by the sales department.

00:26:41.050 One of my goals during this project was to describe these compensation schemes using Gherkin, facilitating conversations with the sales department and developers.

00:27:05.930 Let’s start with a simplified version of how one of these schemes worked. The company stated that if you sold $100,000 worth of widgets in a month, you would receive a bonus.

00:27:34.110 In this case, that target bonus was $100. There’s a scaling factor as well; miss your target and get paid less, exceed it and earn more.

00:27:57.999 However, there is a catch—known as the pay curve. The sales department provided a spreadsheet with examples for each possible input value, helping me understand how the pay curve functioned.

00:28:39.080 This curve was piecewise linear, which made it easier to conceptualize the layout for creating tests.

00:29:03.059 From this spreadsheet, I easily converted various rows into Gherkin tables for use in scenario outlines, which act as templates that are executed once for each row.
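A scenario outline behaves roughly like a template with placeholders that are filled in from each row of the examples table. The Ruby below sketches that substitution only; the step wording, the `attainment` column, and the numbers are illustrative and are not taken from the feature files shown in the talk.

```ruby
# Illustrative outline steps; <name> marks a placeholder filled from the table.
OUTLINE_STEPS = [
  'Given a sales target of $<target>',
  'When the salesperson sells $<sales>',
  'Then attainment is <attainment>%'
].freeze

# Each hash stands for one row of the examples table (invented values).
EXAMPLES = [
  { "target" => "100000", "sales" => "50000",  "attainment" => "50" },
  { "target" => "100000", "sales" => "100000", "attainment" => "100" }
].freeze

# Substitute every <placeholder> with the matching value from one row.
def expand(steps, row)
  steps.map { |step| step.gsub(/<(\w+)>/) { row.fetch(Regexp.last_match(1)) } }
end

# One concrete scenario per row.
EXPANDED = EXAMPLES.map { |row| expand(OUTLINE_STEPS, row) }
EXPANDED.each { |steps| puts steps, "" }
```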

00:29:25.180 Since the pay curve was piecewise linear, I only needed to include a few examples around those critical boundary points to ensure accuracy.
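To see why a handful of boundary examples is enough, here is a minimal sketch of a piecewise linear curve. The breakpoints are invented (the real curve is not given in the talk), and `payout_pct` is a hypothetical helper; only the $100 target bonus comes from the talk.

```ruby
# Hypothetical breakpoints as [percent of sales target, percent of target bonus].
# The real pay curve is not given in the talk; these numbers are invented.
PAY_CURVE = [[0, 0], [50, 25], [100, 100], [150, 200]].freeze

TARGET_BONUS = 100 # dollars, from the talk's simplified example

# Linearly interpolate between the two breakpoints surrounding attainment_pct.
# Values outside the curve are clamped to the first or last payout.
def payout_pct(attainment_pct, curve = PAY_CURVE)
  return curve.first[1].to_f if attainment_pct <= curve.first[0]
  return curve.last[1].to_f if attainment_pct >= curve.last[0]

  (x0, y0), (x1, y1) = curve.each_cons(2).find { |(a, _), (b, _)| attainment_pct.between?(a, b) }
  y0 + (y1 - y0) * (attainment_pct - x0).to_f / (x1 - x0)
end

# Examples clustered around each breakpoint, where the slope changes.
[49, 50, 51, 99, 100, 101].each do |pct|
  dollars = TARGET_BONUS * payout_pct(pct) / 100
  puts "#{pct}% of target -> #{payout_pct(pct)}% of bonus ($#{dollars})"
end
```

Between two breakpoints the function is a straight line, so one or two points per segment plus the points on either side of each breakpoint pin the whole curve down.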

00:29:47.800 Next, I described the compensation scheme itself, which initially appeared similar to the previous scenarios, but introduced new concepts.

00:30:11.690 These included the idea of compensation schemes, with sales bonuses and target bonuses in dollars rather than percentages, plus actual dollar amounts.

00:30:31.680 Once those concepts were set, I introduced the last feature: the safety net, designed to assist new hires as they adapted during their initial months.

00:30:49.360 Essentially, it ensured new hires would get their target bonus, covering the difference if they did not hit their sales target, which helped them while they adjusted.
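One plausible reading of the safety net, sketched in Ruby: a new hire is paid at least the target bonus. The method name, the keyword arguments, and the assumption that a new hire who beats the target still earns the higher amount are all guesses made for illustration; only the $100 target bonus comes from the talk.

```ruby
# Target bonus in whole dollars, from the talk's simplified example.
SAFETY_NET_TARGET_BONUS = 100

# Assumed rule: with the safety net, a new hire is paid at least the target
# bonus; everyone else is paid exactly what they earned.
def bonus_paid(earned_bonus, new_hire:, target_bonus: SAFETY_NET_TARGET_BONUS)
  return earned_bonus unless new_hire

  [earned_bonus, target_bonus].max
end

puts bonus_paid(60, new_hire: true)   # new hire below target
puts bonus_paid(60, new_hire: false)  # established salesperson below target
puts bonus_paid(150, new_hire: true)  # new hire above target
```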

00:31:08.760 I organized and wrote the Gherkin features for this project this way so that a reader getting started could build up a clear understanding.

00:31:30.300 I also want to emphasize an underutilized aspect of Gherkin’s grammar; it allows free-form text at the top of the file.

00:31:55.080 Many examples and tutorials show that area filled in with a template, but I find people typically neglect to provide specific context there.

00:32:20.480 In this project, I used that space to provide context regarding this feature's purpose.

00:32:33.000 After handing off the project to other developers, they noted that the documentation, especially about the compensation schemes, was immensely helpful.

00:32:44.640 If you treat Gherkin as a programming language, you risk overlooking opportunities for communication such as this free-form section, which can be very useful.

00:33:07.500 Finally, I want to highlight that every word in these Gherkin features should reference the domain, not the interface, so that a reader learns about the organization rather than specifics about the software.

00:33:35.480 Now, regarding the architecture I settled on for my application, I chose Rails but aimed for a more disciplined structure than is usually found.

00:33:47.450 I organized the code into three main layers: Rails controllers and views for the user interface, and then connecting to Active Record objects and some service objects.

00:34:08.860 These interacted with plain old Ruby objects (POROs) that modeled the rules for the compensation schemes themselves.

00:34:30.630 Although somewhat unconventional for Rails, this structure is not groundbreaking; Bob Martin's 2011 talk on architecture has valuable insights.

00:34:55.670 While I personally found his talk difficult to watch, valuable information can still be gleaned, and I hope you explore the ideas he presented.

00:35:09.950 During my sales commission project, I came across an insightful presentation by Jim Weirich about decoupling from Rails.

00:35:29.550 His concluding thoughts drew attention to the potential of integration testing, suggesting a framework where tests could be run at various levels.

00:35:48.600 This inspired me, and although I never had the opportunity to discuss this concept in detail with Jim, I am sharing what I made of it today.

00:36:06.550 I envisaged marking scenarios within Cucumber as being tested at various levels—user interface, model layer, or both.

00:36:24.780 Those tagged as UI would be run through Capybara to test the full Rails stack; those for the model would run directly against Active Record, allowing for faster execution.

00:36:46.060 After some testing, I quickly realized that handling the complexity of the core PORO layer alongside mapping to a relational data model was beyond my capabilities.

00:37:09.300 I decided to incorporate a core tag for that layer, allowing me to add the necessary step changes without additional complexity.

00:37:37.000 As I wrote my scenarios, I'd tag them with the layer I wanted to run them at, adding a 'work in progress' suffix and running tests until they passed.

00:38:01.960 Once the scenarios were passing, I would remove the suffix, then could reuse the scenario at a higher layer, tagging it similarly.
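The tagging convention can be sketched as a small parser that turns a tag into a layer plus a work-in-progress flag. The exact tag spellings are not given in the talk, so `@core`, `@model`, `@ui`, and the `_wip` suffix are assumptions, as are the helper names.

```ruby
# Assumed tag spellings: "@core", "@model", "@ui", with "_wip" as the suffix.
LAYERS = %w[core model ui].freeze

# Turn a tag such as "@model_wip" into [:model, true].
# Returns nil for tags that do not name a layer.
def parse_layer_tag(tag)
  match = /\A@(\w+?)(_wip)?\z/.match(tag)
  return nil unless match && LAYERS.include?(match[1])

  [match[1].to_sym, !match[2].nil?]
end

# All layers a scenario is tagged for, each with its in-progress flag.
def layers_for(tags)
  tags.map { |tag| parse_layer_tag(tag) }.compact
end

# A scenario that already passes at the core layer and is being
# brought up at the model layer.
puts layers_for(["@core", "@model_wip", "@fyi"]).inspect
```

Removing the suffix once the scenario passes, then adding the next layer's tag with the suffix, mirrors the progression described above.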

00:38:29.820 This method let me carry the same Gherkin scenarios up through each application layer, adjusting them as needed.

00:38:55.880 I had to manipulate the load path while keeping track of step definitions, but it worked out reasonably well.

00:39:18.000 This approach let me consolidate step definitions effectively while isolating the complexity of each test layer.

00:39:34.920 I have a few observations regarding the project: primarily, step definitions are tricky—they essentially must exist and should function as lightweight adapters between Gherkin and your application.

00:40:04.050 The ideal step definition should only have one obvious line of code. However, I dislike step definitions due to their lack of a structured namespace.

00:40:34.360 As a result, I prefer to extract complex logic from step definitions into their own drivers, applying Ruby's object-oriented features for better organization.
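A minimal sketch of what such a driver might look like, assuming a hypothetical `CommissionDriver` class and a `driver` helper that the step definitions can reach. Neither name comes from the talk; the commented step uses Cucumber's `When` method with a regular expression and a block.

```ruby
# Hypothetical driver for the core layer: an ordinary Ruby object that
# holds the logic the step definitions would otherwise contain.
class CommissionDriver
  attr_reader :sales_total

  def initialize(target:)
    @target = target
    @sales_total = 0
  end

  # Record one sale; returns self so calls can be chained.
  def record_sale(amount)
    @sales_total += amount
    self
  end

  def target_met?
    @sales_total >= @target
  end
end

# With the logic in the driver, a step definition shrinks to one obvious
# line. In a Cucumber project it might read as follows ("driver" is an
# assumed helper returning the driver for the current layer):
#
#   When(/^the salesperson sells \$(\d+) of widgets$/) do |amount|
#     driver.record_sale(amount.to_i)
#   end

driver = CommissionDriver.new(target: 100_000)
driver.record_sale(60_000).record_sale(40_000)
puts "target met: #{driver.target_met?}"
```

Because the driver is a normal class, it can be namespaced, unit tested, and swapped for a different implementation per layer, which flat step definitions do not allow.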

00:40:54.930 This worked remarkably well for my project. I'd love to use Cucumber again if I find a fitting project, as I see value in using step drivers.

00:41:09.840 The dense compensation scheme documents we received were challenging to comprehend. By distributing this logic across layers, I could focus on the core functionality first, which made for an effective implementation.

00:41:35.880 Once I had the core working, I figured out how to adapt that logic into Active Record, finally addressing the interface.

00:42:00.680 Looking back, I believe this layered approach was the most effective way to deliver the application on time.

00:42:27.890 The developers that succeeded me weren't as excited about this structure, but after a time, they acknowledged its merits and found the documentation beneficial.

00:43:18.580 It's also important to consider performance. Ryan Davis spoke at Cascadia RubyConf about test framework speed, and Cucumber consistently ranked last in his comparisons.

00:43:43.210 This left me wincing, especially since I dealt with a 90-minute test suite. However, I assumed I would incur a performance penalty, which I was fine with.

00:44:06.593 In this project, I had 64 scenarios tagged as core that ran in under a second. At the model layer, I had 118 scenarios that completed within eight seconds.

00:44:33.239 At the user interface layer—with Capybara driving the application but without supportive JavaScript—I had 11 scenarios that completed in three seconds.

00:44:57.940 These reported times stem from Cucumber and exclude the time taken to load all Rails components for testing. I had a rake task to manage the suite's performance.

00:45:17.250 Overall, I ran that rake task in around 40 seconds, so while Cucumber may not run tests at lightning speed, I was pleased with the overall performance.

00:45:45.520 In conclusion, if you are considering giving Cucumber another chance after a bad experience, I suggest keeping two things in mind: first, that your features should describe your domain and not the UI.

00:46:09.600 Second, remember that ‘step definitions are lava’—it’s best to limit them to straightforward functions, enabling a streamlined process for effective testing.

00:46:39.680 I want to clear the stage for the next presenter, so I won't prolong the discussion for Q&A. However, if you have any questions, feedback, or if you’d like to grab more stickers, please feel free to chat with me after I pack up my things. Thank you!