RSpec: The Bad Parts

by Caleb Hearth

In the video titled "RSpec: The Bad Parts", Caleb Hearth addresses the complexities and pitfalls associated with certain RSpec constructs, specifically focusing on let, subject, before, after, and shared examples. With over a decade of experience in Ruby on Rails and a strong emphasis on test design, Caleb aims to challenge the conventional use of these features in favor of clearer, more maintainable test code.

Key Points Discussed:

Common Pitfalls of let and subject:
- These constructs can lead to increased complexity when deeply nested and used extensively, creating a convoluted web of dependencies that are difficult to interpret. Caleb provides a stark example of a 4500-line test suite using let excessively, showcasing how it resulted in obscured logic and hard-to-follow tests.
Redefinition Complexity:
- let and subject can have their definitions modified across different contexts or describe blocks, leading to confusion over which definition is current.
Challenges Posed by before and after:
- Similar issues arise with before, after, and around methods, which can complicate the tracking of what each test is doing, especially when they add to existing setup code rather than replace it.
DRY Principle Reevaluation:
- The talk questions the understanding of the DRY (Don’t Repeat Yourself) principle. Caleb argues that the goal should be to avoid duplication of knowledge rather than just code, emphasizing that sometimes, repeating setup code within individual tests can increase clarity and understanding.
Refactoring Recommendation:
- As a solution, he advocates for inlining definitions directly into tests to improve readability and understanding. This promotes clear self-documenting tests instead of relying on implicit assumptions created through excessive use of let and before blocks.
Example Refactoring:
- Utilizing a real-life scenario from the Mastodon tests, Caleb demonstrates refactoring a spec for the mute! method, simplifying it by inlining let methods into relevant tests and cleaning up excess context blocks that obfuscate the intent of the tests.

Conclusions and Takeaways:

RSpec can be powerful, but to harness its true benefits, one should limit the use of certain constructs that instigate confusion.
Favor simplicity and clarity in test setups over adhering rigidly to DRY principles. Write tests that are self-contained and understandable to minimize hidden complexity.
Emphasize the importance of each test as documentation that conveys how to reproduce system behavior rather than relying on shared methods or indirect constructs that obscure the true functionality being tested.

00:00:00 Ready for takeoff.

00:00:16 Thank you for coming. My name is Caleb Hearth, and while I usually prefer to keep my introductions brief at the beginning of talks, since you're already here, I want to touch on my background because it helps underline the point of my talk.

00:00:19 I've worked in Ruby on Rails for about 10 years. Over that time, I spent about five years each in product and consulting companies. I currently work at Test Double, a consultancy specializing in embedding within software teams to help build better software and better teams.

00:00:27 Previously, I worked at Thoughtbot across various Greenfield and established Rails apps. You can find me on my blog, where I write about various programming concepts such as ethics, best practices, and testing. Since the bird site is done now, you can find me on Mastodon at dice.camp/@caleb or just by searching "Caleb Hearth."

00:00:42 Throughout my career, I've had the opportunity to work in dozens of applications, each with its own test suites and strategies for testing. This has given me the chance to compare many different ways of writing tests. A little use of `let` and a tiny bit of `before` often seems like a good idea, but I have seen them grow organically into spider webs of complexity that are difficult not only for new team members to grasp but also for seasoned veterans to reason about.

00:01:01 The `let` construct defines a memoized helper method: give it a name and a block to execute, and it'll execute that block zero or one time depending on whether you call the method, store the result, and return it. On subsequent calls, it will return that stored result without executing the block again. When you look into the definition, it's almost exactly defined as a plain Ruby method. A `let` method can be referenced by other `let` methods, and depending on the nesting, that reference may be to a different method definition in different specs.

00:01:36 `Subject` can also reference `let` methods. Speaking of `subject`, it's basically just a `let` block with special semantics. Some methods, like `it` or `is_expected.to`, use the subject name implicitly. You can also give subjects different names while maintaining their semantics. Effectively, `subject` is just a `let` method named subject, so every time I say `let`, I also mean `subject`.

00:02:03 The `before` block is executed before each spec that it is in scope for. This means any `describe` or `context` block containing both the spec and the `before` block, regardless of how deeply nested it is, will execute the `before` block. I'm going to use `before` here to stand in for `after` as well since they behave exactly the same at different times, and the same goes for `around`.

00:02:48 A `before` block can reference `let` variables or any other methods that would be in scope for a test, including `subject` or other `let` methods. Now I apologize; I may have overestimated the contrast that a projector and bright lights can handle.

00:03:02 Let’s take a look at an extreme but real example highlighting why `let` and `subject` can be problematic. This is a 4500-line unit test for the god object of an app I once worked on. Before I started the project, I began to refactor it, and it makes heavy use of `let`, not just for the main model but for many other methods as well. Take that in: 4500 lines and 531 `let` definitions across 131 different method names.

00:03:36 Of those, 78 are overridden in different `describe` or `context` blocks. Nine of those are redefined at least 10 times, with a maximum number of redefinitions of 69. Additionally, there are 87 instances where `subject` is defined, and in almost every case, the type of the subject is different from the original definition, meaning some method or attribute of the system under test or a collaborator is being referenced as `subject`, rather than the system under test itself. This results in the `subject` returning a different type of object across different specs.

00:04:11 Overall, almost 14% of this file is dedicated solely to `let` and `subject` lines, not including multi-line blocks where the values are actually being defined. Much of this structure is front-loaded in the spec to support an extremely complex General fixture. General fixture is one of the root causes for obscure tests, meaning it's simply difficult to understand at a glance.

00:04:56 The term General fixture and obscure tests are terms used by Gerard Meszaros in his book About xUnit test patterns, describing when a test builds a larger fixture than necessary to verify the functionality in question. In Ruby and RSpec terms, at the beginning, you can see that white block with 31 `let` definitions in place to parameterize the default subject instance. This setup is not necessary for most tests, but since one or two `describe` blocks needed to reference some behavior, all specs in this file now need to do that work unless they define their own simpler subject.

00:05:21 This situation is harder to visualize, but the majority of these methods are also defined such that they're only used in a single spec. They would be much simpler and clearer if they were inlined into the `it` block. Needless to say, it's quite difficult to reason about a `let` or `subject` method; where is what it contains, and where do you even find it? It's not uncommon in this file for a `let` or `subject` to find hundreds of lines and several block scopes outside of where it was being referenced.

00:06:06 When I was refactoring, I had to jump 350 lines up to find the correct `let`. I wanted to share all this because as we move into a more contrived and manageable example, we should know that in the real world, there are test suites that look like this. In fact, in my experience, this is where most specs that heavily employ the RSpec framework tend to trend over time. Our goal should be to have manageable, understandable, and maintainable tests that we enjoy adding to and modifying.

00:06:36 This brings us to a situation where we're defining methods using `let` and `subject`, both being just extra steps, yet they are hard to track down because they are done through macros that are intentionally easy to redefine. Beyond the navigation issue, these methods construct themselves by referencing other `let` variables across different specs, which means these delegations reference different methods. What I talked about with the different contexts earlier can cause the code to seem to do the same thing across different specs even if they are running completely different code paths.

00:07:03 It is common for the actual type returned to change between different `describe` or `context` blocks, making it challenging to determine what you're dealing with when calling this method. So, what are our options for refactoring this situation? Perhaps the simplest option is to replace a `let` with an instance variable. For example, `let(:book)` could simply become `@book`. While this technically checks the box for avoiding `let`, and may be slightly more performant than a memoized lookup method, it maintains many of the same issues.

00:07:51 Specifically, instance variables are likely to be overridden or defined differently in different contexts, so there’s little chance of finding the correct definition for a given spec of even moderate complexity. Another option that solves part of the problem is to change these `let` and `subject` definitions to properly defined Ruby methods. This involves memoization using an instance variable or effectively calculating the value.

00:08:32 Doing this improves the odds that automated introspection tools, such as C tags or language servers, will find the correct definition. However, this approach does not address the fact that these methods are still being redefined in different contexts. Depending on their nesting, you're still dealing with different methods. The third and the one I recommend here is the inline method refactor.

00:09:22 This method was originally named by Martin Fowler in his book Refactoring: Improving the Design of Existing Code. It's pretty straightforward: instead of pulling these variables into `let` or `subject`, define them directly in the test. But Caleb, I hear you say, this is repeating yourself, and we don't repeat ourselves – that wouldn't be DRY.

00:09:56 This is a common simplification of the don't repeat yourself principle. Andy Hunt and Dave Thomas in the Pragmatic Programmer stated the DRY principle as: "Every piece of knowledge must have a single, unambiguous, authoritative representation within the system." One of those words matters here: knowledge. The goal of DRY is to avoid duplication of knowledge, not merely the duplication of code.

00:10:27 It turns out that duplication of knowledge also often tends to reduce code repetition, but that's a side effect, not the primary goal of DRY. This principle is often simplified for new programmers, myself included, as we learn about it as deduplicating code. While this aids in deduplication of knowledge, it can lead to premature extraction of code that then needs to be parameterized to support different use cases.

00:10:50 The default version of `subject`, which is defined as `described_class.new` or, more concretely in our example, `Book.new`, is actually more of an indirection than a consolidation of knowledge. `Book.new` is already a method; writing it into a `subject` doesn't free anything. However, even if it did, in the context of tests, I would argue that avoiding duplicate knowledge deduplication is not necessarily a good goal.

00:11:36 I’ve called these test specs because, colloquially, that's how I usually hear them referenced, but RSpec actually defines them as examples, and I appreciate that because it emphasizes the code you write should look like how it is used. David Bryant Copeland recently wrote that RSpec examples are, indeed, examples. The article's thesis is that avoiding predicate matters for expectation should be avoided in favor of more explicit expectations.

00:12:07 For instance, instead of using `expect(order).not_to be_sent`, it should be `expect(order).to eq(false)`, which is a good point. I must say it's a good point, particularly because, unlike Gerard Meszaros or to a lesser extent Dave Thomas, David might actually be in this room or at least attending the conference. Taking David's point even further, each example should be a self-contained example of how to reproduce behavior. In our RSpec example, given any individual example in isolation, we should have enough knowledge to reproduce the behavior specified in the code base.

00:12:52 Drying up tests can hurt understanding. Imagine that I had a nice visual here to show how code from a spec file is reorganized, but unfortunately, I didn't have time. Drying tests can reduce the utility of each example by relying on shared utility methods, worse yet, on unshared utility methods that convolute what's actually happening in the tests. If you look at an example and find it hard to understand what's being tested, how the system under test is intended to be used, and whether it actually works, then that example is not reaching its full potential as living documentation.

00:13:36 So, my advice for test setup is to write everything twice, or ideally more than twice. Embrace the idea that there may be a lot of duplicated setup between examples. Each of these examples serves documentation purposes. Doing the setup inline allows you to write only what you need in the test instead of building out fixtures that support multiple scenarios. It also makes it easier to use your code because it surfaces the collaborators: the arguments, the class, and other setup required to run the code.

00:14:11 The complexity introduced in the name of decreased repetition of code does not improve your specs, and it isn't a worthwhile trade-off to just obscure test setup. If something is difficult to test, that's an indicator that you're either testing the wrong thing or that your object has too many collaborators, leading to complicated setup. The first case means you can likely mock out the collaborator and simply pass the return type you need.

00:14:44 If you’re confident that the other class works correctly because its tests pass, you can have a single integration test proving the two components work together. Alternatively, your object might have too many collaborators and might be refactored to reduce its fan-in or the number of collaborators it relies on.

00:15:04 Since Mastodon is the cool new place to be, I thought it would be helpful to examine a portion of its tests for our refactoring example. Here's the full spec for the `mute!` method, presented in three columns for better visibility on a slide. This test examines the mute functionality of an account, and if you'd like to follow along later, it can be found in `spec/models/concerns/account_interactions_spec.rb`.

00:15:32 I realize you can't read any of this text—that’s okay! I'll walk you through it. While using bolding and color along with some animations to help draw attention to what's happening at a macro level, the details might still remain unclear. As you’ll see, this spec overrides the default `subject`, which is used nine times across roughly 100 lines of code. `subject` serves as a good starting point for our refactoring since when it's used, it's essentially present in every example.

00:16:52 As mentioned, we can use the inline method to take the body of the `subject` definition and place it directly into the body of each spec. Regardless of the fact that `subject` defined in these tests references `account_target` and `arg_notifications`, all of which are `let` methods, the code will continue to run here and access those methods.

00:17:40 One aspect to look out for is that these tests have a very similar and non-flat structure, which can indicate they may be testing multiple things. If we examine them closely, we see that each test contains a check, expecting them to mute. Each test is using an `expect_change` block to send a mute message and subsequently checking the return type.

00:18:06 This can be viewed as an optional refactor that isn’t directly related to `let` or `subject`, but we should take this chance to eliminate the redundancy of the type check. Upon zooming out, we can run these tests to ensure they pass. Don't worry—when I did, they indeed passed, so let’s move forward.

00:18:52 Next, we can inline the `let` methods such as `hide_notifications` and `arg_notifications`, which are similarly named as they are passed as keyword arguments of comparable designations. Their existence adds unnecessary confusion in this context, so let's inline them immediately.

00:19:29 By doing so, we eliminate the need for these context blocks since they no longer provide a space for defining `let` methods. Consequently, we can flatten these structures and merge the meaningful context and their text into the `it` descriptions.

00:20:12 Now, we can inline the mute functionality as well, especially since it’s used in the `before` block. Therefore, we should do that now.

00:20:36 At this stage, we are bringing code directly into specs that were previously obfuscated. The `hide_notifications` flag that was once set in each of these context blocks is now properly set in the specs, which makes it easier to understand its state without needing to refer elsewhere.

00:20:58 I will fast forward a bit to inline the last couple of `let` methods: `account` and `target_account`. These will seem to emerge from nowhere since, like the original example I demonstrated, these `let` methods are defined hundreds of lines away from where they're being utilized.

00:21:26 This specific file is only 650 lines long, as opposed to the previous 4500, but it uses these `let` methods extensively and fortunately does not override them. Thus, it becomes easy to determine which ones to inline. This situation raises the question of which advantage `let` methods provide by not being inlined initially.

00:21:52 We can define the inline methods in place of methods in some cases, while in others, we use local variables to replace method definitions, allowing us to use the same name to make assertions. Let's take a look at one of those refactored specs. This existing example is extremely terse and provides scant information about what we're testing.

00:22:29 It’s somewhat obvious we’re making assumptions about a subject; recall this test is for a concern called `account_interactions`. Therefore, we lack a class that provides adequate context for what the default subject should be, as defined by `described_class.new`. The innermost expectation checks that we are returning a mute, but we have no clue where this comes from or why it would be a mute, requiring us to scroll up about 55 lines to find the subject definition.

00:23:15 This reach is excessive compared to what typically fills your screen. Additionally, we check that the mute does not alter its height notification setting, but we have yet another lack of clarity about where the mute is defined and its context or how it relates to the subject — all of this information is entirely opaque to the reader.

00:24:00 It’s worth noting that the expectation involving being muted is duplicated nine times in this file and is unrelated to the main focus of this example, which is to verify the `hide_notifications` flag. Finally, we have no indication that there’s any code executing as part of a `before` block before running the test.

00:25:12 After our refactor, the test becomes understandable in isolation. Each piece of context, every piece of work, and every hidden step that was previously scattered throughout this large file is now consolidated here, making it easier to read through and analyze what is happening.

00:25:51 For many reasons similar to `let`, I recommend avoiding `before`, `after`, or `around` in most cases. However, at a global level, it makes a lot of sense for running things like database cleaner to run all tests within transactions, or using webmock to disable outgoing network connections.

00:26:51 Anything that’s going to apply to every single spec, or potentially for an entire category of tests, like any sub-directory of specs, is likely acceptable when using `before`. The warning light comes on when you start scoping these methods to a specific `describe` or context block, or even a spec file.

00:27:18 At the risk of repeating myself, it often makes sense to repeat yourself in order to have understandable tests. Avoid hiding excessive setup within a block, making things appear simpler than they are. What's more problematic about `before` compared to `let` is that it aspires to be additive rather than a replacement mechanism for previously defined methods.

00:28:17 Redefining `let` or `subject` means the code defined by that method outside the current scope will not execute, while `before`, `after`, and `around` methods combine, and this can cause a lot of setup code to run outside the test scope and make it difficult to track everything that's happening.

00:29:02 Shared examples, shared contexts, `behaves_like`, and similar RSpec tools suffer from the same issues as `let` and `before` since they almost certainly make use of those tools. However, they face additional challenges due to the need for indirection to maintain reusability, resulting in more complexity.

00:29:56 These constructs hide complexity while striving for reduced repetition and often require considerable setup to be reused. I recommend moving the essential specs into the main spec files, inline, instead of attempting to extract them using these tools.

00:30:36 When working with modules or concerns, instead of ensuring that every single class utilizing a module acts as if it does, consider testing it directly or generating a test example implementation to exercise the module effectively and establish a reference for your dependencies.

00:31:00 In summary, use RSpec, but focus on the good parts. Avoid `let`, `before`, and `shared anything`. Once again, I’ve been Caleb Hearth, and thank you so much for attending this talk. If you'd like to ask questions or discuss RSpec, Test Double, Mastodon, Dungeons and Dragons, please find me in the hallway.

00:32:00 I also want to express my gratitude to my employer for sending me to RubyConf this year. If you have ever felt frustrated getting stuck on a problem, we want to help with that. Please let us know what topics you feel need more resources, whether they be talks, blog posts, or screencasts by taking this survey.

00:32:44 Test Double loves pair programming because it can help you get unstuck. If you’d like to pair with a Test Double agent, please request a free virtual pairing session with a Test Double developer through this link. Thank you very much.