BDD with Shoulda

00:00:14.200 Well, my name is Tammer Saleh. I wrote a testing framework called Shoulda, and I work for Thoughtbot.

00:00:21.000 That's the introduction. Alright, next slide.

00:00:26.160 So, when I was in high school, my English teacher always taught me to write hamburger paragraphs. They are kind of a cheat, but it's good to give you guys expectations about what I'm going to be talking about.

00:00:32.239 We're going to talk about BDD, which is kind of the new buzzword in testing. We will discuss Shoulda and testing in general.

00:00:43.920 First off, what is BDD? When BDD first came out, I and many of my co-workers were confused by the whole thing because TDD was already established. When we heard people talking about BDD, our first reaction was, 'Well, isn't that what we are doing? Isn't that kind of what everybody's doing?'

00:01:00.239 However, it's a new way of thinking about test-driven development. BDD forces you to look at it as describing behavior instead of just describing tests. In practice, you write these short specifications that describe one piece of behavior at a time.

00:01:18.080 This is a quote from behavior-driven.org, and the important part is that behavior-driven development is a rephrasing of existing good practices. What it's not is a radically new departure, and for that alone, I think BDD is useful terminology.

00:01:36.600 So, what is Shoulda? I think it was Joe O'Brien in a presentation earlier who talked about wanting to be able to test a single line of code with just a single line of code. If I only have to write one line of code to get some behavior from, for example, Rails, then I should only have to write one line of code to test that behavior. That's where Shoulda was born — from that exact same test helper that he wrote, which came out of Thoughtbot around the same time.

00:02:11.840 What it evolved into was nested contexts and readable test names. It had to be fully compatible with Test::Unit because we are a Rails consultancy, and we already had a lot of applications out in the wild. We weren't able to retool to use a completely different testing framework.

00:02:29.080 As I was saying, you want to be able to test things as simply as you write them. Since most of the work we do is Rails work, Shoulda comes with some ActiveRecord and ActionController macros. We are looking for about 80% coverage here.

00:02:56.720 As REST became a bigger part of Rails, Shoulda also embraced it and tries to make it as easy as possible to test your RESTful controllers. Now let's look at the basic building block of Shoulda, which is just a should statement.

00:03:10.200 The important parts of this code show that this is a normal Test::Unit test case. It has a normal setup, and it includes a regular Test::Unit test. Below it, we have a should statement, which is just a normal test that uses define_method to create a test method. It's magic that we are all used to. Shoulda also comes with contexts, which allow you to encapsulate setup information so you can wrap sets of tests in common behaviors.
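A minimal sketch of how a should statement can turn a description and a block into an ordinary test method with define_method. TinyTestCase and its run method are hypothetical stand-ins written for this illustration; they are not Test::Unit and not Shoulda's source.

```ruby
# Hypothetical stand-in for a test case base class (not Test::Unit itself).
class TinyTestCase
  # Class-level macro: turn a description and a block into a test method.
  def self.should(description, &block)
    define_method("test: should #{description}", &block)
  end

  # Run every generated test on a fresh instance, calling setup first.
  # Returns an array of [test name, :pass or :fail] pairs.
  def self.run
    public_instance_methods(false).grep(/\Atest/).map do |name|
      instance = new
      instance.setup if instance.respond_to?(:setup)
      begin
        instance.public_send(name)
        [name.to_s, :pass]
      rescue StandardError
        [name.to_s, :fail]
      end
    end
  end
end

class UserTest < TinyTestCase
  def setup
    @name = "Tammer"
  end

  # Each should becomes a method named "test: should ...".
  should("have a name") { raise "no name" if @name.nil? }
  should("be shouting") { raise "not upper case" unless @name == @name.upcase }
end
```

Calling UserTest.run reports the first test as passing and the second as failing, because the block runs with the test instance as self and therefore sees the @name set in setup.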

00:03:43.240 Contexts can also be nested, which was a big requirement at the time. By nesting contexts, you can really increase the readability of the test and shorten the amount you write. That's it for the Shoulda gem. Shoulda comes in two parts: one is a gem, and the other is a plugin for Rails.

00:04:03.680 If you want to use Shoulda in regular Ruby projects, the gem gives you all that functionality as a very small piece of code. The plugin actually includes the gem inside, so you don't need both if you're working on a Rails project.

00:04:19.560 Now I want to take a second to talk about macros and DRY code because recently there's been a bit of a backlash against DRY, and it's understandable. A lot of people have been cargo culting, cutting and pasting cookie-cutter code in an attempt to make everything as DRY as possible. Many have overused Ruby magic, creating unmaintainable and unreadable code, which leads to a backlash. However, I want to remind everyone of why DRY exists. It's not just to reduce the amount of typing a programmer has to do.

00:05:06.320 DRY code, when written well, is faster and easier to read. That, to me, is the most important component of maintaining a codebase. I want to be able to look at the code and immediately understand what's going on. Also, DRY code reduces programmer errors; it’s the same reason that methods and classes exist — to reduce errors.

00:05:34.720 DRY code also distills programmers' best practices. For example, if you have a programmer on your team who really understands one aspect of the code that is used in multiple places, you want to encapsulate that, so it can be tested the same way, repeatedly. You don’t want every programmer to write their own version of tests that essentially test the same behavior. So Shoulda comes with some ActiveRecord macros to simplify testing for those one-liners and the straightforward tasks that ActiveRecord does. If writing ActiveRecord code is easy, it should also be easy to test, and Shoulda covers about 80% of the common ActiveRecord macros.

00:06:13.920 This is an example of a contrived test case we've got. Each one of these statements is just a Shoulda macro, which implements best practices for testing. This specification is incredibly easy to read. It's concise and provides all the necessary information about the user, including required fields such as name and phone number. It specifies that a unique name is required and describes accepted and rejected phone numbers.

00:06:39.320 It protects the admin flag while handling various user properties. Shoulda generates tests for each of those requirements. Now, regarding controllers, we maintain the same philosophy of making testing for the most common functionalities straightforward so that you won't be facing the daunting prospect of hundreds of lines of test code for a single action.

00:07:19.680 This example uses a context whose setup issues a GET request to the show action, followed by should statements that confirm the user instance variable is assigned. It also checks that the response is successful, that the show template is rendered, and any other behavior you want to assert.

00:08:01.440 Here is an overview of most of the functionalities we currently have for controllers, and we are continuously working to expand this list. This is part of how Shoulda is evolving to make your tests easier to write and understand. RESTful controllers are particularly intriguing because many of you likely work primarily with RESTful controllers in your Rails applications.

00:08:59.679 It’s concerning when you realize that nearly all RESTful controllers are almost identical. That is a strong indicator of a code smell, and because of that similarity, many attempts have been made to make RESTful controller code DRY. You've got AutoRest, MakeResource, and ResourceController, to name a few. Shoulda adopts this philosophy, recognizing that if all these controllers are nearly identical, the tests for them will inevitably be similar as well.

00:09:50.760 The controller behavior is usually almost the same. However, various testing scenarios must be captured, such as logged-in actions versus those that are not logged in, and ensuring correct behavior under various resource nesting conditions. Shoulda strives to make these scenarios compact and efficient.

00:10:37.840 You should be able to reasonably assume the basic actions of a RESTful controller, such as index, show, new, edit, and so on. This should apply to both HTML and XML formats. A common issue arises when testing RESTful controllers, where developers prioritize HTML without focusing on XML.

00:11:00.560 However, there is significant activity occurring with the XML, which also warrants thorough testing. The Shoulda codebase has been crafted to be extendable, allowing you to add JSON or any other type of RESTful testing.

00:11:30.680 We've added various enhancements to Shoulda to facilitate RESTful actions. A few lines of code can generate a substantial number of tests—up to 200—depending on the configuration you provide. Those five lines of code can test an entire RESTful controller.

00:11:54.960 To specify what to test, you only need to provide the create and update parameters. Unfortunately, there’s no way for Shoulda to determine these automatically. Everything else is derived from the name of your test class. I will provide some examples of tests Shoulda produces based on configuration, easily yielding a waterfall of tests.

00:12:54.160 Now, I must address whether or not Shoulda RESTful testing is a good idea. It's important to create tests that are easy to write, and generating short, straightforward tests is preferable to having one comprehensive test with multiple assertions for a single action. However, understanding what is being tested is vital.

00:13:35.559 For that reason, I've made sure that the source code inside Shoulda RESTful is as easy to comprehend as possible. I encourage anyone utilizing Shoulda to delve into the code and understand its functionality. You will most likely need to implement your own tests around Shoulda RESTful, particularly for any actions that deviate from typical CRUD operations.

00:14:18.120 Now I'd like to discuss Shoulda internals. First, I will describe how it used to work, then I'll explain the refactor that took place for various reasons. Initially, the implementation of context was quite naive. We just needed to get things running quickly as a proof of concept.

00:15:06.680 Shoulda contexts established the setup and teardown methods directly on Test::Unit rather than in classes of their own, which led to namespace pollution. Class variables were used to keep track of the contexts, so any invocation of context with a block only set class variables, without proper management.

00:15:56.960 Recently, Rails introduced new mechanisms that broke Shoulda's functionality, as Shoulda had defined setup on Test::Unit when it arguably shouldn’t have. Consequently, we undertook a rewrite, resulting in a cleaner design.

00:16:26.839 Our new context class has the setup and teardown methods defined within it, while the two main methods on Test::Unit, should and context, build context instances and delegate to them. This eliminates namespace pollution, ensures compatibility with Edge Rails, and creates a much cleaner system overall.

00:17:14.200 A should statement outside any context creates a one-off context containing that single statement. Inside a context block, should records the name and the block. At the end of the context, Shoulda builds the required test methods, and each one runs with the setup and teardown sequences around it.
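A rough sketch of the design described here: a context object that records setup blocks and should blocks, then builds test methods once the context has been fully defined. SketchContext, SketchTestCase, and every method name below are assumptions made for this illustration, and teardown is omitted for brevity; this is not Shoulda's actual source.

```ruby
# Hypothetical context class: records setups and shoulds, builds tests later.
class SketchContext
  attr_reader :name, :parent, :setup_blocks, :shoulds

  def initialize(name, parent = nil, &definition)
    @name = name
    @parent = parent
    @setup_blocks = []
    @shoulds = []      # recorded [description, block] pairs
    @subcontexts = []
    instance_eval(&definition)
  end

  def setup(&block)
    @setup_blocks << block
  end

  def should(description, &block)
    @shoulds << [description, block]
  end

  def context(name, &definition)
    @subcontexts << SketchContext.new(name, self, &definition)
  end

  def full_name
    [parent && parent.full_name, name].compact.join(" ")
  end

  # Outer setups come before inner ones.
  def all_setup_blocks
    (parent ? parent.all_setup_blocks : []) + setup_blocks
  end

  # Turn each recorded should into a test method on the given test class.
  def build(test_class)
    ctx = self
    shoulds.each do |description, block|
      test_class.send(:define_method, "test: #{full_name} should #{description}") do
        ctx.all_setup_blocks.each { |setup_block| instance_exec(&setup_block) }
        instance_exec(&block)
      end
    end
    @subcontexts.each { |sub| sub.build(test_class) }
  end
end

# The test case only delegates; it keeps no context state of its own.
class SketchTestCase
  def self.context(name, &definition)
    SketchContext.new(name, nil, &definition).build(self)
  end
end

class QueueTest < SketchTestCase
  context "A queue" do
    setup { @queue = [] }
    should("be empty") { raise "not empty" unless @queue.empty? }

    context "with one item" do
      setup { @queue << :job }
      should("have size 1") { raise "wrong size" unless @queue.size == 1 }
    end
  end
end
```

Because the state lives in SketchContext instances, the test class gains only the generated test methods, which is the namespace benefit the rewrite was after.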

00:17:57.280 Here's an example of one of the ActiveRecord macros that Shoulda provides. This specific macro focuses on protecting attributes. While parsing the options introduces some complexity, the rest remains straightforward, looping through the necessary attributes for protection and generating Shoulda statements for verification.

00:18:50.960 An essential aspect of this test method is that, as previously discussed, we aim to avoid testing the framework whenever possible. We seek to confirm that Rails is notified about the need to protect attributes, without taking direct action to confirm Rails itself is implementing that protection as expected. We trust that Rails will conduct its defined processes, while Shoulda ensures you are instructing it accurately.
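A minimal sketch of a macro in this spirit: it loops over attribute names and generates one test per attribute, and each test only checks that the model declared the attribute as protected. FakeModel, ProtectionMacros, and the :model option are hypothetical and exist only to make the sketch self-contained; the real macro works against ActiveRecord classes.

```ruby
require "set"

# Hypothetical stand-in for an ActiveRecord model. It only records which
# attributes were declared protected, which is all the macro inspects.
class FakeModel
  def self.attr_protected(*names)
    protected_attributes.merge(names.map(&:to_s))
  end

  def self.protected_attributes
    @protected_attributes ||= Set.new
  end
end

class User < FakeModel
  attr_protected :admin
end

module ProtectionMacros
  # Generates one test method per attribute. The :model option is an
  # assumption of this sketch, used to say which class to inspect.
  def should_protect_attributes(*attributes)
    options = attributes.last.is_a?(Hash) ? attributes.pop : {}
    klass = options.fetch(:model)
    attributes.each do |attribute|
      define_method("test: #{klass} should protect #{attribute}") do
        # Check the declaration only; mass assignment itself is Rails' job.
        unless klass.protected_attributes.include?(attribute.to_s)
          raise "#{klass} does not protect #{attribute}"
        end
      end
    end
  end
end

class UserProtectionTest
  extend ProtectionMacros
  should_protect_attributes :admin, :model => User
end
```

The generated test never performs a mass assignment. It trusts the framework to honor the declaration and verifies only that the declaration was made.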

00:19:47.239 The great thing about Shoulda is that it encourages the creation of your own macros. They are simply methods that encompass should statements or contexts. Here’s a common example: You would define a method called logged_in_as, which takes a user as an argument. The user can be a symbol or an instance, depending on your needs, and it simply logs the user in.

00:20:34.760 This might seem like an incredibly simple macro, but it makes examining your functional tests much easier. When you see logged_in_as :admin followed by several should statements or Shoulda macros, you're assured that everything in that block pertains to a user logged in as an administrator.
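A sketch of such a macro, assuming a tiny context and should DSL (MiniCase) written only for this example. In a real functional test the setup would log the user in through the session; here it just sets an instance variable, and all names other than logged_in_as are hypothetical.

```ruby
# Minimal stand-in for a context/should DSL (hypothetical, not Shoulda).
class MiniCase
  class << self
    # Stack of [context name, setup block or nil] for the contexts being defined.
    def stack
      @stack ||= []
    end

    def context(name)
      stack.push([name, nil])
      yield
    ensure
      stack.pop
    end

    def setup(&block)
      stack.last[1] = block
    end

    def should(description, &block)
      setups = stack.map { |_, blk| blk }.compact
      name = "test: #{stack.map(&:first).join(' ')} should #{description}"
      define_method(name) do
        setups.each { |blk| instance_exec(&blk) }
        instance_exec(&block)
      end
    end
  end
end

module LoginMacros
  # Custom macro: an ordinary class method that wraps the given block
  # of should statements in a context whose setup "logs in" the user.
  def logged_in_as(user, &block)
    context "logged in as #{user}" do
      setup { @current_user = user }
      block.call
    end
  end
end

class PostsControllerTest < MiniCase
  extend LoginMacros

  logged_in_as :admin do
    should("see the delete link") do
      raise "not an admin" unless @current_user == :admin
    end
  end
end
```

The macro adds no new machinery. It is just a method that calls context and setup, which is why writing your own is cheap.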

00:21:15.440 This is where writing straightforward macros and keeping your tests DRY can significantly enhance their readability and maintainability. Next, let's discuss some general testing strategies, especially concerning mocking and fixtures, black-box versus white-box testing, and methods to avoid brittleness in your tests.

00:21:52.280 Mocking has received extensive discussion over the past six months to a year, as it gained traction in the Ruby and Rails communities. Mocking has numerous benefits, one of which is that it keeps your tests focused on the code at hand and lets you test code that integrates with external resources without actually calling them. Long ago, before I understood mocking or trusted it, I had to integrate with a credit card service and ended up building a small Camping application to simulate that service for testing.

00:22:50.880 That approach worked but proved to be a significant inconvenience for my developer friends, as they needed to launch this camping server to run unit tests. Thus, we quickly refactored to use mocking instead, which was a no-brainer once we grasped its usefulness. Mocking greatly improves the readability of tests, especially when dealing with complex object graphs.
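A minimal sketch of replacing an external credit card service with an in-process test double, so no separate server has to be running. It uses a hand-rolled fake rather than a mocking library, and Checkout, FakeGateway, and charge are hypothetical names, not a real payment API.

```ruby
# Code under test: depends only on an object that responds to charge.
class Checkout
  def initialize(gateway)
    @gateway = gateway
  end

  # Returns :paid or :declined depending on the gateway's answer.
  def purchase(card_number, cents)
    response = @gateway.charge(card_number, cents)
    response[:approved] ? :paid : :declined
  end
end

# Hand-rolled double standing in for the remote credit card service.
class FakeGateway
  attr_reader :charges

  def initialize(approved:)
    @approved = approved
    @charges = []   # every call is recorded so a test can inspect it
  end

  def charge(card_number, cents)
    @charges << [card_number, cents]
    { approved: @approved }
  end
end
```

A test builds Checkout with FakeGateway.new(approved: true) or FakeGateway.new(approved: false) and asserts on the return value of purchase, so both branches are covered without any network access.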

00:23:39.400 Without employing mocking, you can end up instantiating many extraneous objects, requiring significant effort to satisfy validations just to test a small portion of functionality unrelated to them. However, mocking's downsides include over-mocking, which leads to brittle tests. This issue will be explored further in the context of black-box versus white-box testing.

00:24:28.000 In my opinion, the Holy Grail of testing is a test suite that remains intact following refactoring that does not alter application behavior. If I'm working on a functional test that uses mocking to assert functionality, changing underlying implementations should not affect its success. A functional test should verify that the system identifies the correct record, and if it can't, an appropriate error message should arise.

00:25:04.360 However, if you utilize mocks in your functional tests, you will usually have to mock the specific calls, which leads to dependencies on internal code structures. Consequently, even a simple refactor can lead to the tests failing, resulting in brittle tests that lack reliability.
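A small sketch of that brittleness, under stated assumptions: UserStore, StrictMock, and Lookup are hypothetical. Two lookups behave identically, but a test that pins the internal call with a strict mock fails after the refactor, while a test that checks only the result keeps passing.

```ruby
# In-memory stand-in for a model class (hypothetical).
class UserStore
  def initialize(users)
    @users = users
  end

  def find_by_id(id)
    @users.find { |user| user[:id] == id }
  end

  def all
    @users
  end
end

# Strict mock: allows exactly one method and rejects every other call.
class StrictMock
  def initialize(allowed, result)
    @allowed = allowed
    @result = result
  end

  def method_missing(name, *args)
    raise NoMethodError, "unexpected call: #{name}" unless name == @allowed
    @result
  end

  def respond_to_missing?(name, include_private = false)
    name == @allowed || super
  end
end

# Two implementations with the same observable behavior.
module Lookup
  def self.original(store, id)
    store.find_by_id(id)
  end

  def self.refactored(store, id)
    store.all.find { |user| user[:id] == id }
  end
end

ALICE = { id: 1, name: "alice" }.freeze

# Black-box check: real data in, inspect only the result.
def black_box_passes?(implementation)
  store = UserStore.new([ALICE])
  Lookup.public_send(implementation, store, 1) == ALICE
end

# White-box check: pins the exact internal call through the strict mock.
def white_box_passes?(implementation)
  mock = StrictMock.new(:find_by_id, ALICE)
  Lookup.public_send(implementation, mock, 1) == ALICE
rescue NoMethodError
  false
end
```

The black-box check passes for both :original and :refactored. The white-box check passes only for :original, even though the behavior never changed.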

00:25:46.560 On the other hand, fixtures are often misunderstood. They are ultimately brittle and problematic. I can’t express enough how many applications I've encountered that rely heavily on fixtures, which often results in unexpected failures when a single fixture changes, causing failures in numerous tests. In many cases, fixing one test can trigger a cascade of other tests failing, forcing developers to sift through countless tests to decipher how they depend on particular fixtures.

00:26:23.960 Another issue arises because fixtures bypass validations. While there have been new updates in recent Rails releases, and some plugins that have addressed these downsides, I still find fixtures to be unreliable.

00:27:09.080 Nonetheless, the use of fixture scenarios has effectively resolved many of these pain points. With the foxy fixtures enhancements, we can create specific scenarios without impacting our existing fixture setups.

00:27:32.360 Generally speaking, fixtures can lead to poor code structure and introduce unnecessary complexity. Inline object creation and contextual nesting can greatly enhance maintainability. Regardless of the testing framework you use, applying techniques that improve overall test quality is crucial; I’ve struggled with very long test files filled with direct Test::Unit calls.

00:28:56.240 Implementing nested contexts not only ensures your tests are correct but also avoids dealing with complex dependencies across the application. You want your writing to be concise and focused, encapsulating only the behaviors and attributes being tested.

00:29:37.800 You must be aware of the dangers of extensive mocking, particularly in unit tests, while ensuring that integration tests benefit from using actual models and realistic scenarios. An interesting pattern emerged among my colleagues at a consulting firm involving the object data pattern for generating ActiveRecord models efficiently within tests.

00:30:29.120 Despite being unrelated to Shoulda, utilizing factory patterns greatly simplifies using valid Active Record objects in your tests. You can streamline object creation with tag factories, providing a one-line solution to generating a valid model without redundant validation checks.
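A minimal sketch of the factory idea, assuming a plain Ruby Person class in place of an ActiveRecord model. PersonFactory and its defaults are hypothetical and not taken from any particular factory library.

```ruby
# Hypothetical model with validations, standing in for an ActiveRecord object.
class Person
  attr_reader :name, :email, :age

  def initialize(attrs = {})
    @name, @email, @age = attrs.values_at(:name, :email, :age)
  end

  def valid?
    !name.to_s.empty? && email.to_s.include?("@") && age.to_i >= 18
  end
end

module PersonFactory
  @counter = 0

  # Builds a valid Person by default. A test overrides only the
  # attributes that matter to the behavior it is describing.
  def self.build(overrides = {})
    @counter += 1
    defaults = {
      name: "Person #{@counter}",
      email: "person#{@counter}@example.com",
      age: 30
    }
    Person.new(defaults.merge(overrides))
  end
end
```

PersonFactory.build returns a valid object in one line, and PersonFactory.build(age: 12) states exactly which attribute a test about age validation cares about.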

00:31:29.840 Let's get into white-box testing. White-box testing inspects the internals of your code, primarily through testing private methods or mocking internal operations. The significant advantage of white-box testing lies in its ability to keep tests concise and straightforward, making your intentions clear while allowing a higher coverage rate.

00:32:17.680 However, over-mocking can result in brittle tests, as any refactor must correlate with changes in your tests, regardless of whether the broader functionality remains intact. In contrast, black-box testing focuses solely on the public API; here, you execute methods and examine the outputs. This approach should guarantee that any changes to your API will be effectively tested and validated.

00:33:21.680 The upside of black-box testing is that it helps identify issues when integration points do not function as intended. This ensures that you review your expectations correctly rather than relying on mocked structures, which can easily present false positives, leaving potential bugs undetected.

00:34:21.680 Brittle tests can surface when minor changes result in unexpected failures. White-box testing can magnify this phenomenon, leading to false confidence in test suites. The fundamental premise of tests is to generate confidence in a codebase. If a test passes one day and fails the next, perhaps due to environmental discrepancies, then you risk undermining that confidence.

00:35:42.640 Another issue arises from the lazy assumptions made when writing tests. Assuming consistent test orders or relying on pre-built fixture data leads to brittle tests. For instance, under certain circumstances, tests run sequentially may pass due to external factors, but fail in alternative orders, indicating a structural fragility in the tests.

00:36:27.720 To avoid brittle tests, maintain awareness that assumptions can lead to inconsistencies. You want to ensure you describe the most critical behaviors rather than focusing on specifics like orders of array elements from a search.

00:37:36.000 When crafting tests, it’s vital to remain focused on core behaviors—while relative output order may vary during searches, the attributes of the output are what's significant.
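A short sketch of asserting on what a search returns without depending on the order of the results. Record, search, and same_members? are hypothetical names, and the shuffle only simulates a data store that makes no ordering promise.

```ruby
Record = Struct.new(:id, :name)

RECORDS = [
  Record.new(1, "joanne"),
  Record.new(2, "bob"),
  Record.new(3, "dan")
].freeze

# Hypothetical search whose result order is deliberately unspecified.
def search(records, term)
  records.select { |record| record.name.include?(term) }.shuffle
end

# Order-independent comparison: sort both sides by a stable key first.
def same_members?(actual, expected)
  actual.sort_by(&:id) == expected.sort_by(&:id)
end
```

A test then checks same_members?(search(RECORDS, "an"), [RECORDS[0], RECORDS[2]]) instead of comparing the arrays directly, so it describes which records come back rather than the order they happen to arrive in.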

00:38:37.680 Additionally, maintaining shorter, self-contained tests is optimal; if your tests nest too deeply, they become cumbersome and difficult to follow.

00:39:29.760 The approach should be to isolate individual behaviors effectively and avoid using fixtures that create dependencies outside the test file, ensuring tests remain self-contained and focused.

00:40:19.560 By being cautious of over-mocking, you can favor black-box testing as a means of avoiding brittleness. Despite valid arguments for white-box testing, it’s imperative to understand that maintaining reliability is of primary concern.

00:41:14.480 Writing effective tests entails mindfulness regarding what you’re testing, emphasizing one piece of behavior at a time, while good test naming conventions will assist in clarifying testing intentions.

00:42:08.640 Tests should articulate expected behaviors rather than implementation specifics and always strive to avoid brittleness. Tests are often harder to write than application code, but their value lies in saving you the hours of debugging you would face whenever untested code suddenly fails.

00:42:48.440 That said, I believe effective tests are a fundamental part of behavior-driven development. If you practice good TDD, you are naturally doing behavior-driven development.

00:43:26.640 Let's turn our focus back to Shoulda, where we’ll discuss future directions. We want to develop the ActiveRecord macros further and provide more support for JSON and YAML. Additionally, we aim to refine the Shoulda RESTful testing mechanism, as its configuration is currently rather bulky.

00:44:35.560 I invite other maintainers to help keep the development of Shoulda moving at a good pace. If you want more information about Shoulda, it has a homepage, thorough documentation, and active community groups.

00:45:28.680 I also want to express great appreciation for Thoughtbot. They are an incredible company with several locations, particularly in Boston, where I have the opportunity to focus on projects like this during work hours. It's an amazing family-oriented community there, and by the way, we are hiring in both Boston and New York offices. If you're looking for work in that area, feel free to speak with me.

00:45:56.000 Does anyone have questions? I love answering questions.

00:46:22.280 Now, regarding the black-box testing topic, could you clarify its value for testing private methods? By definition, if you eliminate a private method, it doesn't affect your public API. Should we be concerned with testing something that can’t break?

00:47:16.800 Actually, when I refer to testing private methods, I mean writing tests specifically for them. It’s about validating that individual behaviors work correctly even if they are hidden within private methods. It’s good practice to extract complex logic into smaller, well-defined methods.

00:48:00.680 When testing private methods, I don't usually keep those tests after debugging but might find them useful if there’s complex logic involved. They assist in determining specific issues if a public method fails, since those private methods are directly related to the public method functionality.

00:49:01.040 In closing, if you've broken down responsibilities within cohesive classes, re-organizing them can often clarify dependencies, simplifying the refactoring and testing process. Public APIs should be the main focus.