Maintaining Balance while Reducing Duplication: Part II

David Chelimsky

@dchelimsky

#refactoring

#behavior-driven-development-bdd

#activerecord

#rspec

#code-quality

#test-driven-development

by David Chelimsky

In this sequel talk titled 'Maintaining Balance while Reducing Duplication: Part II,' David Chelimsky presents critical insights into the DRY (Don't Repeat Yourself) principle and explores the implications of code duplication and refactoring techniques. The main focus is to clarify common misconceptions surrounding the DRY principle and highlight its broader significance beyond merely avoiding duplicated code.

Key points discussed include:

Understanding the DRY Principle: Chelimsky emphasizes that the DRY principle advocates for a single, unambiguous, and authoritative representation of knowledge throughout a system, encompassing not only code but also database schemas, test plans, and documentation.
Active Record and Its Challenges: He acknowledges the benefits of Active Record in the Rails framework but illustrates how the necessity of defining attributes in multiple places can lead to confusion and duplication of knowledge.
DataMapper Advantage: Although he does not have extensive experience with DataMapper, Chelimsky points out that it enables defining attributes and validations in one place, thus promoting a clearer design of knowledge representation.
RSpec and Executable Documentation: Chelimsky discusses RSpec as an effort to maintain the DRY principle in testing. He introduces the idea of self-validating documentation but raises concerns regarding RSpec’s structure and its ability to consistently embody the DRY principle.
Duplication as a Code Smell: He cites Uncle Bob Martin's assertion that duplicated code is a significant issue and provides examples of how refactoring to reduce duplication can lead to increased coupling and complexity if not approached carefully.
Refactoring Techniques: Strategies such as method extraction and leveraging superclass templates are discussed, illustrating how these methods can enhance clarity and reduce redundancy without complicating the code base.
Clarity vs. Aesthetics of DRY: Chelimsky warns against the potential downsides of reducing duplication, suggesting that in some cases, duplication may indicate meaningful distinctions that are essential to knowledge representation.
Readability and Complexity: He stresses the need to balance abstraction with readability, advocating for clear communication in code and documentation to facilitate understanding and maintenance of projects.

In conclusion, Chelimsky advocates for maintaining clarity, reducing unnecessary complexity, and ensuring that code remains informative and understandable as the codebase evolves. He highlights the importance of thoughtful refactoring and maintaining an authoritative representation of knowledge within systems, allowing for effective communication and documentation in software development.

00:00:16.560 Quick show of hands: how many of you did not attend or see online part one of this talk? A fair number, okay. That's good. Sorry for the rest of you because there will be a little bit of repetition, but hopefully, there will be some new material for you as well.

00:00:29.119 What I'm going to talk about are a couple of different things: I'm going to discuss duplication in code, and I'm also going to cover the DRY principle. How many of you have heard of the DRY principle? Oh, pretty much all of you, right? And how many of you think that the DRY principle says 'don't repeat yourself'? Fair number, right? Okay, here's the thing: that's not what it says.

00:00:46.000 What it actually says is this: Every piece of knowledge that is important must have a single, unambiguous, authoritative representation within a system. This is not just about duplicated code. Duplicated code is a code smell, and a smell is a different kind of thing from a principle. We'll get to the smell later; for now, let's focus on the DRY principle.

00:01:20.960 In an interview, Dave Thomas, who co-authored 'The Pragmatic Programmer' with Andy Hunt, discussed this idea of the DRY principle. Dave mentioned that most people take DRY to mean you shouldn't duplicate code, but that's not its intention. The idea behind DRY is much broader than that; it encompasses knowledge across a system, not just in code. It includes database schemas, test plans, the build system, and even documentation. This concept inspired the development of Active Record, particularly the parts relating to databases.

00:02:08.479 How many of you were using Rails when it was in the early versions, like 0.1 or 0.2? A few of you, right? You may remember having to write your own SQL statements to put the schema together before migrations were introduced. We had to define attributes in SQL, and the idea was to avoid the need to declare them in multiple places, as was required with tools like Hibernate in Java and C#. With Active Record, we could define attributes in SQL in one place while still being able to reference them in the model.

00:02:50.160 The next step was introducing methods into the model. For example, we have a 'full name' method that uses first and last names. Remember that DRY states that every piece of knowledge must have a single representation. If I want to ask what kind of names a person has, I need to look in two different places, which goes against the DRY principle.

00:03:04.400 In IRB, if I ask for public instance methods matching 'name,' I’ll only get 'full name' and not 'first' or 'last name' because those are not instance methods yet. You'd have to delve into the internals of Active Record and Ruby to uncover that information. This leads back to the issue of needing authoritative knowledge in our code.
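
The IRB observation can be reproduced without Rails. The following is a minimal sketch in plain Ruby, assuming a hypothetical `SchemaBackedRecord` base class that resolves column readers through `method_missing`. It is a stand-in for illustration only, not Active Record's actual implementation, and the column names and sample data are made up.

```ruby
# Hypothetical stand-in for a schema-backed model; this is NOT Active Record.
class SchemaBackedRecord
  # Pretend these column names were read from the database schema at runtime.
  COLUMNS = %i[first_name last_name].freeze

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Column readers are resolved dynamically, so they never appear
  # as real instance methods on the class.
  def method_missing(name, *args)
    return @attributes[name] if COLUMNS.include?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    COLUMNS.include?(name) || super
  end
end

class Person < SchemaBackedRecord
  # The only name-related knowledge visible in the model file.
  def full_name
    "#{first_name} #{last_name}"
  end
end

person = Person.new(first_name: "Ada", last_name: "Lovelace")

# Asking the class which name methods it defines finds only full_name.
name_methods = Person.public_instance_methods(false).grep(/name/)
p name_methods # => [:full_name]
```

The object still answers `first_name` and `last_name`, but nothing in the class says so; that knowledge lives somewhere else.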

00:04:24.799 Let's take a moment to understand the term 'unambiguous.' When I looked up the definition, it just said 'not ambiguous,' and in practice it isn't always clear what 'unambiguous' truly means. Take validations: suppose nulls are not allowed in the first or last name fields. We can declare that rule in either place, so the challenge lies in determining which declaration is the authoritative one.

00:05:20.800 For instance, the schema might allow up to 255 characters for a field, while a validation in the model restricts the same field to a maximum of 50. Which one is the authoritative rule? This lack of clarity is what we need to address.
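
To make the ambiguity concrete, here is a minimal sketch in plain Ruby. It assumes the 255 limit is declared in the schema and the 50 limit in a model validation; both hashes and the `fits?` helper are hypothetical and stand in for the two places a rule can be written down.

```ruby
# Hypothetical schema declaration: the column accepts up to 255 characters.
SCHEMA_LIMITS = { first_name: 255 }.freeze

# Hypothetical model-level rule: the same attribute is capped at 50.
MODEL_LIMITS = { first_name: 50 }.freeze

# Checks a value against one of the two declarations.
def fits?(limits, attribute, value)
  value.length <= limits.fetch(attribute)
end

name = "x" * 100

schema_accepts = fits?(SCHEMA_LIMITS, :first_name, name)
model_accepts  = fits?(MODEL_LIMITS, :first_name, name)

# The two sources of knowledge disagree about the same value.
puts "schema accepts: #{schema_accepts}, model accepts: #{model_accepts}"
```

A 100-character name is valid according to one declaration and invalid according to the other, so neither declaration can be read as the whole truth.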

00:06:04.840 Later on, there was significant excitement about the introduction of migrations, allowing us to write everything more fluidly. However, this didn't solve the core problem, as we were still defining attributes in two different places, albeit in a familiar format.

00:06:52.960 Now, a disclaimer: I've never used DataMapper on a real project; I've only played with it. Thus, I'm not advocating that it’s superior to Active Record. That said, one advantage of DataMapper is that you define attributes, validations, associations, and methods all in one place—in the model. From these declarations, you get the schema.
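
The declare-it-in-the-model idea can be sketched in plain Ruby. The `DeclaredProperties` module and its `property` macro below are hypothetical and only imitate the style; they are not DataMapper's API, and the schema strings are an invented format rather than real SQL generation.

```ruby
# Hypothetical mini-DSL; it imitates the declare-in-the-model style only.
module DeclaredProperties
  def properties
    @properties ||= {}
  end

  # Records the declaration and defines the reader and writer in one step.
  def property(name, type, length: nil, required: false)
    properties[name] = { type: type, length: length, required: required }
    attr_accessor name
  end

  # Derives a schema description from the same declarations.
  def schema
    properties.map do |name, opts|
      column = "#{name} #{opts[:type].name.upcase}"
      column += "(#{opts[:length]})" if opts[:length]
      column += " NOT NULL" if opts[:required]
      column
    end
  end
end

class Customer
  extend DeclaredProperties

  property :first_name, String, length: 50, required: true
  property :last_name,  String, length: 50, required: true

  def full_name
    "#{first_name} #{last_name}"
  end
end

puts Customer.schema
# All name-related methods are now discoverable on the class itself.
p Customer.public_instance_methods(false).grep(/name/).sort
```

In this sketch the length limit and the not-null rule are each written once, and both the accessors and the schema description follow from that single declaration.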

00:07:38.880 So, I'm curious: do any of you have experiences with DataMapper? Any thoughts on its pitfalls? I imagine things can get complicated when migrations become intricate. Nevertheless, DataMapper presents a situation where knowledge is single, authoritative, and unambiguous, contrasting with Active Record.

00:08:18.640 It's essential not to panic when we discuss Active Record because it offers numerous benefits, especially regarding integration points in Rails. While we criticize Active Record here, I think it's essential to recognize its strengths. Moving on to another point, I have a love-hate relationship with RSpec. How many of you dislike RSpec? I can relate. And how many of you love it? It's fascinating to see such mixed feelings about it.

00:09:43.120 One early inspiration behind RSpec was the idea of DRY, generating definitions from our test plans. RSpec is essentially executable documentation, providing a way to define documentation through examples. Here's an RSpec example using the new syntax, though I won't get into that right now.

00:10:31.919 When you run this with a specific formatter, a summary is generated from the code, which exemplifies the DRY principle. RSpec specs are essentially self-validating documentation, or as I like to say, we are 'eating our own dog food.' However, an issue arose with the Story Runner, a part of RSpec that was later reworked and eventually evolved into Cucumber, an entirely separate project.
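
The idea of a summary generated from the examples themselves can be sketched without RSpec. The tiny harness below is hypothetical (`ExampleGroup` and `document_examples` are invented names, and a plain array stands in for the code under test); it only shows how one set of descriptions can act as both the checks and the generated documentation.

```ruby
# Hypothetical mini-harness, NOT RSpec: one set of descriptions serves
# as both the executable checks and the generated summary.
class ExampleGroup
  attr_reader :report

  def initialize(subject_name)
    @report = [subject_name]
  end

  # Runs the block and records the description along with its outcome.
  def it(description)
    passed = yield
    outcome = passed ? "" : " (FAILED)"
    @report << "  #{description}#{outcome}"
  end
end

def document_examples(subject_name, &block)
  group = ExampleGroup.new(subject_name)
  group.instance_eval(&block)
  group.report
end

report = document_examples("Stack") do
  it("is empty when new") { [].empty? }
  it("returns the last pushed item on pop") { ([1] << 2).pop == 2 }
end

puts report
```

Because the summary is produced by running the examples, a description that stops being true shows up as a failure instead of quietly going stale.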

00:11:01.119 So we now utilize Cucumber alongside RSpec. I'm not saying that RSpec is fundamentally at odds with DRY, but the way we use it for executable documentation sometimes falls short of the DRY principle. Just to clarify: you can still love RSpec, and if you hated it, I hope my talk hasn't changed your perspective too negatively.

00:14:27.920 Now, let's talk about duplicated code, a code smell in which the same or similar constructs appear in more than one place in the codebase. This is not a minor problem; Uncle Bob Martin has referred to duplicated code as the root of all evil in software. But while reducing duplication is crucial, doing so often inadvertently increases coupling by creating new dependencies.

00:15:54.240 You need to be cautious about how you couple things in your code. For example, Uncle Bob shared an instance from his experience where two methods were identical but performed conceptually different tasks. The dilemma of whether to eliminate duplication often leads to unexpected consequences in other parts of the system.

00:17:08.400 Instead of simply delegating one method to another, it's often better to extract the logic into its own method, naming it to reveal the intention behind the operation. Extracting methods isn't just for reuse; it fundamentally changes the design of your code.
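
Here is a minimal sketch of that extraction, assuming a hypothetical `Invoice` class with two methods that are identical today but mean different things. The domain and all names are illustrative, not taken from the talk.

```ruby
# Hypothetical domain; the names are illustrative only.
class Invoice
  def initialize(line_amounts)
    @line_amounts = line_amounts
  end

  # Conceptually: what the customer is billed before tax.
  def subtotal
    sum_of_line_amounts
  end

  # Conceptually: the base for tax. Today it matches the subtotal, but it
  # can diverge later (tax-exempt lines, say) without touching subtotal.
  def taxable_amount
    sum_of_line_amounts
  end

  private

  # The shared mechanics live under a name that says what they do.
  def sum_of_line_amounts
    @line_amounts.sum
  end
end

invoice = Invoice.new([10, 20, 5])
puts invoice.subtotal
puts invoice.taxable_amount
```

Had `taxable_amount` simply called `subtotal`, the code would claim that the tax base is defined as the subtotal, which is a stronger statement than "they are computed the same way right now."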

00:17:19.519 Refactoring, in essence, restructures the design without changing the behavior. However, it's essential to ensure that the new method provides clarity to future developers. Each public method should have a meaningful name that accurately describes what it does.

00:19:28.960 Another refactoring technique to consider is extracting superclasses to reduce duplication. Suppose we have several matchers in RSpec showing clear duplication. We can notably improve this by implementing a base matcher class, pulling up common behavior. But as we make these changes, we must be wary of introducing additional complexity by relying on a super call that may change over time.

00:20:17.440 Instead, consider introducing a template method in the superclass that defines the overall procedure, allowing subclasses to specify unique implementation details. This approach not only reduces duplication but also aids in readability, as it lets changes be handled at a higher abstraction level.
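
A minimal sketch of the template method approach follows, assuming two hypothetical matchers. `BaseMatcher`, `EqualMatcher`, and `IncludeMatcher` are written here for illustration and are not RSpec's source; the superclass owns the procedure, and each subclass supplies only its comparison and its verb, so no subclass needs to call `super`.

```ruby
# Hypothetical matchers for illustration; not RSpec's own classes.
class BaseMatcher
  def initialize(expected)
    @expected = expected
  end

  # Template method: the overall procedure is defined once, here.
  def matches?(actual)
    @actual = actual
    match(@expected, @actual)
  end

  def failure_message
    "expected #{@actual.inspect} to #{description}"
  end

  def description
    "#{verb} #{@expected.inspect}"
  end
end

class EqualMatcher < BaseMatcher
  private

  # Subclasses fill in only the details that differ.
  def match(expected, actual)
    actual == expected
  end

  def verb
    "equal"
  end
end

class IncludeMatcher < BaseMatcher
  private

  def match(expected, actual)
    actual.include?(expected)
  end

  def verb
    "include"
  end
end

matcher = IncludeMatcher.new(3)
matcher.matches?([1, 2])
puts matcher.failure_message # => expected [1, 2] to include 3
```

The trade-off is that a reader has to look at the superclass to see when `match` and `verb` are called, so the hook names need to carry that meaning on their own.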

00:21:36.480 Earlier in this talk, I touched on the importance of maintaining clarity while reducing duplication. To illustrate, I once worked on a project where routes contained a considerable amount of duplicated strings. We identified common prefixes and constants that could be factored out, simplifying the structure. Yet, we must also understand that sometimes, duplication represents vital distinctions in knowledge.

00:23:18.720 DRY is not merely about avoiding typing the same characters twice; it is about recognizing the underlying concepts. In some cases, reducing duplication can obscure critical knowledge by signalling that all similar constructs mean the same thing when they might not. Thus, we should prioritize clarity and meaningful representation over the mere aesthetics of DRY.
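
As a small illustration of duplication that carries distinct knowledge, the sketch below uses two hypothetical constants that happen to share a value. The names and numbers are invented for the example.

```ruby
# Hypothetical settings that happen to share a value today.
MAX_NAME_LENGTH  = 50 # a data rule
RESULTS_PER_PAGE = 50 # a presentation choice

def valid_name?(name)
  name.length <= MAX_NAME_LENGTH
end

def page_count(total_results)
  (total_results.to_f / RESULTS_PER_PAGE).ceil
end

puts valid_name?("x" * 51)
puts page_count(120)
```

Collapsing both into a single shared constant would remove two repeated characters, but it would also tell readers that changing the page size should change the name limit, which is not true.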

00:24:58.880 Moving to another topic, I want to address concerns around what we deem 'objectifying' within Rails, aiming to evaluate our abstractions to make our work easier. However, we often run into readability problems when too many structures are combined, leading to confusion during debugging and maintenance.

00:27:01.040 For instance, a certain pattern of nested calls can instantly become an unreadable mess in the context of a single file. Instead of improving clarity through abstraction, we should weigh the benefits against the potential loss of readability in our tests and code.

00:28:26.640 Addressing a question about the utility of mock objects, confusion often arises from how we term these elements. The practicality of mocks in increasingly dynamic systems has shifted; you might have mocks interacting at method levels instead of simply being used as placeholders. The crux is making sure tests are straightforward enough to promote understanding while preventing unnecessary complexity.

00:30:29.919 In conclusion, effective communication is vital; maintain readable code and documentation in tests to avoid confusion as your codebase evolves. Each change or addition should communicate clear meaning, and frameworks should be applied where they suit your project without detracting from clarity. Thank you for this opportunity; I'd love to entertain questions!

00:33:11.760 To finalize, what do I think is the most beautiful codebase I've seen? Without overthinking, I'd have to say it's the Fit codebase written by Ward Cunningham because it succinctly expresses complex concepts clearly, maintaining a balance between simplicity and depth. If you're curious or have questions about this piece or any other topic we've tackled today, feel free to ask.

GoRuCo 2012