RailsConf 2014

Writing Small Code

by Mark Menard

The video titled "Writing Small Code" features Mark Menard discussing the challenge of creating smaller, more manageable code in programming, particularly within the Ruby language. He emphasizes the importance of writing small classes and methods by using good design and iterative refinement.

Key points discussed include:

- The Problem with Large Classes: Many developers encounter classes filled with complex code that are daunting to modify, leading to duplicated logic and poor understanding.

- Defining Small Code: Menard clarifies that small code isn't merely about the number of lines but rather how code is structured and organized. He underscores that small, well-designed code enables better systems composed of understandable parts.

- Incremental Improvement: Writing small code is an iterative process that often requires refactoring to improve design over time. He advocates for maintaining 'green' tests throughout this process to ensure code reliability.

- Importance of Abstractions: Menard introduces the concept of abstractions as crucial for decomposing large methods and classes into smaller, re-usable components, allowing for easier changes in future requirements.

- Good Design Principles: He emphasizes principles such as separation of concerns, single responsibility, and reducing conditional logic to enhance readability and maintainability.

- Refactoring Tools: He presents practical techniques like extract method and extract class to simplify complex methods and improve organization.

- Illustrative Example: Menard walks through a simple example of creating a command line options parser, demonstrating the evolution from a monolithic class structure into simplified, cohesive classes while handling multiple option types (boolean, string, and integer).

- Dynamic Ruby Features: He discusses leveraging Ruby's dynamic features to instantiate the option classes by naming convention without introducing unnecessary complexity.

In conclusion, Menard reinforces that the key to sustainable code is to produce small, comprehensible units that remain flexible for future modifications. By continually refining and refactoring with an eye towards abstraction and cohesive design, developers can create software that not only meets current requirements but also adapts gracefully to future needs. He encourages participants to engage in discussions about coding practices as part of the RailsConf experience.

00:00:16.410 Thank you to the organizers here at RailsConf. This is my first time speaking at RailsConf, and frankly, it's kind of intimidating to be up here and see so many people out there.
00:00:21.760 My name is Mark Menard, and today I'll be talking about small code. I've prepared a lot of content: about 79 slides and 137 transitions—it's quite a bit to get through.
00:00:28.240 Let's get started. I want to let this quote sink in: all of us have that file filled with code that you just don't want to open.
00:00:35.860 As you heard earlier, it might be your user class. That class has comments saying, 'Woe to ye who edit here.' The issue with this kind of code is that it tends to live forever. It encapsulates business logic that often gets duplicated elsewhere because no one wants to go in and look at that complicated code. It’s also very hard to understand.
00:00:52.180 I'm going to talk about ways to avoid this situation, focusing on code at both the class level and the method level. Writing small code at both levels is fundamental to creating systems composed of small, understandable parts.
00:01:15.610 Let's start with a few basic concepts to ensure we're all on the same page. Many people struggle with what they think of as smaller, well-designed code. It's not about the total line count.
00:01:36.600 Well-designed code typically has more lines than poorly designed code. The overhead of declaring methods and classes increases your line count.
00:01:49.360 It’s also not about the method count. Well-factored code will indeed have more smaller methods. And it isn't about the class count either. Well-designed code will almost definitely have more classes than what I refer to as 'undesigned code.'
00:02:09.729 I have seen cases of over-abstraction, but that's quite rare unless someone goes pattern-crazy. Small code is not about reducing the number of classes in your system; it's about having well-designed classes.
00:02:23.270 What do I mean by small? It refers to small methods and small classes. Small methods are the foundation of writing small code. The ability to decompose large methods into smaller methods is crucial.
00:02:38.270 To write small code, we must be able to decompose large classes into smaller classes, extract responsibilities, and base them on higher-level abstractions. It's important to keep our classes small because small classes lead to reusability and composability.
00:03:01.040 So, why should we strive for small code? Why is it important? We cannot predict the future; our software requirements are going to change. Software must be flexible enough to adapt to those changes.
00:03:15.980 Any software system that hopes to have a long and successful life will change significantly. Small code is simply easier to work with than large, complex code.
00:03:36.180 If you believe that your software requirements will never change, then you can ignore everything I say here, but I doubt that’s the case.
00:03:50.570 We should write small code because it helps us raise the level of abstraction in our code. This is one of the most important aspects of creating readable and understandable code.
00:04:07.340 Good design drives toward expressing the ubiquitous language of our problem domain within our code. The combination of small methods and small classes helps us elevate that level of abstraction, allowing us to express higher-level domain concepts.
00:04:26.330 We should also write small code to effectively utilize composition. Small classes and small methods work together well. As we compose instances of small objects, our systems will become message-based.
00:04:44.910 To build systems that are message-based, we have to use delegation and small composable parts. Small code creates small composable parts, which allows our software to be flexible over time.
00:05:01.790 This flexibility helps us accommodate future requirements without needing a forklift replacement.
00:05:08.600 The goal is to create small units of understandable code that are amenable to change. Our primary tools for this are 'extract method' and 'extract class.' Longer methods tend to be harder to understand than shorter methods.
00:05:25.670 Most of the time, we can shorten a method simply by applying the extract method refactoring technique. I use this approach all the time when coding.
00:05:37.899 After establishing a coherent set of methods around a certain concept, we can look to extract them into a separate class and move the methods there.
00:06:04.259 Let's explore the example of a command line option parser that handles boolean options. We want to run a Ruby program with a '-V' flag and handle boolean options.
00:06:20.470 In the Ruby program, I’ll define the options I’m looking for using a simple DSL. Then I want to consume it like this, checking if the options include a particular option before taking action.
00:06:34.610 Here’s how that all comes together: the DSL at the top, followed by how we consume the options object. Pretty straightforward.
00:06:51.640 Here's my specification: it should return true if the option is defined and present on the command line, and false otherwise. I run my specs and encounter two failures. Yes, I am using TDD.
00:07:02.540 Here’s the implementation, which fits nicely on one slide. I store the defined options in an array and keep the arguments for later reference. There's a 'has' method to check if the option is defined in the array.
00:07:16.090 Then I have my 'option' method, which implements my simple DSL. It’s nice and readable, fitting on one slide and probably easy to comprehend.
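The slide itself is not reproduced in this transcript, so the following is only a minimal sketch of what the boolean-only parser described above might look like. The class name `CommandLineOptions`, the block-based DSL, and the single-letter `-c` style flags are assumptions made for illustration.

```ruby
# Hypothetical reconstruction of the boolean-only parser; all names are assumptions.
class CommandLineOptions
  attr_reader :argv, :options

  def initialize(argv = [], &block)
    @argv = argv      # arguments kept for later reference
    @options = []     # names of the options defined through the DSL
    instance_eval(&block) if block
  end

  # True only when the option is both defined and present on the command line.
  def has(option_name)
    options.include?(option_name) && argv.include?("-#{option_name}")
  end

  # The DSL: records an option name.
  def option(option_name)
    options << option_name
  end
end

opts = CommandLineOptions.new(["-c"]) do
  option :c
  option :v
end

puts opts.has(:c) # defined and present
puts opts.has(:v) # defined but absent
puts opts.has(:x) # never defined
```

With only one option type, a plain array of names is enough state, which is why the class stays so small at this stage.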
00:07:30.550 After running my tests, I achieve zero failures. They pass, so I'm done.
00:07:35.870 Until the future comes along, that is. My colleague comes to me and says, 'Hey, I really like that library, but can we also handle string options?'
00:07:48.740 This sounds straightforward, so I think about it and come up with a small extension to the DSL to pass a second argument to signify the option type; in this case, string. I also default to allowing boolean by not changing the existing code unnecessarily.
00:08:04.030 A string option is different from a boolean; it requires content. Therefore, I need to incorporate validation for string options—the absence of content indicates that it's not valid.
00:08:21.290 Now, I also have to normalize how I retrieve values from both string and boolean options. This change alters the API, but sometimes a small change is necessary to accommodate future growth.
00:08:38.000 This is a good time to break the API, especially since I have only one person using the library for now.
00:08:50.360 Putting it all together again, I can now pass the options on the command line, define them with the DSL, and here's how I use my validation and value methods to check if it’s valid and to extract the values.
00:09:07.740 Now here's the class that implements it, again fitting on one slide. It’s probably not as readable as before and might be a little harder to comprehend.
00:09:22.300 We're headed down what I refer to as the 'undesigned path.' It's not too large at 31 lines, but it does have issues. I’ve got a method that’s definitely large and teetering on becoming unwieldy.
00:09:32.310 It has to handle both boolean and string options, which adds quite a bit of conditional complexity. Soon, we’ll find it won’t be very amenable to change.
00:09:49.030 Let’s examine the components and how they work. The initialize method creates a hash to store the options because we need to store the type—not just the knowledge that an option exists.
00:10:05.740 Now we have a valid method that checks which options are strings. We’re checking both the type and whether they have content. The string options need validation, while the boolean options do not.
00:10:20.660 In the value method, there's a lot going on. Let’s pretend this method is a black box for now; we’ll revisit it because it's by far the worst code in this example. However, all my tests are still passing.
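As a rough illustration of the "undesigned path" described above, here is a sketch of how the class might look once string options are bolted on. It assumes single-letter flags with content appended directly (for example `-eproduction`), that a string option absent from the command line counts as valid, and that the methods are named `valid?` and `value`; none of these details are confirmed by the transcript.

```ruby
# Hypothetical sketch of the class after adding string options; details are assumptions.
class CommandLineOptions
  def initialize(argv = [], &block)
    @argv = argv
    @options = {} # option name => type symbol
    instance_eval(&block) if block
  end

  def option(name, type = :boolean)
    @options[name] = type
  end

  # Only string options need validation: when present they must carry content.
  def valid?
    @options.each do |name, type|
      next unless type == :string
      raw = @argv.find { |arg| arg.start_with?("-#{name}") }
      return false if raw && raw.length < 3 # magic constant: "-e" has no content
    end
    true
  end

  # Digs into argv again and branches on the type symbol.
  def value(name)
    type = @options[name]
    return nil unless type
    raw = @argv.find { |arg| arg.start_with?("-#{name}") }
    if type == :boolean
      !raw.nil?
    else
      raw && raw[2..-1] # magic constant: skip the dash and the flag letter
    end
  end
end

opts = CommandLineOptions.new(["-v", "-eproduction"]) do
  option :v
  option :e, :string
end
bad = CommandLineOptions.new(["-e"]) { option :e, :string }

puts opts.valid?
puts opts.value(:e)
puts bad.valid?
```

Note how both methods repeat the lookup in `@argv` and both need to know about the type symbol, which is the duplication the talk goes on to criticize.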
00:10:38.750 Let's dive into methods. We’ve got some big ones that need to be cleaned up. I call it the first rule of methods: do one thing, do it well, and do only one thing.
00:10:55.740 This principle harks back to the UNIX philosophy of tools that you can string together. But how do we know if a method is doing only one thing?
00:11:14.000 Here, our level of abstraction and the abstractions in your code come into play. Over time, you need to develop a feel for maintaining one level of abstraction per method.
00:11:30.330 If all of your statements are at the same level and coherent around a purpose, I consider that to be doing one thing. It doesn’t necessarily mean that a method can’t span multiple lines.
00:11:46.920 I often see methods whose comment succinctly describes what they do, only to find that the method name isn't as descriptive as the comment. This highlights the importance of using descriptive method names.
00:12:02.740 Using fewer arguments is also critical. My personal goal is to have zero arguments on methods. One is okay; two or three usually indicate I might have missed an abstraction.
00:12:18.640 Separate queries from commands: a method should either query something or change the state of your object, not both. Mixing the two can confuse those who consume your library.
00:12:36.560 And as always, don’t repeat yourself. As Sandi discussed earlier, it takes judgment to decide when to eliminate repetition.
00:12:50.500 Leaving repetition in your code will come back to haunt you. Let’s examine our methods. Both 'valid' and 'value' currently inspect the 'ARGV' array to find options from the command line.
00:13:07.350 This is a perfect candidate for an extract method refactoring. We also have magic constants scattered about, indicating missed abstractions.
00:13:23.960 Both methods aren’t purely doing one thing. 'Valid' digs into the 'ARGV' array while 'value' figures out different types and how to return their values.
00:13:40.050 Now we’re going to eliminate some of the repetition by conducting an extract method refactoring. This involves moving part of a method into a new method with a descriptive name, thereby maintaining consistency in abstraction.
00:14:01.840 In our command line options class, both 'valid' and 'value' dig their raw values out of the 'ARGV' collection. We'll extract that lookup into a method that retrieves the raw value.
00:14:12.640 The methods left behind will focus on the desired result without the complexity of the 'how,' which will be detailed in the extracted method.
00:14:28.060 I will proceed with two more extractions, specifically the value method for string options and the content method. The naming of the extracted methods is essential—they clearly define their purpose.
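The extractions described above might look roughly like the sketch below. The helper names `raw_value`, `content`, and `string_value` are guesses based on the transcript's wording, and the `-eproduction` argument format is an assumption.

```ruby
# Hypothetical sketch after three extract method refactorings; names are assumptions.
class CommandLineOptions
  def initialize(argv = [], &block)
    @argv = argv
    @options = {} # option name => type symbol
    instance_eval(&block) if block
  end

  def option(name, type = :boolean)
    @options[name] = type
  end

  # The public methods now say what they want, not how to dig it out of argv.
  def valid?
    @options.all? do |name, type|
      type != :string || raw_value(name).nil? || !content(name).empty?
    end
  end

  def value(name)
    return nil unless @options.key?(name)
    @options[name] == :string ? string_value(name) : !raw_value(name).nil?
  end

  private

  # Extracted: the one place that knows how to find an option in argv.
  def raw_value(name)
    @argv.find { |arg| arg.start_with?("-#{name}") }
  end

  # Extracted: everything after the dash and the flag letter.
  def content(name)
    raw_value(name)[2..-1]
  end

  # Extracted: the value of a string option, or nil when it is absent.
  def string_value(name)
    raw_value(name) && content(name)
  end
end

opts = CommandLineOptions.new(["-eproduction"]) do
  option :v
  option :e, :string
end

puts opts.value(:e)
puts opts.value(:v)
```

The behavior is unchanged, but the type checks on `:string` remain, which is why the speaker is still unsatisfied at this point.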
00:14:48.400 However, I’m still unsatisfied with the code; it’s more explanatory but too complex and not as small as it could be. The methods are large due to missed abstractions.
00:15:01.800 Next, I’ll reference the option type symbol to see if it’s a string, which is a big warning sign that we're on the wrong path. There are also those magic constants used to retrieve content from within that string.
00:15:20.460 If I were confident that there wouldn't be any future requirements for this class, I might leave it alone. But then my colleague comes back and asks, 'Can we also handle integer options?'
00:15:36.580 To deal with this, I could continue down the undesigned path and complicate the 'valid' and 'value' methods by switching based on the option types. However, this is our chance to enhance our code to be more adaptable.
00:15:50.900 Let me demonstrate the impact of poor design. This prototype of undesigned code is not small, in my view—definitely not.
00:16:07.060 The class has expanded due to changes in specifications, and the 'valid' and 'value' methods evolve in tandem. This is a clear sign that we’ve missed an abstraction, causing those methods to grow complicated.
00:16:23.060 While all my tests pass, I feel dissatisfied. We have large methods and complex conditional logic; it's time to refactor to facilitate change easily.
00:16:46.400 I’d like to highlight a pattern emerging from non-OO design. Not reinventing the type system is crucial—if you have ducks, let them quack.
00:17:01.600 In this example, our option types are boolean, string, and integer. There are likely ducks in your code longing to be set free.
00:17:20.000 To confirm that we're dealing with missed abstractions, or ducks, here: the testing of option types is buried inside the 'valid' and 'value' methods in the form of a case statement.
00:17:35.680 When your code involves case statements like these, it's a sign you’ve missed an abstraction. The moment I encountered string types, I should have embraced the OO path.
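To make the warning sign concrete, here is a sketch of the kind of type-switching code the talk describes once integer options are added. The exact slide code is not in the transcript; the method names, the digits-only integer check, and the `-p8080` argument format are assumptions.

```ruby
# Hypothetical sketch of the "reinvented type system"; details are assumptions.
class CommandLineOptions
  def initialize(argv = [], &block)
    @argv = argv
    @options = {} # option name => type symbol
    instance_eval(&block) if block
  end

  def option(name, type = :boolean)
    @options[name] = type
  end

  # Every new option type forces another branch here...
  def valid?
    @options.all? do |name, type|
      raw = raw_value(name)
      next true if raw.nil?
      case type
      when :boolean then true
      when :string  then raw.length > 2
      when :integer then raw[2..-1].match?(/\A\d+\z/)
      end
    end
  end

  # ...and another one here, so the two methods grow in tandem.
  def value(name)
    raw = raw_value(name)
    case @options[name]
    when :boolean then !raw.nil?
    when :string  then raw && raw[2..-1]
    when :integer then raw && Integer(raw[2..-1])
    end
  end

  private

  def raw_value(name)
    @argv.find { |arg| arg.start_with?("-#{name}") }
  end
end

opts = CommandLineOptions.new(["-v", "-p8080"]) do
  option :v
  option :p, :integer
end

puts opts.value(:p)
```

Each `when` branch is behavior that belongs to a boolean, string, or integer option object; the symbols are standing in for classes that do not exist yet.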
00:17:52.020 Sometimes it’s hard to recognize when to shift gears while writing code. It's time for a fresh perspective.
00:18:06.840 Let’s rethink what constitutes a good class, focusing on the principles that allow us to write small classes. First and foremost, ensure each class has a single responsibility.
00:18:21.680 Furthermore, all properties of a class should be cohesive to the abstraction that the class is representing. If properties are only referenced in one or two methods, that's likely an indicator they don't belong there.
00:18:40.460 Choosing an apt name helps maintain focus on a single responsibility. Sometimes I refer to this as talking to the rubber duck; explaining your problem—even to an inanimate object—can lead to clarity.
00:18:58.010 The main tools we’ll use to create new classes from existing code are extract class and move method refactorings.
00:19:07.050 Characteristics of a well-designed class include a single responsibility, cohesion around a defined set of properties, and a small public interface that exposes a limited number of methods.
00:19:18.330 If possible, the primary logic should be expressed in a composed method, but this topic deserves its own discussion.
00:19:37.040 Now, let’s explore the code we should have aimed towards once string option types emerged. Imagine we have a clean slate to write command line options with the knowledge we have now.
00:19:55.260 We need to account for boolean, string, and integer options, and let’s ensure our tests remain intact to prevent any breakages.
00:20:14.910 Here’s my initial attempt at writing the class. It’s 28 lines long and cohesive around its properties, and I’ll leave it mostly as it is.
00:20:30.050 Most methods will manage a collection of option objects, and the sole responsibility is to oversee these.
00:20:46.200 I did introduce a collaborator that manufactures the option objects, which I could extract to another class.
00:20:59.900 But for now, I’ll leave it here. In general, I refactor when I feel the pain during changes, indicating a need for refinement.
00:21:16.020 The command line options class should have a small public interface, specifically two methods: valid and value, with no hard-coded external dependencies.
00:21:33.150 Also, this class doesn't contain any conditional statements, and that’s intentional. In Sandi Metz's 2009 talk on SOLID principles, she stated that conditionals in OO languages are often a sign of poor design.
00:21:51.820 I don’t think Sandi means conditionals should never be used. Rather, they can obscure abstractions in your code.
00:22:06.120 The initialize and option methods carry over largely unchanged, except that the hash now stores option objects instead of just the type.
00:22:24.430 The valid method simply checks all options to verify their validity, while the value method looks up values in the option hash.
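A sketch of the redesigned class might look like the following. Because the talk has not yet reached the real option classes at this point, the `FlagOption` struct and the `build_option` helper are placeholders invented here so the example runs; the method names `valid?` and `value` are also assumptions.

```ruby
# Stand-in option object so the sketch runs; it is not part of the talk.
FlagOption = Struct.new(:name, :raw) do
  def valid?
    true
  end

  def value
    !raw.nil?
  end
end

# Hypothetical sketch of the redesigned class: no conditionals, two public methods.
class CommandLineOptions
  def initialize(argv = [], &block)
    @argv = argv
    @options = {} # option name => option object
    instance_eval(&block) if block
  end

  # Valid when every option object says it is valid.
  def valid?
    @options.values.all?(&:valid?)
  end

  # Looks the option up in the hash and asks it for its value.
  def value(name)
    @options.fetch(name).value
  end

  private

  # The DSL method; instance_eval can still call it because the receiver is implicit.
  def option(name, type = :boolean)
    @options[name] = build_option(name, type)
  end

  # Placeholder factory: always builds the stand-in, ignoring the type.
  def build_option(name, _type)
    FlagOption.new(name, @argv.find { |arg| arg.start_with?("-#{name}") })
  end
end

opts = CommandLineOptions.new(["-v"]) do
  option :v
  option :q
end

puts opts.valid?
puts opts.value(:v)
puts opts.value(:q)
```

The class now only manages a collection and delegates; everything type-specific has to live in whatever objects the hash holds.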
00:22:41.410 Now we need to implement our options, which requires instantiating objects that represent the boolean, string, and integer options.
00:23:01.800 This creates a dependency. When introducing dependencies, we should aim for those that can accommodate future changes.
00:23:19.320 Instead of depending on concrete implementations, we need to align with abstractions. This applies even to duck types, where we rely on the concept rather than specific implementations.
00:23:35.220 The option is the abstraction here. The simple abstraction consists of having valid and value methods and a consistent initialization process across types.
00:23:49.890 I could go down the case statement road again and check the option type, instantiating the correct type based on the symbol.
00:24:00.020 However, I won't do that, as it would tie my command line class to the concrete types—something we want to avoid.
00:24:12.780 Creating a dependency on abstractions instead of concretions means using Ruby’s dynamic capabilities to instantiate those objects.
00:24:27.310 Using naming conventions, we can automatically instantiate the appropriate option class based on types like string, boolean, etc.
00:24:44.000 This adjustment shifts the command line option class from depending on concrete implementations to relying on abstractions.
00:25:01.400 This approach embodies dependency inversion from the SOLID principles. Some have suggested mapping symbols to concrete classes, but that complicates future modifications.
00:25:15.640 In my case, I’m comfortable using the dynamic capabilities of Ruby.
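One way to instantiate option objects by naming convention is `Object.const_get`, sketched below. The `<Type>Option` naming scheme, the `build_option` helper, and the constructor arguments are assumptions; the transcript only says that a naming convention and Ruby's dynamic capabilities are used.

```ruby
# Minimal placeholder classes; their bodies are irrelevant to the lookup itself.
class Option
  attr_reader :flag

  def initialize(flag, argv)
    @flag = flag
    @argv = argv
  end
end

class BooleanOption < Option; end
class StringOption < Option; end
class IntegerOption < Option; end

# Hypothetical helper: turns :string into the StringOption class by convention,
# so the caller never names a concrete class.
def build_option(type, flag, argv)
  option_class = Object.const_get("#{type.to_s.capitalize}Option")
  option_class.new(flag, argv)
end

puts build_option(:string, :e, []).class
puts build_option(:integer, :p, []).class
```

The trade-off is that an unknown type surfaces as a `NameError` at runtime rather than as a missing entry in an explicit symbol-to-class map, but adding a new option type requires no change to the lookup.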
00:25:28.920 At this point, our command line options class is designed to accommodate new option types without requiring changes to its core structure.
00:25:42.860 To this end, we’ve constructed a clean hierarchy while ensuring our classes remain open to extension.
00:25:56.060 Next, we need to shift the logic for various option types to the appropriate option classes. I decided to create a base class from which the concrete option types would inherit.
00:26:09.080 This inheritance structure allows us to maintain uniformity in initialization without redundant code. Each subtype maintains cohesion around specific attributes.
00:26:25.660 For the boolean option, the requirements are straightforward: a boolean is always valid, and its value is true if the option is present on the command line and false otherwise.
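The base class and the boolean option could be sketched as below. The constructor signature (a flag plus the raw string found on the command line, or `nil` when absent) is an assumption made for illustration.

```ruby
# Hypothetical base class: shared initialization for every option type.
class Option
  attr_reader :flag, :raw_value

  def initialize(flag, raw_value)
    @flag = flag
    @raw_value = raw_value # e.g. "-v", or nil when the option was not passed
  end
end

# A boolean option is always valid; its value is simply whether it was present.
class BooleanOption < Option
  def valid?
    true
  end

  def value
    !raw_value.nil?
  end
end

puts BooleanOption.new(:v, "-v").value
puts BooleanOption.new(:v, nil).value
```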
00:26:41.020 Now, we need to implement string and integer options, extracting validation and value extraction logic from the original command line options class.
00:26:57.060 On one side, we have the original command line options; on the other are the new string and integer option classes. We've successfully divided the logic appropriately.
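A sketch of the string and integer option classes after the move method refactorings follows. It assumes the same `-eproduction` and `-p8080` argument format as the earlier sketches, that an absent option is valid, and that integers are plain digit strings; the class bodies are illustrative rather than the speaker's slide code.

```ruby
# Hypothetical base class shared by all option types.
class Option
  attr_reader :flag, :raw_value

  def initialize(flag, raw_value)
    @flag = flag
    @raw_value = raw_value
  end
end

# A string option must carry content when it is present.
class StringOption < Option
  def valid?
    raw_value.nil? || !content.empty?
  end

  def value
    raw_value && content
  end

  private

  def content
    raw_value[2..-1] # everything after the dash and the flag letter
  end
end

# An integer option must carry content made only of digits.
class IntegerOption < Option
  def valid?
    raw_value.nil? || content.match?(/\A\d+\z/)
  end

  def value
    raw_value && Integer(content)
  end

  private

  def content
    raw_value[2..-1]
  end
end

puts StringOption.new(:e, "-eproduction").value
puts IntegerOption.new(:p, "-p8080").value
puts IntegerOption.new(:p, "-pabc").valid?
```

The duplicated `content` helper is left in on purpose here: it is the kind of shared concept that later suggests a common superclass.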
00:27:13.220 By applying a mixture of extract class and move method refactorings, we’ve effectively streamlined the command line options class, leaving minimal code remaining.
00:27:30.020 Now, we can replace the complex valid and value methods with simpler versions that adhere to the principles we've discussed.
00:27:45.060 To validate the various option classes, I shifted the corresponding sections from the command line option spec to their new respective areas, updating them as necessary.
00:27:59.260 As I went through the process of extracting those classes and moving their code, we isolated specific abstractions.
00:28:14.790 We need to differentiate the 'what' from the 'how.' Our goal is to transition from code that looks excessively complex to more streamlined representations.
00:28:30.000 The original command line options 'valid' method contained all of the 'how'; now it simply states what needs to be done.
00:28:46.000 As a result, the 'how' has been relegated to the respective collaborators, specifically the string option, boolean option, and integer option classes.
00:29:02.020 Once we’ve completed the refactoring, the command line options class boasts a very small interface that effectively meets its use case.
00:29:14.950 None of the private implementation details require attention from outside the class. This delineation makes it clear which methods are for internal use.
00:29:29.080 Ultimately, the implementation fulfills our public interface and stays entirely behind it. As I worked through the specs, I made adjustments until everything passed.
00:29:46.490 Yet again, this is a reminder that nothing is ever truly finished. I now hear, 'Any chance we could add the ability to pass an array of values for an option?'
00:30:02.710 To implement this requirement, I only need to create a new array option class. I’ll draft a spec to make it fail, then create the array option class, and that’s done.
00:30:17.490 In this approach, the array option class inherits from an option superclass. As I worked through this, it became evident that strings, integers, and arrays all have content.
00:30:31.660 This realization led to the extraction of that superclass. For each option type, I just implemented the value method and we’re set.
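The extracted superclass and the new array option might be sketched as follows. The name `OptionWithContent`, the comma-separated format (`-ta,b,c`), and returning an empty array for an absent option are all assumptions; the transcript only says that a superclass was extracted for option types that have content.

```ruby
# Hypothetical base class shared by all option types.
class Option
  attr_reader :flag, :raw_value

  def initialize(flag, raw_value)
    @flag = flag
    @raw_value = raw_value
  end
end

# Extracted superclass for option types that carry content after the flag.
class OptionWithContent < Option
  def valid?
    raw_value.nil? || !content.empty?
  end

  private

  def content
    raw_value[2..-1]
  end
end

# Each concrete type now only implements value.
class StringOption < OptionWithContent
  def value
    raw_value && content
  end
end

class ArrayOption < OptionWithContent
  def value
    raw_value ? content.split(",") : []
  end
end

puts ArrayOption.new(:t, "-ta,b,c").value.inspect
puts StringOption.new(:e, "-eproduction").value
```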
00:30:45.830 Now, we have a command line options class that is closed for modifications but open for extensions. I can introduce float types or other new option types without additional changes.
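To illustrate the open/closed claim, the sketch below adds a float option without touching the parser class. All names, the `<Type>Option` lookup convention, the `-r0.75` argument format, and the simple decimal pattern used for validation are assumptions rather than the speaker's code.

```ruby
# Hypothetical base class: finds its own raw value in argv.
class Option
  attr_reader :raw_value

  def initialize(flag, argv)
    @raw_value = argv.find { |arg| arg.start_with?("-#{flag}") }
  end

  def valid?
    true
  end
end

class BooleanOption < Option
  def value
    !raw_value.nil?
  end
end

# The parser knows nothing about concrete option types.
class CommandLineOptions
  def initialize(argv = [], &block)
    @argv = argv
    @options = {}
    instance_eval(&block) if block
  end

  def valid?
    @options.values.all?(&:valid?)
  end

  def value(flag)
    @options.fetch(flag).value
  end

  private

  def option(flag, type = :boolean)
    option_class = Object.const_get("#{type.to_s.capitalize}Option")
    @options[flag] = option_class.new(flag, @argv)
  end
end

# New requirement: only this class is added; CommandLineOptions is untouched.
class FloatOption < Option
  def valid?
    raw_value.nil? || raw_value[2..-1].match?(/\A\d+(\.\d+)?\z/)
  end

  def value
    raw_value && Float(raw_value[2..-1])
  end
end

opts = CommandLineOptions.new(["-v", "-r0.75"]) do
  option :v
  option :r, :float
end

puts opts.valid?
puts opts.value(:r)
```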
00:30:58.620 Our small, comprehensible option classes have single responsibilities and are easy to compose together. We can simply create new option types to meet requirements.
00:31:06.880 My name is Mark Menard, president of Enable Labs. We handle full lifecycle business productivity and SaaS app development, taking ideas from napkin sketches to production.
00:31:35.030 I'm here at the conference, so let’s gather and discuss code. I'm available for questions.