Ruby Video | Refactoring from Good to Great

RubyConf AU 2013

Refactoring from Good to Great - A Live-Coding Odyssey

Ben Orenstein

@r00k

#refactoring

#clean-code

#object-oriented-programming-oop

Refactoring from Good to Great - A Live-Coding Odyssey

by Ben Orenstein

In the presentation "Refactoring from Good to Great," Ben Orenstein shares insights on elevating code quality through effective refactoring techniques. The talk, which is part of RubyConf AU 2013, emphasizes a live-coding approach without slides, allowing for real-time demonstration of code improvements. Orenstein begins by highlighting the importance of having tests in place before refactoring, as it serves as a safety net for ensuring code changes do not break functionality. The session aims to turn code that is merely 'good' into code that is 'great'—more readable, maintainable, and easier to change. Key points discussed include:

Extract Temp to Query: Orenstein demonstrates how to refactor a local variable into a private method, enhancing readability and encouraging method reuse.
Understanding Coupling: He explains different types of coupling and their implications, advocating for reducing parameter counts between methods to decrease complexity.
Data Clumps: He introduces the concept of data clumps, suggesting that tightly packed parameters in method calls may indicate a need for new classes or structures for better encapsulation.
Null Object Pattern and Decorator: Orenstein touches on advanced refactoring techniques, including the use of the Null Object pattern and the Decorator pattern for better code hygiene.
Vim Refactoring Techniques: The speaker shares his own experiences with Vim, discussing how custom macros can speed up the refactoring process.
Testing Smells: He covers common testing issues like 'Mystery Guest' and stubbing to improve test reliability.

Throughout the session, Orenstein encourages audience interaction, emphasizing that refactoring is an ongoing process that fosters better collaboration and communication within teams. He concludes by reinforcing the idea that regular refactoring, clarity in code, and mindful class structures lead to better software development practices, regardless of the specific methodology employed. Ultimately, participants are encouraged to keep iterating and improving their code, recognizing that every small change contributes to overall quality and maintainability.

00:00:05.480 Good morning! How's everybody doing? Excellent! My name is Ben Orenstein, and I work at Thot in Boston. This is actually the last time I'm going to be giving this talk, so you're seeing it in its most polished form. I'm excited to present it to you, but I'm also excited to retire it and write something new.

00:00:17.240 This session is titled 'Refactoring from Good to Great.' The point of this talk is not for me to reach the end; the point is for you to learn something. I’d like you to consider this a pairing session. I am now pairing with all of you: I'll drive, you navigate. If you get confused, please ask questions, and if you have a comment, please make it! I love interaction and want to hear from you. You're not interrupting; you're contributing. So please feel free to share your thoughts. Let's start by looking at some code. Take a second and read this.

00:01:17.119 Okay, so this is an orders report. It takes a collection of orders, a start date, and an end date, and it returns the total sales that are within that date range from those orders. There are specs associated with this code. I run them, and they look like this: they run fast because I'm not loading Rails. You'll see me running those tests again and again. Indeed, the first rule of refactoring is, 'Don't refactor code that doesn't have tests.' So, all the code that I’ll show you being refactored will have tests, and you'll see me running them continuously after every little change.

00:01:30.360 If I don’t run them, feel free to yell at me! Now, this code looks roughly like what I would have written about a year ago. Today, I would rate it about a B-. It's decent, but I think it can be better. There are some things that, during code review, I would point out and ask to be changed. The first issue happens on line 12, where we have a local variable called 'orders_within_range,' which we assign the results of performing some sort of 'select' over the collection of orders.

00:01:58.920 Now, I'm going to do a quick refactoring. There's a macro for this, so it will happen really fast. Don't freak out! I'm going to extract this as a temp variable and call this extraction method 'orders_within_range.' This is a known refactoring technique called 'Extract Temp to Query.' As a result, I now have a private method called 'orders_within_range' that was the result of that old 'select.' Here's the before state with the temp variable, and here's the after state, now a private method with the same name. I've moved that content down there.

00:02:36.720 This is an established refactoring, and I think it's already a decent improvement for a couple of reasons. First, we've gone from one method with two lines to two methods with one line each. I won't tell you that every single time, fewer lines are better in a method, but in general, that's probably true. I am deeply suspicious of methods longer than one line, and you should be too. So shorter methods: that's our first win.

00:03:03.519 The second win is that I've moved 'orders_within_range' beneath that private keyword. For some reason, as programmers, when we see code, we want to read it. If I’m skimming this method, the first thing I do is say, 'Okay, here's a local variable; what goes in that local variable?' and I would read the code. If I have it like this instead, I say, 'Oh, that's a method call; it's got a decent name. I assume those are the orders that are within the range passed in, and I don't need to dive into that method.' If I’m just trying to skim the file, I no longer need to be concerned about the details of how we get the orders within range.

00:03:30.879 Moreover, I've given everyone else who reads this file, including myself six months from now, a hint. The 'orders_within_range' calculation is not essential; it's not the core functionality for this class. This hint helps make this code more readable. So we have a shorter method, and we've made the code more readable. We've also encouraged reuse here. It's unlikely that we have a method called 'total_sales_within_date_range' without needing something like 'average_sales_within_date_range' or 'maximum_sales_within_date_range.' In every case, we're likely calculating all the orders that happen within a date range.

00:04:09.560 When the code looks like this, that functionality, or the logic for gathering all the orders within the range, is locked within this method. In this version, it's more likely that it’s easier to reuse this method in other methods within this class, allowing immediate reuse without needing to worry about those implementation details. Questions or thoughts on this so far?

00:05:02.880 What was that for me or the code? The 'orders_within_date_range' change? You're talking about changing this name right here? Yeah, definitely! I think it would be good to make them more consistent. That's a good point. We won't do that right now, but that's a good point.

00:05:26.759 The question was whether I would do this refactoring right away or wait until I see a need. I would do this right away because of the various advantages I've mentioned. If the only reason for doing this were reuse, I might take longer, but because of the benefits of readability, and that sort of hiding things beneath the private keyword, I am very likely to do this right away. I do this with almost all my temp variables now. I almost never have temp variables within a method; I usually extract them out to query methods that are private, and this has made my code significantly better.

00:06:04.720 Typically, methods should have one job, and whenever I see a local variable in a method, it's a hint that the method probably has two jobs. At least it's doing the job of calculating what goes into that local variable and another job to do something with that local.

00:06:40.560 So the concern is that if we make this method private, it makes it harder to test. I sort of agree and sort of don’t. This code was written test-driven, so by moving methods into private, I’m guaranteed that functionality is still tested. I haven't changed how the code works; I've just altered the visibility of part of this thing. Because I wrote this code with TDD, I’m not concerned about losing coverage. You touch on an interesting point, which is I used to prefer classes with tiny public interfaces and large private sets of methods, as that felt good to me.

00:07:04.800 I'm still fond of having a small public interface, but I'm beginning to think that if a class has too many private methods below the surface—some people call these iceberg classes—then there's a chance that another object is yearning to be created. Because eventually, you might be testing five, six, or seven methods through the public interface, and your tests might become ugly. You're testing something that's deeper within this file by coming in through this method. I've started looking at this as a smell; too many private methods indicate that another class likely wants to exist.

00:07:46.080 One example you've given is fair. You mentioned that the private method uses all three of our instance variables, which might make it hard to extract. While that's true, I'm not worried about that right now. Not every private method needs to be extracted. If I find the need to do so later, I could address that issue.

00:08:12.560 Now, your first concern raised the question of whether, if I use 'orders_within_range' twice, am I recalibrating it twice, and is that a performance issue? You're absolutely correct! As soon as you show me performance logs indicating that this is a bottleneck, I'll refactor it to cache. Anything else is premature optimization. Programmers love—or Rubists love—these 'or' statements, and we don't want to recalculate; hence we say, just cache it. However, caching can occasionally bite you; it can lead to unexpected outcomes with performance issues.

00:08:57.040 So I resist doing 'or' statements and caching unless I've seen profiling, performance data, or real numbers showing that this thing takes too long to execute; thus, we need caching. I strongly advocate for that and push back against it in code reviews.

00:09:29.480 The question is whether extracting private methods from one class leads to tight coupling, and therefore the new class will only be used by the old class. The answer is that I’m more likely to extract private methods into another class if I think it might be reused within the system. If I think it's something that would only ever be used in this class, I might even extract a private class simply to label a concept and keep it private to this class, encapsulating it.

00:10:09.440 However, if I think something might be reused multiple times, I'm very open to pulling it into a new class. I’m not worried about the coupling of Class A and Class B. Classes need dependencies on one another. As long as it’s not highly specific to one class, I’m okay with Class A depending on Class B.

00:10:43.199 The concern was raised about whether if I extract private methods, doesn't that imply another class exists? That's a fair point. You could create a class and then pull the methods into that class before making it private to this parent class. If it turns out some other class wants to use it, I’ll promote it to a public class and give it its own set of tests.

00:11:19.240 There's a concept called fan out and fan in. I always mix them up, but the idea is that high fan-in means many classes use your component. That is a good thing; it indicates reusable components. High fan-out, however, is not ideal; it means your class uses many other classes, which often indicates confusion. It’s difficult to maintain if you need to package and coordinate the behavior of numerous different things. While it isn't a perfect metric, it's worth considering.

00:11:56.840 The question was whether the refactoring process was done in Vim, and if I could share what I used to refactor. I cheated; it's a custom macro, and I will show it to you. The macros in Vim are literal recordings of the keystrokes entered to make something happen. This isn’t a generalized solution; it's a one-off solution designed to impress people during talks.

00:12:27.760 I have experimented with several Vim refactoring plugins, hoping someone might generalize them, but nothing has worked consistently well. Some plugins work decently at times, so it’s worth experimenting because I believe there's time to be saved there. However, I haven't actually found anything that has been consistently great; I guess that’s part of the price we pay for having dynamic typing.

00:12:59.480 The question you raised is whether we should select the orders directly from the database since we have start and end dates available. This query is indeed interesting. I often prefer my classes to be context-free, meaning the class doesn’t know where the orders originate; it only knows it has a collection of something that responds to certain methods. By having the start and end date used to pull information from the database with specific class names and structures, this class is now aware of where the orders come from, which I prefer to avoid.

00:13:39.840 Additionally, this way, the tests are easier to write. If I make it depend on the database operations, I’ll either need to stub the data and database calls or input real data into the database, which I don't enjoy doing. Instead, I'd rather pass stubs or mock items, making the tests faster and easier to read, thereby improving the overall quality of the code. Are there any more questions before we move on?

00:14:36.280 If this were the last thing I were doing with this file, I would commit it. However, we are not there yet, and there's still more to improve. So let's identify what else we can enhance. Check out this 'orders_within_range' method; there’s something here that sets off alarm bells. The 'select' method takes a block, and that block contains some logic for calculating something.

00:15:04.920 I've been aggressive about extracting things from conditionals and blocks into their respective functions. This improvement generally makes code significantly easier to read. I notice that conditionals often become complicated, especially when I see 'and' and 'or' statements. So this is a signal to extract this logic into a named process instead. What I propose is something like 'ask the order if it's placed between.'

00:15:53.480 Now I have the old code, where I would typically just use the start date and end date. But instead, let’s assign this responsibility to the order and create a function that checks if the order is placed between those dates. This would enhance our encapsulation. Let's check if this adjustment works. The next test will fail because the method 'placed_between' isn't defined yet. So let’s write that method incrementally. I anticipate that the implementation might throw an error due to the number of arguments; let's address that first.

00:16:35.360 In response to undefined parameters, we can adjust our 'placed_between' function with the correct number of parameters, namely the start date and end date. Running the tests, everything seems to be functioning well. The responsibility for determining the question 'Is this order placed between these two dates?' has now shifted to the order itself. This adjustment complies with the principle in object-oriented programming that dictates calculations should be associated with the data they pertain to.

00:17:06.640 By doing this, we recognize there's a certain code smell that arises from the parameters being passed around in a repetitive manner. Notice, we are passing a start date and end date all throughout the code. This repetitive coupling indicates there might be a data clump. A data clump refers to parameters frequently used together, highlighting the potential existence of a new object that could encapsulate these parameters.

00:18:24.480 To address the data clump smell, we’ll create a separate date range class to manage the start and end dates. I'm going to establish a new class named 'DateRange.' I’d like this refactoring to occur swiftly. Therefore, my first step is to instantiate this new date range within the 'orders_report' class. If everything works correctly, I’ll pass that through our functions incrementally, ensuring we quickly return to a green state on our tests at each step.

00:19:12.439 Now, we have our initial setup where our code should return to green. I avoid making too many changes at once to keep things manageable and traceable. All things considered, using this date range will still return the expected results while simplifying how parameters are passed around.

00:20:02.960 I've now moved from passing three arguments to only two, which represents a win for our code because of what's known as parameter coupling. Coupling is about how components in a system rely on each other. If a class's methods frequently expect the same kind of parameters, we recognize that there are complications arising.

00:20:39.919 This means should the parameters change, those modifications will ripple through the codebase, breaking other methods. So if we can reduce the parameter count in our classes, we reduce the coupling and improve the maintainability of the system.

00:21:26.399 The question that arises is whether we could eliminate the 'placed_at' method after creating our 'placed_between' function. This is a good question. Now, this change could lead to unintended consequences; thus, we must ensure that we adequately encapsulate functionality while minimizing the number of public methods residing in our classes.

00:22:06.600 The interaction where one class can call methods on another is often indicative of unnecessary coupling; hence we should investigate our class structure. Our goal is to ensure each class has a consistent, singular responsibility. Therefore, enhancing our 'Order' class with this method provides the necessary responsibilities while still keeping encapsulation clean.

00:22:52.080 As a small practical adjustment, let’s consider the code structure and naming conventions utilized here. The code we discuss should be readable at a glance, offering intuitive understanding of function and purpose. As we build the method 'total_sales_within_date_range,' we focus on easing its readability and maintainability while resulting calculations become more seamless.

00:23:33.520 While I know we are running short on time, I want to iterate on further concepts from the 'orders_report' class regarding clarity and cohesiveness of presentations. I can see potential areas of improvement without compromising the integrity of the core functionality. The relationship between classes and the shared responsibility of their interactions can open avenues for more organized and direct code structures.

00:24:17.559 The essence is to reduce dependency clusters in the code while fostering a deeper understanding of class interactions. Continually returning to clean and sensible naming schemes is critical, as is maintaining clarity in both public interfaces and executable pieces. Hence, let's examine if there are remnants in our recent revisions that can further emphasize the integrity of the structure.

00:25:30.040 As we inject the 'orders_within_range' logic directly into the new DateRange class we begin reducing redundancy throughout the code, making it ever more maintainable. The responsibilities now clearly defined also lend us to think about how we structure all future classes involved with this processing. External inspection should occur periodically, ensuring we stay aligned with a vision of simplicity.

00:26:17.760 As we pin down our method, including the proposed classes, it's important to explicitly follow through with our methodology. The way we decompose functions and the rationale behind those choices will play a critical role in the functionality we choose to introduce or improve down the line.

00:27:05.960 I encourage you all to remember that as we refine structures and languages, we should be cognizant of the impact our methodologies and idiosyncratic tendencies can foster within a team setting. There is no singular path here, but ... building as we go is a fundamental part of the process.

00:27:54.160 To summarize the takeaways from our journey in this session include the joint responsibility for clarity in code, the benefits of refactoring regularly to keep everything functioning well, and refining the approach to classes with an eye towards parameter coupling and dependency management.

00:28:38.480 As we conclude our time here today—as we dive deeper into improvements within our respective practices—remember that whether you choose incremental refactoring or root-to-branch revisions, the value gained will kindle ongoing thought regarding how we breathe life into our creations through meticulous processes.

00:30:19.240 I appreciate your thoughtful engagement throughout. If anyone has any additional thoughts, questions, or insights they want to share, please don't hesitate to reach out and chat!

00:30:54.640 Thank you again, and I’m looking forward to the next time we can connect. Keep iterating, and remember, there’s always room for improvement! Every new learning leads to better code, clearer communication, and stronger collaboration. Thank you!

RubyConf AU 2013