00:00:05.480
Good morning! How's everybody doing? Excellent! My name is Ben Orenstein, and I work at Thot in Boston. This is actually the last time I'm going to be giving this talk, so you're seeing it in its most polished form. I'm excited to present it to you, but I'm also excited to retire it and write something new.
00:00:17.240
This session is titled 'Refactoring from Good to Great.' The point of this talk is not for me to reach the end; the point is for you to learn something. I’d like you to consider this a pairing session. I am now pairing with all of you: I'll drive, you navigate. If you get confused, please ask questions, and if you have a comment, please make it! I love interaction and want to hear from you. You're not interrupting; you're contributing. So please feel free to share your thoughts. Let's start by looking at some code. Take a second and read this.
00:01:17.119
Okay, so this is an orders report. It takes a collection of orders, a start date, and an end date, and it returns the total sales that are within that date range from those orders. There are specs associated with this code. I run them, and they look like this: they run fast because I'm not loading Rails. You'll see me running those tests again and again. Indeed, the first rule of refactoring is, 'Don't refactor code that doesn't have tests.' So, all the code that I’ll show you being refactored will have tests, and you'll see me running them continuously after every little change.
00:01:30.360
If I don’t run them, feel free to yell at me! Now, this code looks roughly like what I would have written about a year ago. Today, I would rate it about a B-. It's decent, but I think it can be better. There are some things that, during code review, I would point out and ask to be changed. The first issue happens on line 12, where we have a local variable called 'orders_within_range,' which we assign the results of performing some sort of 'select' over the collection of orders.
00:01:58.920
Now, I'm going to do a quick refactoring. There's a macro for this, so it will happen really fast. Don't freak out! I'm going to extract this as a temp variable and call this extraction method 'orders_within_range.' This is a known refactoring technique called 'Extract Temp to Query.' As a result, I now have a private method called 'orders_within_range' that was the result of that old 'select.' Here's the before state with the temp variable, and here's the after state, now a private method with the same name. I've moved that content down there.
00:02:36.720
This is an established refactoring, and I think it's already a decent improvement for a couple of reasons. First, we've gone from one method with two lines to two methods with one line each. I won't tell you that every single time, fewer lines are better in a method, but in general, that's probably true. I am deeply suspicious of methods longer than one line, and you should be too. So shorter methods: that's our first win.
00:03:03.519
The second win is that I've moved 'orders_within_range' beneath that private keyword. For some reason, as programmers, when we see code, we want to read it. If I’m skimming this method, the first thing I do is say, 'Okay, here's a local variable; what goes in that local variable?' and I would read the code. If I have it like this instead, I say, 'Oh, that's a method call; it's got a decent name. I assume those are the orders that are within the range passed in, and I don't need to dive into that method.' If I’m just trying to skim the file, I no longer need to be concerned about the details of how we get the orders within range.
00:03:30.879
Moreover, I've given everyone else who reads this file, including myself six months from now, a hint. The 'orders_within_range' calculation is not essential; it's not the core functionality for this class. This hint helps make this code more readable. So we have a shorter method, and we've made the code more readable. We've also encouraged reuse here. It's unlikely that we have a method called 'total_sales_within_date_range' without needing something like 'average_sales_within_date_range' or 'maximum_sales_within_date_range.' In every case, we're likely calculating all the orders that happen within a date range.
00:04:09.560
When the code looks like this, that functionality, or the logic for gathering all the orders within the range, is locked within this method. In this version, it's more likely that it’s easier to reuse this method in other methods within this class, allowing immediate reuse without needing to worry about those implementation details. Questions or thoughts on this so far?
00:05:02.880
What was that for me or the code? The 'orders_within_date_range' change? You're talking about changing this name right here? Yeah, definitely! I think it would be good to make them more consistent. That's a good point. We won't do that right now, but that's a good point.
00:05:26.759
The question was whether I would do this refactoring right away or wait until I see a need. I would do this right away because of the various advantages I've mentioned. If the only reason for doing this were reuse, I might take longer, but because of the benefits of readability, and that sort of hiding things beneath the private keyword, I am very likely to do this right away. I do this with almost all my temp variables now. I almost never have temp variables within a method; I usually extract them out to query methods that are private, and this has made my code significantly better.
00:06:04.720
Typically, methods should have one job, and whenever I see a local variable in a method, it's a hint that the method probably has two jobs. At least it's doing the job of calculating what goes into that local variable and another job to do something with that local.
00:06:40.560
So the concern is that if we make this method private, it makes it harder to test. I sort of agree and sort of don’t. This code was written test-driven, so by moving methods into private, I’m guaranteed that functionality is still tested. I haven't changed how the code works; I've just altered the visibility of part of this thing. Because I wrote this code with TDD, I’m not concerned about losing coverage. You touch on an interesting point, which is I used to prefer classes with tiny public interfaces and large private sets of methods, as that felt good to me.
00:07:04.800
I'm still fond of having a small public interface, but I'm beginning to think that if a class has too many private methods below the surface—some people call these iceberg classes—then there's a chance that another object is yearning to be created. Because eventually, you might be testing five, six, or seven methods through the public interface, and your tests might become ugly. You're testing something that's deeper within this file by coming in through this method. I've started looking at this as a smell; too many private methods indicate that another class likely wants to exist.
00:07:46.080
One example you've given is fair. You mentioned that the private method uses all three of our instance variables, which might make it hard to extract. While that's true, I'm not worried about that right now. Not every private method needs to be extracted. If I find the need to do so later, I could address that issue.
00:08:12.560
Now, your first concern raised the question of whether, if I use 'orders_within_range' twice, am I recalibrating it twice, and is that a performance issue? You're absolutely correct! As soon as you show me performance logs indicating that this is a bottleneck, I'll refactor it to cache. Anything else is premature optimization. Programmers love—or Rubists love—these 'or' statements, and we don't want to recalculate; hence we say, just cache it. However, caching can occasionally bite you; it can lead to unexpected outcomes with performance issues.
00:08:57.040
So I resist doing 'or' statements and caching unless I've seen profiling, performance data, or real numbers showing that this thing takes too long to execute; thus, we need caching. I strongly advocate for that and push back against it in code reviews.
00:09:29.480
The question is whether extracting private methods from one class leads to tight coupling, and therefore the new class will only be used by the old class. The answer is that I’m more likely to extract private methods into another class if I think it might be reused within the system. If I think it's something that would only ever be used in this class, I might even extract a private class simply to label a concept and keep it private to this class, encapsulating it.
00:10:09.440
However, if I think something might be reused multiple times, I'm very open to pulling it into a new class. I’m not worried about the coupling of Class A and Class B. Classes need dependencies on one another. As long as it’s not highly specific to one class, I’m okay with Class A depending on Class B.
00:10:43.199
The concern was raised about whether if I extract private methods, doesn't that imply another class exists? That's a fair point. You could create a class and then pull the methods into that class before making it private to this parent class. If it turns out some other class wants to use it, I’ll promote it to a public class and give it its own set of tests.
00:11:19.240
There's a concept called fan out and fan in. I always mix them up, but the idea is that high fan-in means many classes use your component. That is a good thing; it indicates reusable components. High fan-out, however, is not ideal; it means your class uses many other classes, which often indicates confusion. It’s difficult to maintain if you need to package and coordinate the behavior of numerous different things. While it isn't a perfect metric, it's worth considering.
00:11:56.840
The question was whether the refactoring process was done in Vim, and if I could share what I used to refactor. I cheated; it's a custom macro, and I will show it to you. The macros in Vim are literal recordings of the keystrokes entered to make something happen. This isn’t a generalized solution; it's a one-off solution designed to impress people during talks.
00:12:27.760
I have experimented with several Vim refactoring plugins, hoping someone might generalize them, but nothing has worked consistently well. Some plugins work decently at times, so it’s worth experimenting because I believe there's time to be saved there. However, I haven't actually found anything that has been consistently great; I guess that’s part of the price we pay for having dynamic typing.
00:12:59.480
The question you raised is whether we should select the orders directly from the database since we have start and end dates available. This query is indeed interesting. I often prefer my classes to be context-free, meaning the class doesn’t know where the orders originate; it only knows it has a collection of something that responds to certain methods. By having the start and end date used to pull information from the database with specific class names and structures, this class is now aware of where the orders come from, which I prefer to avoid.
00:13:39.840
Additionally, this way, the tests are easier to write. If I make it depend on the database operations, I’ll either need to stub the data and database calls or input real data into the database, which I don't enjoy doing. Instead, I'd rather pass stubs or mock items, making the tests faster and easier to read, thereby improving the overall quality of the code. Are there any more questions before we move on?
00:14:36.280
If this were the last thing I were doing with this file, I would commit it. However, we are not there yet, and there's still more to improve. So let's identify what else we can enhance. Check out this 'orders_within_range' method; there’s something here that sets off alarm bells. The 'select' method takes a block, and that block contains some logic for calculating something.
00:15:04.920
I've been aggressive about extracting things from conditionals and blocks into their respective functions. This improvement generally makes code significantly easier to read. I notice that conditionals often become complicated, especially when I see 'and' and 'or' statements. So this is a signal to extract this logic into a named process instead. What I propose is something like 'ask the order if it's placed between.'
00:15:53.480
Now I have the old code, where I would typically just use the start date and end date. But instead, let’s assign this responsibility to the order and create a function that checks if the order is placed between those dates. This would enhance our encapsulation. Let's check if this adjustment works. The next test will fail because the method 'placed_between' isn't defined yet. So let’s write that method incrementally. I anticipate that the implementation might throw an error due to the number of arguments; let's address that first.
00:16:35.360
In response to undefined parameters, we can adjust our 'placed_between' function with the correct number of parameters, namely the start date and end date. Running the tests, everything seems to be functioning well. The responsibility for determining the question 'Is this order placed between these two dates?' has now shifted to the order itself. This adjustment complies with the principle in object-oriented programming that dictates calculations should be associated with the data they pertain to.
00:17:06.640
By doing this, we recognize there's a certain code smell that arises from the parameters being passed around in a repetitive manner. Notice, we are passing a start date and end date all throughout the code. This repetitive coupling indicates there might be a data clump. A data clump refers to parameters frequently used together, highlighting the potential existence of a new object that could encapsulate these parameters.
00:18:24.480
To address the data clump smell, we’ll create a separate date range class to manage the start and end dates. I'm going to establish a new class named 'DateRange.' I’d like this refactoring to occur swiftly. Therefore, my first step is to instantiate this new date range within the 'orders_report' class. If everything works correctly, I’ll pass that through our functions incrementally, ensuring we quickly return to a green state on our tests at each step.
00:19:12.439
Now, we have our initial setup where our code should return to green. I avoid making too many changes at once to keep things manageable and traceable. All things considered, using this date range will still return the expected results while simplifying how parameters are passed around.
00:20:02.960
I've now moved from passing three arguments to only two, which represents a win for our code because of what's known as parameter coupling. Coupling is about how components in a system rely on each other. If a class's methods frequently expect the same kind of parameters, we recognize that there are complications arising.
00:20:39.919
This means should the parameters change, those modifications will ripple through the codebase, breaking other methods. So if we can reduce the parameter count in our classes, we reduce the coupling and improve the maintainability of the system.
00:21:26.399
The question that arises is whether we could eliminate the 'placed_at' method after creating our 'placed_between' function. This is a good question. Now, this change could lead to unintended consequences; thus, we must ensure that we adequately encapsulate functionality while minimizing the number of public methods residing in our classes.
00:22:06.600
The interaction where one class can call methods on another is often indicative of unnecessary coupling; hence we should investigate our class structure. Our goal is to ensure each class has a consistent, singular responsibility. Therefore, enhancing our 'Order' class with this method provides the necessary responsibilities while still keeping encapsulation clean.
00:22:52.080
As a small practical adjustment, let’s consider the code structure and naming conventions utilized here. The code we discuss should be readable at a glance, offering intuitive understanding of function and purpose. As we build the method 'total_sales_within_date_range,' we focus on easing its readability and maintainability while resulting calculations become more seamless.
00:23:33.520
While I know we are running short on time, I want to iterate on further concepts from the 'orders_report' class regarding clarity and cohesiveness of presentations. I can see potential areas of improvement without compromising the integrity of the core functionality. The relationship between classes and the shared responsibility of their interactions can open avenues for more organized and direct code structures.
00:24:17.559
The essence is to reduce dependency clusters in the code while fostering a deeper understanding of class interactions. Continually returning to clean and sensible naming schemes is critical, as is maintaining clarity in both public interfaces and executable pieces. Hence, let's examine if there are remnants in our recent revisions that can further emphasize the integrity of the structure.
00:25:30.040
As we inject the 'orders_within_range' logic directly into the new DateRange class we begin reducing redundancy throughout the code, making it ever more maintainable. The responsibilities now clearly defined also lend us to think about how we structure all future classes involved with this processing. External inspection should occur periodically, ensuring we stay aligned with a vision of simplicity.
00:26:17.760
As we pin down our method, including the proposed classes, it's important to explicitly follow through with our methodology. The way we decompose functions and the rationale behind those choices will play a critical role in the functionality we choose to introduce or improve down the line.
00:27:05.960
I encourage you all to remember that as we refine structures and languages, we should be cognizant of the impact our methodologies and idiosyncratic tendencies can foster within a team setting. There is no singular path here, but ... building as we go is a fundamental part of the process.
00:27:54.160
To summarize the takeaways from our journey in this session include the joint responsibility for clarity in code, the benefits of refactoring regularly to keep everything functioning well, and refining the approach to classes with an eye towards parameter coupling and dependency management.
00:28:38.480
As we conclude our time here today—as we dive deeper into improvements within our respective practices—remember that whether you choose incremental refactoring or root-to-branch revisions, the value gained will kindle ongoing thought regarding how we breathe life into our creations through meticulous processes.
00:30:19.240
I appreciate your thoughtful engagement throughout. If anyone has any additional thoughts, questions, or insights they want to share, please don't hesitate to reach out and chat!
00:30:54.640
Thank you again, and I’m looking forward to the next time we can connect. Keep iterating, and remember, there’s always room for improvement! Every new learning leads to better code, clearer communication, and stronger collaboration. Thank you!