Talks
Refactoring from Good to Great
Summarized using AI

Refactoring from Good to Great

by Ben Orenstein

In the presentation titled "Refactoring from Good to Great," Ben Orenstein discusses advanced refactoring techniques aimed at transforming good code into great code, making it more readable and maintainable. This live coding session is part of the Aloha RubyConf 2012 event, where Orenstein emphasizes the collaborative nature of coding and encourages audience interaction.

Key Points Discussed:

- Refactoring Basics: The talk begins with a critique of common coding practices. Orenstein shares an example of a simple report code and identifies opportunities for improvement.

- Extracting Methods: Orenstein introduces a refactoring technique known as "extract temp to query," demonstrating how to refactor a temporary variable into its own method. This enhances code clarity and reusability.

- Tell, Don’t Ask Principle: He explains the "tell, don’t ask" principle, advocating for sending messages to objects rather than querying their internal state. This promotes better object-oriented design.

- Identifying Data Clumps: Orenstein highlights the "data clump" code smell and suggests creating a new data structure (date range) to encapsulate related parameters, thus improving code clarity and reducing coupling.

- Coupling and Abstractions: The importance of low coupling in software design is emphasized, with examples illustrating how reducing dependencies between components makes the code easier to change.

- Null Object Pattern: Orenstein introduces the Null Object Pattern as a way to eliminate conditionals regarding optional attributes (such as contacts) by creating an explicit NullContact class to handle cases where a contact may not exist.

- Importance of Abstractions: He discusses how properly using abstractions can simplify the codebase. The example of a PaymentGateway class illustrates how to encapsulate payment processor logic to avoid tightly coupling business logic with specific libraries.

- When to Refactor: Orenstein concludes by advising that refactoring should ideally occur when changes are needed rather than as a predetermined task, promoting a responsive approach to coding.

Conclusion:

Ben Orenstein's talk provides valuable insights into advanced refactoring strategies that enhance code quality. By advocating principles such as low coupling, clear abstractions, and effective use of the Null Object Pattern, he equips developers with tools to create more maintainable and understandable codebases. The session underscores collaborative programming and continuous improvement in coding practices through live demonstrations and interactive discussions.

00:00:15 Good afternoon, Aloha Ruby! How's it going, guys? Is everybody doing well? Hope you had a good lunch.
00:00:24 Raise your hand if you're in Hawaii right now. Not bad, right? Could be worse at a conference.
00:00:31 It's interesting; I realized earlier that this is the first time I've given a talk twice in a week.
00:00:37 On Thursday, I gave this exact same talk, or at least a very similar version of it, at Magic Ruby in Disney World.
00:00:43 So, I flew directly from Orlando to here. My life is rough, you can send sympathy cards to my address.
00:00:49 Before we get started, a little bit of administration. My name is Ben Orenstein, and I work at Thot in Boston.
00:01:01 The talk today is about refactoring from good to great.
00:01:07 It's basically about things I've learned in the last year or so that I believe improve the quality of my code.
00:01:12 So, I'm going to share that with you. I don't want you to think of this as a lecture.
00:01:19 This isn't me standing here and telling you all these things that are absolute truths.
00:01:24 Think of this as pairing. I'm now pair programming with all of you.
00:01:31 As a result, if you have questions, say something. If you disagree, say something.
00:01:37 If you have suggestions, say something. I love that back-and-forth during the talks.
00:01:43 Don't be shy about interrupting; I love diving into stuff.
00:01:50 So, let’s dive right into this. I want to show you my first example.
00:01:55 Take a second and read this code. Now, this is based on some code I wrote recently.
00:02:02 It's a simplified version. This is basically a very simple report that takes a collection of orders, a start date, and an end date, and returns the total sales of those orders.
00:02:09 An order has an attribute called 'placed_at' that we use to check the date range.
00:02:16 Now, about a year ago, I would have committed this, pushed it up, and been done with it.
00:02:23 Now, when I look at this, I think it's about a B to B-minus. It's decent.
00:02:29 If I were pretty confident that I would never come back to this code and change it or extend it, I’d probably just say, 'Whatever, good enough,' and push it up.
00:02:34 But these days, I actually see some things I would do to improve it.
00:02:41 The first thing I notice is that we’ve got this temp variable right here.
00:02:47 We’re figuring out the orders within the range, we’re storing them in a temp variable, and then we’re doing something with it.
00:02:52 I’d actually like to extract this into its own method. I'm going to pull this into a private method.
00:03:01 Now, I’ve recorded a macro that does this for me, so here’s the before.
00:03:07 The temp variable is in that method; here’s the after: I made a private method with the same name, and I'm just referencing it from up here.
00:03:14 This is something I actually do a lot these days – pull out these pieces.
00:03:20 There’s a refactoring name for this, called 'extract temp to query,' and this is something I do often.
00:03:26 Now, why do I do this? Why is this worth it? Well, the first thing to notice is we’ve gone from one method with two lines to two methods with one line each.
00:03:32 I'm not going to tell you that's always an improvement, but it usually is.
00:03:38 These days, I think methods longer than a line are a code smell.
00:03:45 Just in case anyone isn't familiar with the term, a code smell is something that indicates there may be a problem, not that there is a problem.
00:03:50 These days, I try really hard to get down to methods that are one line, and it usually results in really good code.
00:03:56 This is because my methods are focused, so that's one win.
00:04:02 The next thing is that we can reuse this.
00:04:07 I think it’s reasonably likely that the next thing the stakeholders will ask for is something like average sales within the date range.
00:04:13 Now, when the logic for selecting those orders within range is locked up in one method, I'm more likely to duplicate that later.
00:04:19 Whereas with this new structure, I think it's more likely that I'll reuse it.
00:04:26 So, I think that's another win.
00:04:32 Finally, the nice thing about this is when the code looks like this, it's easier to read.
00:04:39 When we see code that reads 'orders within range,' we can assume it selects all the orders that are within the date range.
00:04:46 The details become less crucial, and by extracting this method into a private method, I’m indicating that it’s not critical.
00:04:53 So, the wins may seem small, but in the aggregate, they’re valuable.
00:05:04 Let's keep moving; I think we can improve this further.
00:05:11 Now, inside the method we created called 'orders within range,' we’re asking the order about itself.
00:05:18 We’re saying, 'Hey, are you placed after the start date and before the end date?' and using that information to make a decision.
00:05:24 Has anybody heard of a concept called 'tell, don’t ask'?
00:05:30 Okay, cool. This idea, while not a law, can sometimes lead to better code.
00:05:37 What 'tell, don’t ask' states is that it's generally better to send an object a message and have it perform work.
00:05:43 Rather than asking that object about its internal state, we should send it a message and let it make decisions.
00:05:50 This code actually violates 'tell, don’t ask.' You can see I'm trying to select some orders by asking the order about itself.
00:05:56 I'd rather change this code to look differently.
00:06:01 How about instead we did something like this?
00:06:07 Now this sort of looks like an ask, but the logic has moved; I’m actually telling the order something.
00:06:14 I'm saying, 'Tell me if you’re placed between these two dates' instead of asking about its internal state.
00:06:20 This is closer to following 'tell, don’t ask.' I have specs for this, which I'm going to run now.
00:06:27 Uh, that blows up because I used local variables when I meant instance variables. Let’s do that again.
00:06:33 There’s the order. Here’s the error we want, which is 'undefined method placed_at.'
00:06:39 We’re effectively doing TDD by writing the code I wish were there.
00:06:46 So, let’s actually add that method to order now.
00:06:51 And we don’t actually want 'placed_at'; we want 'placed_between.' That’s a better name.
00:06:58 Alright, let’s define that. Place_between takes a start date and an end date.
00:07:05 Let’s say 'is my start after the start date' and 'is it before the end date.' Alright.
00:07:12 There’s our place_between method. Let’s run the test again. We’re back to green.
00:07:19 What about a range include? That’s a great idea, and I’m going to get there.
00:07:27 Yep, we’re taking little steps, but this guy is totally right: we will end up there eventually.
00:07:34 But for now, is this an improvement? Yes, I think it is.
00:07:41 We’ve moved this logic over to order, and we’ve stopped pulling out internal pieces of order and fiddling with them.
00:07:48 We’re just sending messages, which is a good principle to follow: don't let internal details leak out into the surrounding code.
00:07:55 That’s the core idea behind 'tell, don’t ask.' I’m much happier with where this logic is living.
00:08:02 But now we have a new problem that’s evident since we have a couple of places where I’m passing start date and end date.
00:08:11 Notice I passed start date and end date into the initialization of this report, and I’m doing it again down here.
00:08:19 This is another smell: when you have a pair of arguments that you're passing together all the time, it's called a data clump.
00:08:27 A data clump is a great hint that you should extract an object instead to hold those values.
00:08:35 One way to test if you have a data clump is to ask: if I took one away, would the remaining one make sense?
00:08:42 Not really! What would it mean to create an orders report with just an end date?
00:08:49 Doesn’t make sense, right? There's an implicit reliance on each other.
00:08:55 If I made these into one object, like a date range, it becomes explicit that I need both of those.
00:09:02 That's another great maxim: when you're writing your code, think about how to make it explicit.
00:09:09 Imagine you have to explain your code to someone else. Any line that makes you go, 'Well, you need this and that,' should be reconsidered.
00:09:15 Let’s do this: I’ll start in the spec.
00:09:22 Here’s my spec: it's really simple. I have an order within the range and one out of range.
00:09:28 Then, I create the date range and ensure that the total sales number is correct.
00:09:36 Now, I’m going to pull out a date range right now, and again, I'm going to write the code I wish I had.
00:09:42 Come on, baby!
00:09:47 Oh my gosh, I’m vimming too hard. There we go, and then here.
00:09:53 We’ll pass in the date range. Cool, so far.
00:09:58 We're going to run this, and it blows up because we have an uninitialized constant date range.
00:10:06 As expected, I just referenced a constant that doesn't exist.
00:10:16 Now, I’m going to create that.
00:10:22 Okay, now there’s going to be a handful of errors as I run this.
00:10:29 I'm passing a wrong number of arguments; this happens because I'm calling date range.new while passing in arguments.
00:10:36 But it doesn’t expect them, so let’s just make a quick struct.
00:10:40 To hold the start date and the end date.
00:10:47 Now I have roughly the error I want, which is a wrong number of arguments in orders report.
00:10:54 I’m passing two arguments right here, and it used to be expecting three up above.
00:11:01 So, I need to thread this new date range through, and I’m going to do this in a compressed series of steps.
00:11:08 So again, don't freak out. Let’s pass in the date range.
00:11:13 Now, there’s no such thing as start date and end date; it’s just the date range.
00:11:20 The same deal here in that method we wrote in order – it’s just the date range.
00:11:27 Now, we need to ask the date range about its start date and end date and see if we're green again.
00:11:34 Boom! Okay, so is this a win? Yes, it is!
00:11:40 I think this is worth doing.
00:11:47 Bob Martin believes that most intermediate object-oriented programmers are too reluctant to extract classes.
00:11:55 You should be fairly aggressive about being willing to extract small classes like this.
00:12:01 Look at date range! It’s super simple, right?
00:12:05 It’s very basic, but it’s made something that was implicit in our code explicit.
00:12:11 Now, date range is a pair of values. This class doesn’t even have any behavior on it, but I think it’s still worth doing.
00:12:18 I've created a name for something, and I’ve pulled out an explicit name.
00:12:25 That's almost always an improvement.
00:12:32 But there's another win here: we've reduced coupling!
00:12:39 Now what’s coupling? Coupling is the degree to which two components in a system rely on each other.
00:12:46 If component A and component B have zero coupling, you can change A however you want without ever breaking B, and vice versa.
00:12:54 Low coupling is good; high coupling is bad, because low coupling makes change easier.
00:13:01 It's hard to respond to change in software.
00:13:08 I saw a keynote by the author Thomas, and he said the only thing worth worrying about when you look at code is, 'Is it easy to change?'
00:13:14 Let’s look at some quick examples of coupling.
00:13:21 Different types of coupling exist. This is called parameter coupling.
00:13:28 Notice that 'notify_user_of_failure' passes in an object called failure into another method (print_to_console), where we call two_sentences on that parameter.
00:13:34 If I pass nil into print_to_console, this would blow up with a no-method error.
00:13:41 If I pass something inappropriate, that would cause an error as well.
00:13:47 There’s coupling between these two methods.
00:13:54 Notify_user_of_failure has to know what print_to_console is going to call on its parameter.
00:14:01 If I change print_to_console, I could break notify_user_of_failure.
00:14:06 Parameter coupling isn’t bad, but less coupling is almost always good.
00:14:13 Methods taking no arguments are superior to those taking one, and one-argument methods are better than those taking two.
00:14:20 That helps lower parameter coupling.
00:14:27 Notice what happened; we slimmed down our argument list from three to two. That’s a win!
00:14:34 Coupling is reduced; that’s one win.
00:14:41 We made something explicit that used to be implicit; that’s another win.
00:14:48 We also have a great place to hang behavior.
00:14:54 In object-oriented programming, it’s a great idea to group behavior with the data it operates on.
00:15:01 I can imagine wanting to know if other objects have a date placed between two endpoints.
00:15:08 We can now reuse the date range class.
00:15:15 So, I’d like to move this logic into date range.
00:15:22 Let’s ask date range, 'Does this include my placed_at?'
00:15:30 That looks good to me! But that blows up as expected because this method doesn't exist.
00:15:38 Let’s say, 'Is date after the start date and before the end date?'
00:15:45 Let's see if that works for us.
00:15:51 It does! Now order just knows how to tell something else if its placed_at is included.
00:15:59 I like this because order should know about placed_at, and I'm okay with date range knowing how to find if a date is between two dates.
00:16:05 This is exactly where I want this logic to live.
00:16:11 Now, let's address a little refactoring.
00:16:18 There’s actually a handier way of writing this. So I can ask if the start date and end date range include the date.
00:16:25 Back to green! Good on us!
00:16:31 I gave this talk somewhere, and Jose Valim showed me something: when you ask a range if it includes something, Ruby instantiates every object in that range.
00:16:40 This could be a performance issue. If the range is 300 years long, it could slow things down.
00:16:47 So, when you ask if it includes or cover, it only instantiates the two endpoints.
00:16:53 If we had a date range that was 300 years long, it's faster performance-wise.
00:17:01 Asks: Questions?
00:17:06 That’s a great point! My tests aren’t as thorough as I’d like.
00:17:13 I should go back and improve those tests, and beef up the coverage.
00:17:20 That’s where bugs live, where you can feel that discomfort.
00:17:27 Other questions?
00:17:34 I just wanted to mention the code smell motivating this is called feature envy.
00:17:41 Feature envy is when one class shows a concern with the internals of another class.
00:17:48 Usually, that’s a good hint to move logic into the class itself.
00:17:55 Alright, a couple quick things I would do now to wrap this example up before we move on.
00:18:01 This seems pretty ugly, right? I’m mapping the amount and doing the sum, but it sounds like I’m calculating total sales.
00:18:08 So whenever I see ugly code, I ask, 'How would I say this?'}
00:18:14 I really would say something like, 'Add up the total sales.'
00:18:20 Oh, where am I? Type of orders within range.
00:18:26 Back to green again!
00:18:33 Take a look at this top-level method, the total sales. The name is 'Total Sales Within Date Range'.
00:18:39 That is essentially all I want my code to say.
00:18:46 This is my only public method, and I'm particularly aggressive about how my public methods read.
00:18:53 So now let's move on to a new example.
00:19:00 Here's another bit of code I’d like you to look over.
00:19:07 Now, this code represents a job site. It’s a place where we’re doing some construction work.
00:19:14 All job sites have a location because you're always working somewhere.
00:19:20 Not every job site will have a contact; a contact could be nil, that's optional.
00:19:26 Notice I've just described something implicit about this code.
00:19:32 It’s not so obvious that a contact is optional, but you can start to see it at these ugly conditionals.
00:19:40 If I want to pull up the contact name, I must check for the presence of contact.
00:19:46 Likewise with contact phone; I must ensure it exists before I call a method.
00:19:51 This code is okay but could be improved.
00:20:00 We’re violating 'tell, don’t ask' again. Each time asking contact something makes it awkward.
00:20:07 What I'd rather do is just directly ask for the contact's name.
00:20:13 If it does exist, return what its name is; if not, say no name.
00:20:19 Now we're co-opting nil where it stands for no contact, which isn’t a good idea.
00:20:27 To prove this, let’s use a pattern called the Null Object Pattern.
00:20:35 Instead of nil, we’ll create an explicit object to represent no contact.
00:20:41 We'll call this object NullContact. So when we don’t have a contact, we assign NullContact.new.
00:20:48 Let's run our tests! Oops, got the wrong test going.
00:20:55 Things blow up because NullContact doesn't respond to some methods.
00:21:01 So, let's add basic responses like name, phone, and delivering personalized email.
00:21:08 If there’s no contact, I want name to return 'no name,' and deliver personalized email to do nothing.
00:21:16 Let’s run this method for phone.
00:21:23 Now we’re green!
00:21:30 For the cost of adding the simple NullContact class, I've eliminated three conditionals.
00:21:36 All while making methods clearer and more direct.
00:21:43 Notice—we're no longer violating 'tell, don’t ask' because we're always telling some contact.
00:21:49 We never actually care if it's an actual contact or a NullContact.
00:21:56 That’s why this is a great win!
00:22:02 This pattern is simple, but once you know it, opportunities for application emerge.
00:22:10 Now, there’s a downside.
00:22:17 Now I have to keep two APIs in sync.
00:22:23 If I add new methods to the contact class, I need to add them to NullContact as well.
00:22:31 That’s a pain, but generally it’s worth it.
00:22:37 A lot of code ends up with big hairy nasty conditionals, and NullContact destroys them.
00:22:43 This refactoring is known as 'replace conditional with polymorphism.'
00:22:50 Instead of using if statements, always send a message to different classes.
00:22:56 This helps keep your code cleaner—better to maintain.
00:23:02 You should weigh the costs: is it worth it given your current system?
00:23:09 We talked about good signs and bad signs.
00:23:17 But how many people have code where it’s like, 'If current user blah blah blah,' raise your hand!
00:23:24 Imagine removing the complexity of checking for a current user.
00:23:32 You can return a user or a NullUser.
00:23:38 Alright. Any questions about that?
00:23:44 Next example—take a second and read this code.
00:23:52 Let’s discuss the idea of depending upon abstractions.
00:23:59 Programmers are aware that it’s a good idea to depend upon abstractions.
00:24:07 Most people prefer using libraries over writing raw SQL or bytes.
00:24:13 But what a lot of developers don’t realize is that abstractions are fractal.
00:24:20 You should build abstractions aggressively within your application!
00:24:26 A good rule of thumb is that you want the whole to be simpler than the sum of its parts.
00:24:34 Take a few models or classes and wrap a class around them.
00:24:41 Make a simple API that’s easier to use than diving into the details.
00:24:48 There’s a common violation I see often.
00:24:55 Let’s have a look at this. This code is concerned with charging people for things.
00:25:03 BrainTree is a payment processor. I pull down the BrainTree gem—better than writing raw API calls.
00:25:10 Notice how user is concerned with brain tree ID, charging for subscriptions, creating customers, and refunding.
00:25:18 We're depending on the BrainTree gem as an abstraction.
00:25:23 But we could go deeper; currently, I need to change User if I want to change the gem.
00:25:30 That’s a signal for shotgun surgery.
00:25:37 If I change something, I should only change one file.
00:25:43 How do we fix this? I create a new class called PaymentGateway.
00:25:51 I move the constant equal to BrainTree into PaymentGateway.
00:25:58 I set up my Gateway, which by default is BrainTree. All that logic lives there.
00:26:04 My client code only knows about the PaymentGateway class.
00:26:10 If I want to change payment processors, only one file needs changing.
00:26:16 That’s the power of depending upon abstractions!
00:26:23 Now, I want to test this thoroughly.
00:26:29 If I stub methods on BrainTree, I will encounter issues upon updating.
00:26:36 In this version, I can stub my own methods; it will be much easier to manage.
00:26:42 If you control your methods during your tests, you’re going to have a good time.
00:26:49 When you notice concerns like billing spread throughout, it's a good time to re-architect.
00:26:56 No gem names should appear in your business logic.
00:27:03 You should only reference PaymentGateway, not the actual processor.
00:27:10 So, we've talked about several methods of refactoring. When do you refactor?
00:27:17 The best time to refactor is when changes need to be made.
00:27:23 Usually not on a random Monday morning, saying 'I’m going to refactor some code.'
Explore all talks recorded at Aloha RubyConf 2012
+13