00:00:15
Good afternoon, Aloha Ruby! How's it going, guys? Is everybody doing well? Hope you had a good lunch.
00:00:24
Raise your hand if you're in Hawaii right now. Not bad, right? Could be worse at a conference.
00:00:31
It's interesting; I realized earlier that this is the first time I've given a talk twice in a week.
00:00:37
On Thursday, I gave this exact same talk, or at least a very similar version of it, at Magic Ruby in Disney World.
00:00:43
So, I flew directly from Orlando to here. My life is rough, you can send sympathy cards to my address.
00:00:49
Before we get started, a little bit of administration. My name is Ben Orenstein, and I work at Thot in Boston.
00:01:01
The talk today is about refactoring from good to great.
00:01:07
It's basically about things I've learned in the last year or so that I believe improve the quality of my code.
00:01:12
So, I'm going to share that with you. I don't want you to think of this as a lecture.
00:01:19
This isn't me standing here and telling you all these things that are absolute truths.
00:01:24
Think of this as pairing. I'm now pair programming with all of you.
00:01:31
As a result, if you have questions, say something. If you disagree, say something.
00:01:37
If you have suggestions, say something. I love that back-and-forth during the talks.
00:01:43
Don't be shy about interrupting; I love diving into stuff.
00:01:50
So, let’s dive right into this. I want to show you my first example.
00:01:55
Take a second and read this code. Now, this is based on some code I wrote recently.
00:02:02
It's a simplified version. This is basically a very simple report that takes a collection of orders, a start date, and an end date, and returns the total sales of those orders.
00:02:09
An order has an attribute called 'placed_at' that we use to check the date range.
00:02:16
Now, about a year ago, I would have committed this, pushed it up, and been done with it.
00:02:23
Now, when I look at this, I think it's about a B to B-minus. It's decent.
00:02:29
If I were pretty confident that I would never come back to this code and change it or extend it, I’d probably just say, 'Whatever, good enough,' and push it up.
00:02:34
But these days, I actually see some things I would do to improve it.
00:02:41
The first thing I notice is that we’ve got this temp variable right here.
00:02:47
We’re figuring out the orders within the range, we’re storing them in a temp variable, and then we’re doing something with it.
00:02:52
I’d actually like to extract this into its own method. I'm going to pull this into a private method.
00:03:01
Now, I’ve recorded a macro that does this for me, so here’s the before.
00:03:07
The temp variable is in that method; here’s the after: I made a private method with the same name, and I'm just referencing it from up here.
00:03:14
This is something I actually do a lot these days – pull out these pieces.
00:03:20
There’s a refactoring name for this, called 'extract temp to query,' and this is something I do often.
00:03:26
Now, why do I do this? Why is this worth it? Well, the first thing to notice is we’ve gone from one method with two lines to two methods with one line each.
00:03:32
I'm not going to tell you that's always an improvement, but it usually is.
00:03:38
These days, I think methods longer than a line are a code smell.
00:03:45
Just in case anyone isn't familiar with the term, a code smell is something that indicates there may be a problem, not that there is a problem.
00:03:50
These days, I try really hard to get down to methods that are one line, and it usually results in really good code.
00:03:56
This is because my methods are focused, so that's one win.
00:04:02
The next thing is that we can reuse this.
00:04:07
I think it’s reasonably likely that the next thing the stakeholders will ask for is something like average sales within the date range.
00:04:13
Now, when the logic for selecting those orders within range is locked up in one method, I'm more likely to duplicate that later.
00:04:19
Whereas with this new structure, I think it's more likely that I'll reuse it.
00:04:26
So, I think that's another win.
00:04:32
Finally, the nice thing about this is when the code looks like this, it's easier to read.
00:04:39
When we see code that reads 'orders within range,' we can assume it selects all the orders that are within the date range.
00:04:46
The details become less crucial, and by extracting this method into a private method, I’m indicating that it’s not critical.
00:04:53
So, the wins may seem small, but in the aggregate, they’re valuable.
00:05:04
Let's keep moving; I think we can improve this further.
00:05:11
Now, inside the method we created called 'orders within range,' we’re asking the order about itself.
00:05:18
We’re saying, 'Hey, are you placed after the start date and before the end date?' and using that information to make a decision.
00:05:24
Has anybody heard of a concept called 'tell, don’t ask'?
00:05:30
Okay, cool. This idea, while not a law, can sometimes lead to better code.
00:05:37
What 'tell, don’t ask' states is that it's generally better to send an object a message and have it perform work.
00:05:43
Rather than asking that object about its internal state, we should send it a message and let it make decisions.
00:05:50
This code actually violates 'tell, don’t ask.' You can see I'm trying to select some orders by asking the order about itself.
00:05:56
I'd rather change this code to look differently.
00:06:01
How about instead we did something like this?
00:06:07
Now this sort of looks like an ask, but the logic has moved; I’m actually telling the order something.
00:06:14
I'm saying, 'Tell me if you’re placed between these two dates' instead of asking about its internal state.
00:06:20
This is closer to following 'tell, don’t ask.' I have specs for this, which I'm going to run now.
00:06:27
Uh, that blows up because I used local variables when I meant instance variables. Let’s do that again.
00:06:33
There’s the order. Here’s the error we want, which is 'undefined method placed_at.'
00:06:39
We’re effectively doing TDD by writing the code I wish were there.
00:06:46
So, let’s actually add that method to order now.
00:06:51
And we don’t actually want 'placed_at'; we want 'placed_between.' That’s a better name.
00:06:58
Alright, let’s define that. Place_between takes a start date and an end date.
00:07:05
Let’s say 'is my start after the start date' and 'is it before the end date.' Alright.
00:07:12
There’s our place_between method. Let’s run the test again. We’re back to green.
00:07:19
What about a range include? That’s a great idea, and I’m going to get there.
00:07:27
Yep, we’re taking little steps, but this guy is totally right: we will end up there eventually.
00:07:34
But for now, is this an improvement? Yes, I think it is.
00:07:41
We’ve moved this logic over to order, and we’ve stopped pulling out internal pieces of order and fiddling with them.
00:07:48
We’re just sending messages, which is a good principle to follow: don't let internal details leak out into the surrounding code.
00:07:55
That’s the core idea behind 'tell, don’t ask.' I’m much happier with where this logic is living.
00:08:02
But now we have a new problem that’s evident since we have a couple of places where I’m passing start date and end date.
00:08:11
Notice I passed start date and end date into the initialization of this report, and I’m doing it again down here.
00:08:19
This is another smell: when you have a pair of arguments that you're passing together all the time, it's called a data clump.
00:08:27
A data clump is a great hint that you should extract an object instead to hold those values.
00:08:35
One way to test if you have a data clump is to ask: if I took one away, would the remaining one make sense?
00:08:42
Not really! What would it mean to create an orders report with just an end date?
00:08:49
Doesn’t make sense, right? There's an implicit reliance on each other.
00:08:55
If I made these into one object, like a date range, it becomes explicit that I need both of those.
00:09:02
That's another great maxim: when you're writing your code, think about how to make it explicit.
00:09:09
Imagine you have to explain your code to someone else. Any line that makes you go, 'Well, you need this and that,' should be reconsidered.
00:09:15
Let’s do this: I’ll start in the spec.
00:09:22
Here’s my spec: it's really simple. I have an order within the range and one out of range.
00:09:28
Then, I create the date range and ensure that the total sales number is correct.
00:09:36
Now, I’m going to pull out a date range right now, and again, I'm going to write the code I wish I had.
00:09:42
Come on, baby!
00:09:47
Oh my gosh, I’m vimming too hard. There we go, and then here.
00:09:53
We’ll pass in the date range. Cool, so far.
00:09:58
We're going to run this, and it blows up because we have an uninitialized constant date range.
00:10:06
As expected, I just referenced a constant that doesn't exist.
00:10:16
Now, I’m going to create that.
00:10:22
Okay, now there’s going to be a handful of errors as I run this.
00:10:29
I'm passing a wrong number of arguments; this happens because I'm calling date range.new while passing in arguments.
00:10:36
But it doesn’t expect them, so let’s just make a quick struct.
00:10:40
To hold the start date and the end date.
00:10:47
Now I have roughly the error I want, which is a wrong number of arguments in orders report.
00:10:54
I’m passing two arguments right here, and it used to be expecting three up above.
00:11:01
So, I need to thread this new date range through, and I’m going to do this in a compressed series of steps.
00:11:08
So again, don't freak out. Let’s pass in the date range.
00:11:13
Now, there’s no such thing as start date and end date; it’s just the date range.
00:11:20
The same deal here in that method we wrote in order – it’s just the date range.
00:11:27
Now, we need to ask the date range about its start date and end date and see if we're green again.
00:11:34
Boom! Okay, so is this a win? Yes, it is!
00:11:40
I think this is worth doing.
00:11:47
Bob Martin believes that most intermediate object-oriented programmers are too reluctant to extract classes.
00:11:55
You should be fairly aggressive about being willing to extract small classes like this.
00:12:01
Look at date range! It’s super simple, right?
00:12:05
It’s very basic, but it’s made something that was implicit in our code explicit.
00:12:11
Now, date range is a pair of values. This class doesn’t even have any behavior on it, but I think it’s still worth doing.
00:12:18
I've created a name for something, and I’ve pulled out an explicit name.
00:12:25
That's almost always an improvement.
00:12:32
But there's another win here: we've reduced coupling!
00:12:39
Now what’s coupling? Coupling is the degree to which two components in a system rely on each other.
00:12:46
If component A and component B have zero coupling, you can change A however you want without ever breaking B, and vice versa.
00:12:54
Low coupling is good; high coupling is bad, because low coupling makes change easier.
00:13:01
It's hard to respond to change in software.
00:13:08
I saw a keynote by the author Thomas, and he said the only thing worth worrying about when you look at code is, 'Is it easy to change?'
00:13:14
Let’s look at some quick examples of coupling.
00:13:21
Different types of coupling exist. This is called parameter coupling.
00:13:28
Notice that 'notify_user_of_failure' passes in an object called failure into another method (print_to_console), where we call two_sentences on that parameter.
00:13:34
If I pass nil into print_to_console, this would blow up with a no-method error.
00:13:41
If I pass something inappropriate, that would cause an error as well.
00:13:47
There’s coupling between these two methods.
00:13:54
Notify_user_of_failure has to know what print_to_console is going to call on its parameter.
00:14:01
If I change print_to_console, I could break notify_user_of_failure.
00:14:06
Parameter coupling isn’t bad, but less coupling is almost always good.
00:14:13
Methods taking no arguments are superior to those taking one, and one-argument methods are better than those taking two.
00:14:20
That helps lower parameter coupling.
00:14:27
Notice what happened; we slimmed down our argument list from three to two. That’s a win!
00:14:34
Coupling is reduced; that’s one win.
00:14:41
We made something explicit that used to be implicit; that’s another win.
00:14:48
We also have a great place to hang behavior.
00:14:54
In object-oriented programming, it’s a great idea to group behavior with the data it operates on.
00:15:01
I can imagine wanting to know if other objects have a date placed between two endpoints.
00:15:08
We can now reuse the date range class.
00:15:15
So, I’d like to move this logic into date range.
00:15:22
Let’s ask date range, 'Does this include my placed_at?'
00:15:30
That looks good to me! But that blows up as expected because this method doesn't exist.
00:15:38
Let’s say, 'Is date after the start date and before the end date?'
00:15:45
Let's see if that works for us.
00:15:51
It does! Now order just knows how to tell something else if its placed_at is included.
00:15:59
I like this because order should know about placed_at, and I'm okay with date range knowing how to find if a date is between two dates.
00:16:05
This is exactly where I want this logic to live.
00:16:11
Now, let's address a little refactoring.
00:16:18
There’s actually a handier way of writing this. So I can ask if the start date and end date range include the date.
00:16:25
Back to green! Good on us!
00:16:31
I gave this talk somewhere, and Jose Valim showed me something: when you ask a range if it includes something, Ruby instantiates every object in that range.
00:16:40
This could be a performance issue. If the range is 300 years long, it could slow things down.
00:16:47
So, when you ask if it includes or cover, it only instantiates the two endpoints.
00:16:53
If we had a date range that was 300 years long, it's faster performance-wise.
00:17:01
Asks: Questions?
00:17:06
That’s a great point! My tests aren’t as thorough as I’d like.
00:17:13
I should go back and improve those tests, and beef up the coverage.
00:17:20
That’s where bugs live, where you can feel that discomfort.
00:17:27
Other questions?
00:17:34
I just wanted to mention the code smell motivating this is called feature envy.
00:17:41
Feature envy is when one class shows a concern with the internals of another class.
00:17:48
Usually, that’s a good hint to move logic into the class itself.
00:17:55
Alright, a couple quick things I would do now to wrap this example up before we move on.
00:18:01
This seems pretty ugly, right? I’m mapping the amount and doing the sum, but it sounds like I’m calculating total sales.
00:18:08
So whenever I see ugly code, I ask, 'How would I say this?'}
00:18:14
I really would say something like, 'Add up the total sales.'
00:18:20
Oh, where am I? Type of orders within range.
00:18:26
Back to green again!
00:18:33
Take a look at this top-level method, the total sales. The name is 'Total Sales Within Date Range'.
00:18:39
That is essentially all I want my code to say.
00:18:46
This is my only public method, and I'm particularly aggressive about how my public methods read.
00:18:53
So now let's move on to a new example.
00:19:00
Here's another bit of code I’d like you to look over.
00:19:07
Now, this code represents a job site. It’s a place where we’re doing some construction work.
00:19:14
All job sites have a location because you're always working somewhere.
00:19:20
Not every job site will have a contact; a contact could be nil, that's optional.
00:19:26
Notice I've just described something implicit about this code.
00:19:32
It’s not so obvious that a contact is optional, but you can start to see it at these ugly conditionals.
00:19:40
If I want to pull up the contact name, I must check for the presence of contact.
00:19:46
Likewise with contact phone; I must ensure it exists before I call a method.
00:19:51
This code is okay but could be improved.
00:20:00
We’re violating 'tell, don’t ask' again. Each time asking contact something makes it awkward.
00:20:07
What I'd rather do is just directly ask for the contact's name.
00:20:13
If it does exist, return what its name is; if not, say no name.
00:20:19
Now we're co-opting nil where it stands for no contact, which isn’t a good idea.
00:20:27
To prove this, let’s use a pattern called the Null Object Pattern.
00:20:35
Instead of nil, we’ll create an explicit object to represent no contact.
00:20:41
We'll call this object NullContact. So when we don’t have a contact, we assign NullContact.new.
00:20:48
Let's run our tests! Oops, got the wrong test going.
00:20:55
Things blow up because NullContact doesn't respond to some methods.
00:21:01
So, let's add basic responses like name, phone, and delivering personalized email.
00:21:08
If there’s no contact, I want name to return 'no name,' and deliver personalized email to do nothing.
00:21:16
Let’s run this method for phone.
00:21:23
Now we’re green!
00:21:30
For the cost of adding the simple NullContact class, I've eliminated three conditionals.
00:21:36
All while making methods clearer and more direct.
00:21:43
Notice—we're no longer violating 'tell, don’t ask' because we're always telling some contact.
00:21:49
We never actually care if it's an actual contact or a NullContact.
00:21:56
That’s why this is a great win!
00:22:02
This pattern is simple, but once you know it, opportunities for application emerge.
00:22:10
Now, there’s a downside.
00:22:17
Now I have to keep two APIs in sync.
00:22:23
If I add new methods to the contact class, I need to add them to NullContact as well.
00:22:31
That’s a pain, but generally it’s worth it.
00:22:37
A lot of code ends up with big hairy nasty conditionals, and NullContact destroys them.
00:22:43
This refactoring is known as 'replace conditional with polymorphism.'
00:22:50
Instead of using if statements, always send a message to different classes.
00:22:56
This helps keep your code cleaner—better to maintain.
00:23:02
You should weigh the costs: is it worth it given your current system?
00:23:09
We talked about good signs and bad signs.
00:23:17
But how many people have code where it’s like, 'If current user blah blah blah,' raise your hand!
00:23:24
Imagine removing the complexity of checking for a current user.
00:23:32
You can return a user or a NullUser.
00:23:38
Alright. Any questions about that?
00:23:44
Next example—take a second and read this code.
00:23:52
Let’s discuss the idea of depending upon abstractions.
00:23:59
Programmers are aware that it’s a good idea to depend upon abstractions.
00:24:07
Most people prefer using libraries over writing raw SQL or bytes.
00:24:13
But what a lot of developers don’t realize is that abstractions are fractal.
00:24:20
You should build abstractions aggressively within your application!
00:24:26
A good rule of thumb is that you want the whole to be simpler than the sum of its parts.
00:24:34
Take a few models or classes and wrap a class around them.
00:24:41
Make a simple API that’s easier to use than diving into the details.
00:24:48
There’s a common violation I see often.
00:24:55
Let’s have a look at this. This code is concerned with charging people for things.
00:25:03
BrainTree is a payment processor. I pull down the BrainTree gem—better than writing raw API calls.
00:25:10
Notice how user is concerned with brain tree ID, charging for subscriptions, creating customers, and refunding.
00:25:18
We're depending on the BrainTree gem as an abstraction.
00:25:23
But we could go deeper; currently, I need to change User if I want to change the gem.
00:25:30
That’s a signal for shotgun surgery.
00:25:37
If I change something, I should only change one file.
00:25:43
How do we fix this? I create a new class called PaymentGateway.
00:25:51
I move the constant equal to BrainTree into PaymentGateway.
00:25:58
I set up my Gateway, which by default is BrainTree. All that logic lives there.
00:26:04
My client code only knows about the PaymentGateway class.
00:26:10
If I want to change payment processors, only one file needs changing.
00:26:16
That’s the power of depending upon abstractions!
00:26:23
Now, I want to test this thoroughly.
00:26:29
If I stub methods on BrainTree, I will encounter issues upon updating.
00:26:36
In this version, I can stub my own methods; it will be much easier to manage.
00:26:42
If you control your methods during your tests, you’re going to have a good time.
00:26:49
When you notice concerns like billing spread throughout, it's a good time to re-architect.
00:26:56
No gem names should appear in your business logic.
00:27:03
You should only reference PaymentGateway, not the actual processor.
00:27:10
So, we've talked about several methods of refactoring. When do you refactor?
00:27:17
The best time to refactor is when changes need to be made.
00:27:23
Usually not on a random Monday morning, saying 'I’m going to refactor some code.'