00:00:10.910
Hello, welcome! This is Git Driven Refactoring.
00:00:17.640
My name is Ashley Ellis Pierce. I am an application engineer at GitHub and I'm on the billing and payments team.
00:00:27.270
You can find me on Twitter at @ALSPierce if you'd like to continue this conversation afterward.
00:00:38.789
As I mentioned, this talk is going to be about Git driven refactoring.
00:00:44.879
But first, we should talk about what refactoring is.
00:00:50.429
For anyone not familiar, refactoring is changing the structure of code without modifying its behavior.
00:01:01.589
It's cleaning up your mess, and of course, this implies that there is indeed a mess to clean up.
00:01:06.600
We've all been there; I don't know anyone whose codebase is always perfect.
00:01:12.710
Despite our best intentions, as humans, we are imperfect and we make mistakes.
00:01:18.210
But the good news is that once we know what mistakes we've made, we can find out exactly how to fix them.
00:01:24.540
Knowing where we went wrong leads us to where we need to be.
00:01:30.780
Luckily, we're not the first ones to ever write software.
00:01:37.260
Smarter people than me have identified common places where we often go wrong.
00:01:42.690
They've put names to those common problems, and every common design problem has a specific refactoring technique to address it.
00:01:49.050
Once you know what problem you have, you can easily search and find the recommended refactor.
00:01:54.330
Now, this talk is not going to focus specifically on how to correct your code, but more on that very first step of identifying what problem we have.
00:02:02.330
For a great talk about how you can use design principle violations to find out exactly how to fix them, I recommend "Any Messes? Get a Whiff of This."
00:02:07.980
This talk goes into detail about matching up code smells with their prescribed refactoring methods.
00:02:14.300
But here, I am focusing on the first step: how to find the design problem.
00:02:19.569
So that brings us to the SOLID principles. The SOLID principles are five design principles for object-oriented development laid out by Robert Martin, and are intended to make software more flexible and maintainable.
00:02:27.400
Let's run through what these principles are.
00:02:34.090
The first is the Single Responsibility Principle: a class should have only one reason to change.
00:02:40.510
The next one is the Open-Closed Principle: a class should be open for extension but closed for modification.
00:02:46.840
Then comes the Liskov Substitution Principle: every subclass or derived class should be substitutable for their base or parent class.
00:02:53.260
Next is the Interface Segregation Principle: no client should be forced to depend on methods it does not use.
00:02:58.419
Finally, the Dependency Inversion Principle: entities must depend on abstractions, not concretions.
00:03:16.150
If you follow these principles, your code will be easier to understand and change. Instead of being a tangled mess where one change cascades and causes a change everywhere else, your code will be flexible and maintainable.
00:03:30.909
It will allow you to adapt it however you need to meet the changing needs of the business.
00:03:37.810
So, I've told you what SOLID is, what each of the principles are, and why you should follow them.
00:03:43.479
I also mentioned earlier that once you can identify what the problem is, there are already cures for what ails you.
00:03:49.209
Easy, right? Well, maybe not quite.
00:03:54.819
I said before that identifying our mistakes leads us to fixing them.
00:04:00.069
But just because you heard these rules doesn't mean you thoroughly understand them or that you can always identify when you're in violation of them.
00:04:06.639
I'd venture to guess that some of you in this audience have heard these design rules before, and yet you’re still here.
00:04:12.280
Why is that? Probably because you've struggled to recognize these patterns in real life.
00:04:18.579
That's what we're here to talk about today because I've had the same problem.
00:04:24.820
Some examples in this talk will be using Git on the command line, while others will be PRs on GitHub.
00:04:36.480
These are, of course, two separate tools, but I often use them together in conjunction to help me achieve my refactoring goals.
00:04:42.120
But these tools aren't usually thought of as the first things one might look at to help with refactoring.
00:04:56.520
You're probably wondering how or why I use them.
00:05:03.270
As I mentioned in the beginning, I work at GitHub.
00:05:09.590
Since this talk focuses a lot on GitHub, many people will assume that I formed this talk as a result of working there.
00:05:16.020
But I didn't. The beginnings of this talk started to form for me while working at Thoughtbot.
00:05:23.100
Thoughtbot is a software consultancy well known for helping companies write clean code.
00:05:28.800
They have a whole course on refactoring that they teach, and this is one of the primary things companies come to Thoughtbot for.
00:05:34.440
I started at Thoughtbot as an apprentice, and I was very green.
00:05:41.220
I had only worked at one company previously for about a year.
00:05:46.830
Although I had done all my homework, reading a ton of books on refactoring and clean code, I didn't have much experience putting that theory into practice.
00:05:53.430
I was writing code that I thought was clean, and it didn’t seem to me that I was violating any of the numerous object-oriented design rules I had memorized.
00:06:00.840
Yet, other developers were much more able to recognize patterns in the code that I just didn't see.
00:06:06.570
I would submit my code and think it looked great, or I'd review others’ code and say it was awesome.
00:06:13.050
Then later down the road, everything would grow into this mess.
00:06:19.680
Eventually, I realized that by only looking at the actual lines of code in my editor, I was doing myself a disservice.
00:06:25.470
You can't recognize design patterns in isolation; you need context.
00:06:30.630
It's like studying for a multiple-choice quiz when suddenly you're faced with essay questions.
00:06:37.750
You need to know not just what the rules are but the context in which they are expected to apply.
00:06:43.930
For me, Git is where all of my code lives; it provides that context.
00:06:49.930
Instead of looking directly at the code to identify potential design violations or areas that might need refactoring, I turned to Git.
00:06:56.410
I started looking for simple ways that Git might tell me when a class was in need of refactoring.
00:07:02.740
That's why the tips I shared today aren't some new crazy Git or GitHub feature that you've never seen before.
00:07:10.360
They involve looking at the tools we work with every day, like Git logs, PRs, and comments, but applying them differently.
00:07:17.560
Let's get back to the SOLID principles and see how we can use clues from Git to better understand them.
00:07:23.500
I'll explain each of the principles briefly as we go through.
00:07:29.590
But I want to clarify that the goal of this talk isn't to provide an exhaustive exploration of SOLID.
00:07:36.730
My aim is to use SOLID principles to exemplify how we might use Git in general to find design principle violations and refactoring opportunities.
00:07:43.810
Hopefully, this will inspire you to think about how you can use Git as a lens to view whatever software design principles are important to you.
00:07:53.890
As I mentioned before, the first SOLID principle is the Single Responsibility Principle, which means that a class should have only one reason to change.
00:08:00.460
When I hear this, I think, wait, I'm confused. I don't know about you, but my classes change for lots of reasons.
00:08:08.890
Maybe there's a new feature to add or I want to improve some bit of code.
00:08:15.070
When I was really trying to figure this out, I thought hard about why my class had changed recently.
00:08:21.130
Ideally, you would never write any code that violated the Single Responsibility Principle.
00:08:26.440
But if you're like me, you probably thought everything was going great when you wrote it.
00:08:31.810
Then, down the line, something starts to feel off, and you can't really pinpoint what it is.
00:08:38.469
Your code gets harder to change, and you start looking for ways to refactor.
00:08:44.260
This question is a great place to start: you have to really know the history of your classes to understand what might have gone wrong.
00:08:51.160
Fortunately, there’s a place specifically suited for that: the Git log for that class.
00:08:57.220
This is where I start when I want to understand if any of my classes might be violating some design rule.
00:09:02.920
Looking at the commit history for my engine class from my Tesla Motors company, something seems reasonable.
00:09:08.290
You don't have to squint and try to read it, but with a cursory look, it seems reasonable to me.
00:09:13.480
There are lots of small, well-named methods that do one thing.
00:09:18.880
Methods like 'engine_start' take a true or false input as to whether we're starting the engine remotely.
00:09:24.940
There's even a method for delaying the start so you can set a time, like 9:00 a.m. during winter.
00:09:30.940
Finally, an idle method decreases the power output of the engine while a hill access increases it.
00:09:36.700
Then we define some devices that are allowed to start the engine remotely, and at a glance, this all seems reasonable.
00:09:43.360
However, this class has been feeling a little off to me; I sense it needs to be refactored.
00:09:49.680
I can't quite pinpoint what's wrong with it.
00:09:56.680
Looking just at the code sometimes misses issues.
00:10:03.640
So, I will look again at the commit history for the class.
00:10:10.930
It seems like many changes were made relating to remote start functionality.
00:10:16.510
Upon reflection, I notice that there are two logical groups of commits.
00:10:22.600
The first group relates to the power output of the engine, such as increasing power or adjusting frequency.
00:10:30.070
The second group relates to remote start functionality.
00:10:35.470
Initially, I thought there were several reasons the class had changed.
00:10:41.500
Identifying one was impossible.
00:10:46.510
However, when I grouped these changes into larger categories, I realized the class has two reasons for change: the power output and the remote start.
00:10:53.920
Now, moving forward with refactoring seems much easier.
00:10:59.110
I know that I probably need to create a separate class for remote start, allowing me to keep responsibilities distinct.
00:11:05.500
Now, I won't have to worry about changes to remote start affecting how power output is determined by the engine.
00:11:11.140
For this approach to work, you need to write commit messages of a particular type.
00:11:18.279
You need well-written commit messages that document the significant changes in your code and why they occurred.
00:11:24.279
If you have numerous trivial commits, such as adding whitespace, that won't help.
00:11:30.880
Trivial changes add nothing to the overall story of your code.
00:11:36.640
You can write commits like this as you work, but be sure to rebase or squash merge your commits to keep your commit messages clear.
00:11:43.830
Embrace the full power of Git to help you refactor your code by using it in a way that makes things easy.
00:11:50.350
I've heard arguments from those who don't care much for nice comments.
00:11:56.079
They often say that they use PRs more as their documentation.
00:12:02.740
But there's no easy way to see all the PRs that touch a specific class.
00:12:08.380
With your PRs, you can't quickly gain context locally or in your editor like you can with the Git log.
00:12:14.410
In my opinion, that’s reason enough to use clear and concise commit messages.
00:12:20.829
The next SOLID principle is the Open-Closed Principle: a class should be open for extension but closed for modification.
00:12:26.350
Initially, this principle sounds simple, but I found it confusing.
00:12:31.240
How could you extend something without modifying it?
00:12:37.540
I read many books, blog posts, and articles on this, but it was difficult to apply.
00:12:43.840
I’d review my coworkers' PRs and try to guess if a class would need to change in the future.
00:12:52.070
However, I struggled to fully grasp the principle.
00:12:57.800
Let's take a look at some example code and PRs in GitHub to clarify this concept.
00:13:01.240
You don't have to read all this code, but here's a class that checks for engine health.
00:13:06.760
Its main concern is checking for any electrical shorts in the engine.
00:13:12.380
This PR gets submitted by one of my teammates, and it's my job to review it.
00:13:17.959
I think, 'Ship it! It's great!'
00:13:25.450
Then, a new requirement comes in: they want the engine to check that all the spark plugs are connected.
00:13:30.200
Someone submits a new PR that includes this functionality.
00:13:35.450
Next, another feature request comes in: they want to check the alternator.
00:13:42.560
My coworker adds their next feature in another PR.
00:13:49.850
Without needing to read the tiny code, we can notice something in the diffs.
00:13:55.960
This PR has more red and green than a Christmas tree.
00:14:03.980
We're adding features, and although the new lines are green, we also see a lot of red lines.
00:14:09.280
The phrase ‘a class should be open for extension but closed for modification’ means you shouldn’t have to modify existing code to add new functionality.
00:14:16.880
In GitHub context, this means that when adding features, there should be no red in the diffs.
00:14:24.180
If every time you're adding functionality, you also need to modify existing code, then you're violating the Open-Closed Principle.
00:14:31.939
Recognizing this was a mind-blowing moment for me.
00:14:38.430
Learning to look for this one indicator in my Git diffs helped me better understand the principle.
00:14:45.540
If you see red in your diffs when adding a feature, you may be violating the Open-Closed Principle.
00:14:51.629
Now that we know this, fixing the class to make it truly open for extension, but closed for modification becomes crucial for maintainability.
00:14:57.540
All that red in the diff indicates extra work someone must do to shoehorn the new feature in.
00:15:04.330
After recognizing the problem as an Open-Closed violation, you have the chance to fix that class.
00:15:10.560
This allows you to ensure that nobody has to do that extra work again in the future.
00:15:18.200
In the previous segment, I mentioned that clear commit messages are important.
00:15:24.240
Here's a caveat: consistent style matters too.
00:15:30.900
I used to think that having a set style guide was unnecessary, but I now appreciate its significance.
00:15:38.890
When I recognized how powerful Git can be for identifying refactoring opportunities, maintaining style became more important.
00:15:44.790
If, in addition to adding a feature, you're changing syntax or rearranging methods, it could hinder the review process.
00:15:51.640
This might complicate reviewing your code and make keeping track of more complex refactoring efforts difficult.
00:15:58.700
By keeping things easy and ensuring consistent style, you will make programming and reviewing code simpler.
00:16:03.900
The next principle we’ll cover is the Liskov Substitution Principle.
00:16:10.630
Every subclass or derived class should be substitutable for their base or parent class.
00:16:16.500
In other words, you should be able to call the same method on a subclass as you could on a parent and expect a compatible result.
00:16:23.880
If the superclass returns a string, the subclass should also handle that method and return a string.
00:16:31.360
In many languages, this is less of a concern because the language does some checking for you.
00:16:38.200
However, in Ruby, there are no such guarantees.
00:16:44.469
Despite our best intentions, I or someone else on my team could write code like this.
00:16:50.350
For instance, here we have a Viewable class with a method called turn_on_lights.
00:16:56.470
In its Car subclass, we only activate the headlights when this method is called.
00:17:03.540
For a Bike, we don't have any lights, so we raise a 'not implemented' error.
00:17:09.790
This triggers an LSP violation, since you can't call turn_on_lights on Bike and get a compatible result.
00:17:15.320
People forget these principles, and despite our best efforts, nothing is perfect.
00:17:22.150
I might end up pushing this code up to GitHub and opening a PR.
00:17:28.070
My code will be reviewed by a teammate, but they too are not perfect.
00:17:34.930
I don't want to rely on them to catch these issues.
00:17:40.760
Thanks to GitHub's webhooks, I can set up checks.
00:17:46.150
When opening the PR, I can receive an automatic comment from our style bot warning me about a potential LSP violation.
00:17:53.080
Automatic checks are fantastic for ensuring that clean code remains clean.
00:17:59.420
Consider writing your own style bot to check for design principles important to you.
00:18:06.130
There are tools like linters that can check for more general style guide violations.
00:18:14.150
Every day, more powerful tools emerge to analyze your code.
00:18:19.350
If you've never considered writing custom webhooks, I highly encourage it.
00:18:26.310
Internally at GitHub, we have checks for changes in database files to warn us of potential issues.
00:18:32.250
The possibilities are endless, and this can be a fantastic tool for refactoring.
00:18:38.440
Moreover, there are some things you could check for using these methods.
00:18:45.320
Although not every instance may be a clear LSP violation, they could indicate a code smell.
00:18:50.130
This is particularly true in dynamically typed languages like Ruby.
00:18:56.680
Invoking a method that relies on the type of an object can lead to violations.
00:19:03.360
Having a bot call attention to this can motivate the developer to rethink their implementation.
00:19:09.620
Reviewers would also appreciate this attention and offer suggestions.
00:19:16.000
Don’t depend solely on reviewers to spot such issues; they might not mention them.
00:19:24.000
Let your bots do the nagging; they can catch things that we miss.
00:19:30.120
When developing a bot, focus on helping improve your code.
00:19:36.440
As much as I would like to say I can always focus on code quality, we all have distractions.
00:19:42.790
Automate as much as you can to avoid LSP violations.
00:19:50.000
Doing so will facilitate code adaptation and growth.
00:19:56.310
The next principle is the Interface Segregation Principle.
00:20:02.250
No client should be forced to depend on methods it does not use.
00:20:06.630
The classic example of this is in typed languages like Java.
00:20:12.900
Let's switch to Java to illustrate this concept.
00:20:18.090
For example, let’s consider the Aircraft interface.
00:20:24.370
This specifies that anything implementing it must define certain methods.
00:20:29.990
An example would include takeoff and retracting landing gear.
00:20:36.160
Here we have an AirTrafficControl class, and it takes anything implementing the Aircraft interface.
00:20:42.310
It directs the aircraft to takeoff.
00:20:49.230
Now, imagine we have a Helicopter as another type of Aircraft.
00:20:55.790
While it needs to take off, it doesn't need to retract landing gear.
00:21:02.050
This situation illustrates an Interface Segregation violation.
00:21:07.850
AirTrafficControl relies on Aircrafts having all methods defined under the interface.
00:21:14.020
Many small, client-specific interfaces are preferred over one large general-purpose interface.
00:21:20.120
Let's see if we can spot ISP violations in non-typed languages like Ruby.
00:21:26.210
Taking another look at the AirTrafficControl class in Ruby, we initialize it by passing in all the aircrafts to deal with.
00:21:32.010
There's a method called today's aircraft that checks the flight dates.
00:21:38.830
However, our list of aircraft is required to be an ActiveRecord relation.
00:21:45.480
This implies AirTrafficControl relies on the ActiveRecord interface.
00:21:52.580
Suppose we wanted to pass in aircrafts from a CSV instead; the ActiveRecord dependency would restrict that.
00:21:58.680
That assumption can escalate quickly.
00:22:05.050
You would have to re-implement all ActiveRecord functionality in the CSV aircrafts class.
00:22:12.620
This is a violation of the spirit of ISP.
00:22:19.320
While using both small client-specific interfaces and large general-purpose interfaces can offer flexibility, balance is key.
00:22:25.920
Use Git to find that balance.
00:22:32.330
I would use Git logs to assess how often classes are changing.
00:22:38.360
What we're looking for here is whether the changes in our class correspond to any ISP violations.
00:22:44.270
Next, we move on to the Dependency Inversion Principle.
00:22:51.300
Entities must depend on abstractions, not concretions.
00:22:58.170
Here we have a Car class with methods like play_radio and silent.
00:23:04.550
Within it, we call FM radio.play, making FM radio a direct dependency.
00:23:10.230
This ties the car’s existence to FM radio.
00:23:17.050
To honor the Dependency Inversion Principle, we practice a concept called Dependency Injection.
00:23:23.200
Instead of calling FM radio directly, we inject it when initializing the class.
00:23:30.400
This allows us to decouple our code.
00:23:36.640
With a style bot, you could call attention to understanding this code.
00:23:42.600
It’s more reliable than relying on yourself or your coworkers to spot violations.
00:23:48.920
Automating checks ensures that at the very least, you must think through your code.
00:23:57.030
If you have numerous points in your code that couple classes to dependencies, utilize history and blame to prioritize fixes.
00:24:03.280
Adapting your code to inject dependencies makes it more flexible and maintainable.
00:24:09.320
You can now introduce cars equipped with all different types of radios, as long as they respond to the play method.
00:24:15.040
Recapping, here's how Git can help you refactor.
00:24:22.570
Look at Git logs to group commits into themes, recognizing when a class is addressing multiple responsibilities.
00:24:29.490
Keep an eye out for red in diffs when adding functionality, as this could signify violations of the Open-Closed Principle.
00:24:35.840
Integrate tools like automatic style linters with Git to facilitate code quality.
00:24:42.300
Make use of the Git log and blame to discover how frequently classes or methods change, identifying the most significant offenders.
00:24:48.500
By adhering to these tips, you can refactor and reach SOLID code.
00:24:55.050
Your code will be more flexible and maintainable, allowing for easier adaptations.
00:25:03.200
For all of you familiar with the SOLID principles but struggling to apply them, these tips will help you identify violations and see their context in your work.
00:25:11.050
Thank you!