Git Driven Refactoring

by Ashley Ellis Pierce

In her talk "Git Driven Refactoring" at RubyConf 2017, Ashley Ellis Pierce, an application engineer at GitHub, discusses the challenges of identifying and addressing code that needs refactoring. Refactoring involves changing the structure of code without altering its behavior, yet many developers struggle to pinpoint the problems within their codebases. To aid in this process, Pierce emphasizes employing Git as a tool to analyze code against the SOLID principles, a set of five design principles aimed at making software more flexible and maintainable.

Key Points Discussed:

Understanding Refactoring: Refactoring is essential for cleaning up messy code, and while developers often recognize that refactoring is needed, they may struggle to identify where to start.
SOLID Principles: Pierce outlines the SOLID principles:
- Single Responsibility Principle: A class should have only one reason to change.
- Open-Closed Principle: A class should be open for extension but closed for modification.
- Liskov Substitution Principle: Subclasses must be substitutable for their parent classes.
- Interface Segregation Principle: No client should depend on methods it does not use.
- Dependency Inversion Principle: Depend on abstractions, not concretions.
Using Git for Refactoring:
- Git logs reveal the history of changes in classes, helping identify violations of SOLID principles. For example, grouping commit histories can highlight whether classes are handling multiple responsibilities.
- The presence of modifications (red in diffs) when adding new features signals potential violations of the Open-Closed principle.
Automation and Tools: Leveraging GitHub tools, such as automatic checks for code quality, helps ensure adherence to design principles without solely relying on manual code reviews. Pierce encourages creating custom bots that can help detect design principle violations during pull request reviews.

Examples and Illustrations:

Pierce shared personal experiences, recalling her time at Thoughtbot, where she encountered challenges in recognizing design problems until she utilized Git to explore the history and context of changes.
She discussed how looking at commit histories allowed her to group changes thematically, leading to more effective refactoring decisions.

Conclusion and Takeaways:

Developers should actively use Git not just for version control but as a lens to identify design principle violations.
Clear and consistent commit messages, along with proactive automation practices, can facilitate easier and more effective refactoring.
By applying the SOLID principles, developers can produce more maintainable and adaptable code, minimizing future technical debt.

Ultimately, Pierce’s talk inspires developers to view Git as a critical ally in their refactoring efforts, equipping them with practical strategies to create better codebases.

00:00:10.910 Hello, welcome! This is Git Driven Refactoring.

00:00:17.640 My name is Ashley Ellis Pierce. I am an application engineer at GitHub and I'm on the billing and payments team.

00:00:27.270 You can find me on Twitter at @ALSPierce if you'd like to continue this conversation afterward.

00:00:38.789 As I mentioned, this talk is going to be about Git driven refactoring.

00:00:44.879 But first, we should talk about what refactoring is.

00:00:50.429 For anyone not familiar, refactoring is changing the structure of code without modifying its behavior.

00:01:01.589 It's cleaning up your mess, and of course, this implies that there is indeed a mess to clean up.

00:01:06.600 We've all been there; I don't know anyone whose codebase is always perfect.

00:01:12.710 Despite our best intentions, as humans, we are imperfect and we make mistakes.

00:01:18.210 But the good news is that once we know what mistakes we've made, we can find out exactly how to fix them.

00:01:24.540 Knowing where we went wrong leads us to where we need to be.

00:01:30.780 Luckily, we're not the first ones to ever write software.

00:01:37.260 Smarter people than me have identified common places where we often go wrong.

00:01:42.690 They've put names to those common problems, and every common design problem has a specific refactoring technique to address it.

00:01:49.050 Once you know what problem you have, you can easily search and find the recommended refactor.

00:01:54.330 Now, this talk is not going to focus specifically on how to correct your code, but more on that very first step of identifying what problem we have.

00:02:02.330 For a great talk about how you can use design principle violations to find out exactly how to fix them, I recommend "Any Messes? Get a Whiff of This."

00:02:07.980 This talk goes into detail about matching up code smells with their prescribed refactoring methods.

00:02:14.300 But here, I am focusing on the first step: how to find the design problem.

00:02:19.569 So that brings us to the SOLID principles. The SOLID principles are five design principles for object-oriented development laid out by Robert Martin, and are intended to make software more flexible and maintainable.

00:02:27.400 Let's run through what these principles are.

00:02:34.090 The first is the Single Responsibility Principle: a class should have only one reason to change.

00:02:40.510 The next one is the Open-Closed Principle: a class should be open for extension but closed for modification.

00:02:46.840 Then comes the Liskov Substitution Principle: every subclass or derived class should be substitutable for their base or parent class.

00:02:53.260 Next is the Interface Segregation Principle: no client should be forced to depend on methods it does not use.

00:02:58.419 Finally, the Dependency Inversion Principle: entities must depend on abstractions, not concretions.

00:03:16.150 If you follow these principles, your code will be easier to understand and change. Instead of being a tangled mess where one change cascades and causes a change everywhere else, your code will be flexible and maintainable.

00:03:30.909 It will allow you to adapt it however you need to meet the changing needs of the business.

00:03:37.810 So, I've told you what SOLID is, what each of the principles are, and why you should follow them.

00:03:43.479 I also mentioned earlier that once you can identify what the problem is, there are already cures for what ails you.

00:03:49.209 Easy, right? Well, maybe not quite.

00:03:54.819 I said before that identifying our mistakes leads us to fixing them.

00:04:00.069 But just because you heard these rules doesn't mean you thoroughly understand them or that you can always identify when you're in violation of them.

00:04:06.639 I'd venture to guess that some of you in this audience have heard these design rules before, and yet you’re still here.

00:04:12.280 Why is that? Probably because you've struggled to recognize these patterns in real life.

00:04:18.579 That's what we're here to talk about today because I've had the same problem.

00:04:24.820 Some examples in this talk will be using Git on the command line, while others will be PRs on GitHub.

00:04:36.480 These are, of course, two separate tools, but I often use them together in conjunction to help me achieve my refactoring goals.

00:04:42.120 But these tools aren't usually thought of as the first things one might look at to help with refactoring.

00:04:56.520 You're probably wondering how or why I use them.

00:05:03.270 As I mentioned in the beginning, I work at GitHub.

00:05:09.590 Since this talk focuses a lot on GitHub, many people will assume that I formed this talk as a result of working there.

00:05:16.020 But I didn't. The beginnings of this talk started to form for me while working at Thoughtbot.

00:05:23.100 Thoughtbot is a software consultancy well known for helping companies write clean code.

00:05:28.800 They have a whole course on refactoring that they teach, and this is one of the primary things companies come to Thoughtbot for.

00:05:34.440 I started at Thoughtbot as an apprentice, and I was very green.

00:05:41.220 I had only worked at one company previously for about a year.

00:05:46.830 Although I had done all my homework, reading a ton of books on refactoring and clean code, I didn't have much experience putting that theory into practice.

00:05:53.430 I was writing code that I thought was clean, and it didn’t seem to me that I was violating any of the numerous object-oriented design rules I had memorized.

00:06:00.840 Yet, other developers were much more able to recognize patterns in the code that I just didn't see.

00:06:06.570 I would submit my code and think it looked great, or I'd review others’ code and say it was awesome.

00:06:13.050 Then later down the road, everything would grow into this mess.

00:06:19.680 Eventually, I realized that by only looking at the actual lines of code in my editor, I was doing myself a disservice.

00:06:25.470 You can't recognize design patterns in isolation; you need context.

00:06:30.630 It's like studying for a multiple-choice quiz when suddenly you're faced with essay questions.

00:06:37.750 You need to know not just what the rules are but the context in which they are expected to apply.

00:06:43.930 For me, Git is where all of my code lives; it provides that context.

00:06:49.930 Instead of looking directly at the code to identify potential design violations or areas that might need refactoring, I turned to Git.

00:06:56.410 I started looking for simple ways that Git might tell me when a class was in need of refactoring.

00:07:02.740 That's why the tips I shared today aren't some new crazy Git or GitHub feature that you've never seen before.

00:07:10.360 They involve looking at the tools we work with every day, like Git logs, PRs, and comments, but applying them differently.

00:07:17.560 Let's get back to the SOLID principles and see how we can use clues from Git to better understand them.

00:07:23.500 I'll explain each of the principles briefly as we go through.

00:07:29.590 But I want to clarify that the goal of this talk isn't to provide an exhaustive exploration of SOLID.

00:07:36.730 My aim is to use SOLID principles to exemplify how we might use Git in general to find design principle violations and refactoring opportunities.

00:07:43.810 Hopefully, this will inspire you to think about how you can use Git as a lens to view whatever software design principles are important to you.

00:07:53.890 As I mentioned before, the first SOLID principle is the Single Responsibility Principle, which means that a class should have only one reason to change.

00:08:00.460 When I hear this, I think, wait, I'm confused. I don't know about you, but my classes change for lots of reasons.

00:08:08.890 Maybe there's a new feature to add or I want to improve some bit of code.

00:08:15.070 When I was really trying to figure this out, I thought hard about why my class had changed recently.

00:08:21.130 Ideally, you would never write any code that violated the Single Responsibility Principle.

00:08:26.440 But if you're like me, you probably thought everything was going great when you wrote it.

00:08:31.810 Then, down the line, something starts to feel off, and you can't really pinpoint what it is.

00:08:38.469 Your code gets harder to change, and you start looking for ways to refactor.

00:08:44.260 This question is a great place to start: you have to really know the history of your classes to understand what might have gone wrong.

00:08:51.160 Fortunately, there’s a place specifically suited for that: the Git log for that class.

00:08:57.220 This is where I start when I want to understand if any of my classes might be violating some design rule.

00:09:02.920 Looking at the commit history for my engine class from my Tesla Motors company, something seems reasonable.

00:09:08.290 You don't have to squint and try to read it, but with a cursory look, it seems reasonable to me.

00:09:13.480 There are lots of small, well-named methods that do one thing.

00:09:18.880 Methods like 'engine_start' take a true or false input as to whether we're starting the engine remotely.

00:09:24.940 There's even a method for delaying the start so you can set a time, like 9:00 a.m. during winter.

00:09:30.940 Finally, an idle method decreases the power output of the engine while a hill access increases it.

00:09:36.700 Then we define some devices that are allowed to start the engine remotely, and at a glance, this all seems reasonable.

00:09:43.360 However, this class has been feeling a little off to me; I sense it needs to be refactored.

00:09:49.680 I can't quite pinpoint what's wrong with it.

00:09:56.680 Looking just at the code sometimes misses issues.

00:10:03.640 So, I will look again at the commit history for the class.

00:10:10.930 It seems like many changes were made relating to remote start functionality.

00:10:16.510 Upon reflection, I notice that there are two logical groups of commits.

00:10:22.600 The first group relates to the power output of the engine, such as increasing power or adjusting frequency.

00:10:30.070 The second group relates to remote start functionality.

00:10:35.470 Initially, I thought there were several reasons the class had changed.

00:10:41.500 Identifying one was impossible.

00:10:46.510 However, when I grouped these changes into larger categories, I realized the class has two reasons for change: the power output and the remote start.

00:10:53.920 Now, moving forward with refactoring seems much easier.

00:10:59.110 I know that I probably need to create a separate class for remote start, allowing me to keep responsibilities distinct.

00:11:05.500 Now, I won't have to worry about changes to remote start affecting how power output is determined by the engine.

00:11:11.140 For this approach to work, you need to write commit messages of a particular type.

00:11:18.279 You need well-written commit messages that document the significant changes in your code and why they occurred.

00:11:24.279 If you have numerous trivial commits, such as adding whitespace, that won't help.

00:11:30.880 Trivial changes add nothing to the overall story of your code.

00:11:36.640 You can write commits like this as you work, but be sure to rebase or squash merge your commits to keep your commit messages clear.

00:11:43.830 Embrace the full power of Git to help you refactor your code by using it in a way that makes things easy.

00:11:50.350 I've heard arguments from those who don't care much for nice comments.

00:11:56.079 They often say that they use PRs more as their documentation.

00:12:02.740 But there's no easy way to see all the PRs that touch a specific class.

00:12:08.380 With your PRs, you can't quickly gain context locally or in your editor like you can with the Git log.

00:12:14.410 In my opinion, that’s reason enough to use clear and concise commit messages.

00:12:20.829 The next SOLID principle is the Open-Closed Principle: a class should be open for extension but closed for modification.

00:12:26.350 Initially, this principle sounds simple, but I found it confusing.

00:12:31.240 How could you extend something without modifying it?

00:12:37.540 I read many books, blog posts, and articles on this, but it was difficult to apply.

00:12:43.840 I’d review my coworkers' PRs and try to guess if a class would need to change in the future.

00:12:52.070 However, I struggled to fully grasp the principle.

00:12:57.800 Let's take a look at some example code and PRs in GitHub to clarify this concept.

00:13:01.240 You don't have to read all this code, but here's a class that checks for engine health.

00:13:06.760 Its main concern is checking for any electrical shorts in the engine.

00:13:12.380 This PR gets submitted by one of my teammates, and it's my job to review it.

00:13:17.959 I think, 'Ship it! It's great!'

00:13:25.450 Then, a new requirement comes in: they want the engine to check that all the spark plugs are connected.

00:13:30.200 Someone submits a new PR that includes this functionality.

00:13:35.450 Next, another feature request comes in: they want to check the alternator.

00:13:42.560 My coworker adds their next feature in another PR.

00:13:49.850 Without needing to read the tiny code, we can notice something in the diffs.

00:13:55.960 This PR has more red and green than a Christmas tree.

00:14:03.980 We're adding features, and although the new lines are green, we also see a lot of red lines.

00:14:09.280 The phrase ‘a class should be open for extension but closed for modification’ means you shouldn’t have to modify existing code to add new functionality.

00:14:16.880 In GitHub context, this means that when adding features, there should be no red in the diffs.

00:14:24.180 If every time you're adding functionality, you also need to modify existing code, then you're violating the Open-Closed Principle.

00:14:31.939 Recognizing this was a mind-blowing moment for me.

00:14:38.430 Learning to look for this one indicator in my Git diffs helped me better understand the principle.

00:14:45.540 If you see red in your diffs when adding a feature, you may be violating the Open-Closed Principle.

00:14:51.629 Now that we know this, fixing the class to make it truly open for extension, but closed for modification becomes crucial for maintainability.

00:14:57.540 All that red in the diff indicates extra work someone must do to shoehorn the new feature in.

00:15:04.330 After recognizing the problem as an Open-Closed violation, you have the chance to fix that class.

00:15:10.560 This allows you to ensure that nobody has to do that extra work again in the future.

00:15:18.200 In the previous segment, I mentioned that clear commit messages are important.

00:15:24.240 Here's a caveat: consistent style matters too.

00:15:30.900 I used to think that having a set style guide was unnecessary, but I now appreciate its significance.

00:15:38.890 When I recognized how powerful Git can be for identifying refactoring opportunities, maintaining style became more important.

00:15:44.790 If, in addition to adding a feature, you're changing syntax or rearranging methods, it could hinder the review process.

00:15:51.640 This might complicate reviewing your code and make keeping track of more complex refactoring efforts difficult.

00:15:58.700 By keeping things easy and ensuring consistent style, you will make programming and reviewing code simpler.

00:16:03.900 The next principle we’ll cover is the Liskov Substitution Principle.

00:16:10.630 Every subclass or derived class should be substitutable for their base or parent class.

00:16:16.500 In other words, you should be able to call the same method on a subclass as you could on a parent and expect a compatible result.

00:16:23.880 If the superclass returns a string, the subclass should also handle that method and return a string.

00:16:31.360 In many languages, this is less of a concern because the language does some checking for you.

00:16:38.200 However, in Ruby, there are no such guarantees.

00:16:44.469 Despite our best intentions, I or someone else on my team could write code like this.

00:16:50.350 For instance, here we have a Viewable class with a method called turn_on_lights.

00:16:56.470 In its Car subclass, we only activate the headlights when this method is called.

00:17:03.540 For a Bike, we don't have any lights, so we raise a 'not implemented' error.

00:17:09.790 This triggers an LSP violation, since you can't call turn_on_lights on Bike and get a compatible result.

00:17:15.320 People forget these principles, and despite our best efforts, nothing is perfect.

00:17:22.150 I might end up pushing this code up to GitHub and opening a PR.

00:17:28.070 My code will be reviewed by a teammate, but they too are not perfect.

00:17:34.930 I don't want to rely on them to catch these issues.

00:17:40.760 Thanks to GitHub's webhooks, I can set up checks.

00:17:46.150 When opening the PR, I can receive an automatic comment from our style bot warning me about a potential LSP violation.

00:17:53.080 Automatic checks are fantastic for ensuring that clean code remains clean.

00:17:59.420 Consider writing your own style bot to check for design principles important to you.

00:18:06.130 There are tools like linters that can check for more general style guide violations.

00:18:14.150 Every day, more powerful tools emerge to analyze your code.

00:18:19.350 If you've never considered writing custom webhooks, I highly encourage it.

00:18:26.310 Internally at GitHub, we have checks for changes in database files to warn us of potential issues.

00:18:32.250 The possibilities are endless, and this can be a fantastic tool for refactoring.

00:18:38.440 Moreover, there are some things you could check for using these methods.

00:18:45.320 Although not every instance may be a clear LSP violation, they could indicate a code smell.

00:18:50.130 This is particularly true in dynamically typed languages like Ruby.

00:18:56.680 Invoking a method that relies on the type of an object can lead to violations.

00:19:03.360 Having a bot call attention to this can motivate the developer to rethink their implementation.

00:19:09.620 Reviewers would also appreciate this attention and offer suggestions.

00:19:16.000 Don’t depend solely on reviewers to spot such issues; they might not mention them.

00:19:24.000 Let your bots do the nagging; they can catch things that we miss.

00:19:30.120 When developing a bot, focus on helping improve your code.

00:19:36.440 As much as I would like to say I can always focus on code quality, we all have distractions.

00:19:42.790 Automate as much as you can to avoid LSP violations.

00:19:50.000 Doing so will facilitate code adaptation and growth.

00:19:56.310 The next principle is the Interface Segregation Principle.

00:20:02.250 No client should be forced to depend on methods it does not use.

00:20:06.630 The classic example of this is in typed languages like Java.

00:20:12.900 Let's switch to Java to illustrate this concept.

00:20:18.090 For example, let’s consider the Aircraft interface.

00:20:24.370 This specifies that anything implementing it must define certain methods.

00:20:29.990 An example would include takeoff and retracting landing gear.

00:20:36.160 Here we have an AirTrafficControl class, and it takes anything implementing the Aircraft interface.

00:20:42.310 It directs the aircraft to takeoff.

00:20:49.230 Now, imagine we have a Helicopter as another type of Aircraft.

00:20:55.790 While it needs to take off, it doesn't need to retract landing gear.

00:21:02.050 This situation illustrates an Interface Segregation violation.

00:21:07.850 AirTrafficControl relies on Aircrafts having all methods defined under the interface.

00:21:14.020 Many small, client-specific interfaces are preferred over one large general-purpose interface.

00:21:20.120 Let's see if we can spot ISP violations in non-typed languages like Ruby.

00:21:26.210 Taking another look at the AirTrafficControl class in Ruby, we initialize it by passing in all the aircrafts to deal with.

00:21:32.010 There's a method called today's aircraft that checks the flight dates.

00:21:38.830 However, our list of aircraft is required to be an ActiveRecord relation.

00:21:45.480 This implies AirTrafficControl relies on the ActiveRecord interface.

00:21:52.580 Suppose we wanted to pass in aircrafts from a CSV instead; the ActiveRecord dependency would restrict that.

00:21:58.680 That assumption can escalate quickly.

00:22:05.050 You would have to re-implement all ActiveRecord functionality in the CSV aircrafts class.

00:22:12.620 This is a violation of the spirit of ISP.

00:22:19.320 While using both small client-specific interfaces and large general-purpose interfaces can offer flexibility, balance is key.

00:22:25.920 Use Git to find that balance.

00:22:32.330 I would use Git logs to assess how often classes are changing.

00:22:38.360 What we're looking for here is whether the changes in our class correspond to any ISP violations.

00:22:44.270 Next, we move on to the Dependency Inversion Principle.

00:22:51.300 Entities must depend on abstractions, not concretions.

00:22:58.170 Here we have a Car class with methods like play_radio and silent.

00:23:04.550 Within it, we call FM radio.play, making FM radio a direct dependency.

00:23:10.230 This ties the car’s existence to FM radio.

00:23:17.050 To honor the Dependency Inversion Principle, we practice a concept called Dependency Injection.

00:23:23.200 Instead of calling FM radio directly, we inject it when initializing the class.

00:23:30.400 This allows us to decouple our code.

00:23:36.640 With a style bot, you could call attention to understanding this code.

00:23:42.600 It’s more reliable than relying on yourself or your coworkers to spot violations.

00:23:48.920 Automating checks ensures that at the very least, you must think through your code.

00:23:57.030 If you have numerous points in your code that couple classes to dependencies, utilize history and blame to prioritize fixes.

00:24:03.280 Adapting your code to inject dependencies makes it more flexible and maintainable.

00:24:09.320 You can now introduce cars equipped with all different types of radios, as long as they respond to the play method.

00:24:15.040 Recapping, here's how Git can help you refactor.

00:24:22.570 Look at Git logs to group commits into themes, recognizing when a class is addressing multiple responsibilities.

00:24:29.490 Keep an eye out for red in diffs when adding functionality, as this could signify violations of the Open-Closed Principle.

00:24:35.840 Integrate tools like automatic style linters with Git to facilitate code quality.

00:24:42.300 Make use of the Git log and blame to discover how frequently classes or methods change, identifying the most significant offenders.

00:24:48.500 By adhering to these tips, you can refactor and reach SOLID code.

00:24:55.050 Your code will be more flexible and maintainable, allowing for easier adaptations.

00:25:03.200 For all of you familiar with the SOLID principles but struggling to apply them, these tips will help you identify violations and see their context in your work.

00:25:11.050 Thank you!