Git Driven Refactoring

00:00:10.910 Hello, welcome! This is Git Driven Refactoring.

00:00:17.640 My name is Ashley Ellis Pierce. I am an application engineer at GitHub and I'm on the billing and payments team.

00:00:27.270 You can find me on Twitter at @ALSPierce if you'd like to continue this conversation afterward.

00:00:38.789 As I mentioned, this talk is going to be about Git driven refactoring.

00:00:44.879 But first, we should talk about what refactoring is.

00:00:50.429 For anyone not familiar, refactoring is changing the structure of code without modifying its behavior.

00:01:01.589 It's cleaning up your mess, and of course, this implies that there is indeed a mess to clean up.

00:01:06.600 We've all been there; I don't know anyone whose codebase is always perfect.

00:01:12.710 Despite our best intentions, as humans, we are imperfect and we make mistakes.

00:01:18.210 But the good news is that once we know what mistakes we've made, we can find out exactly how to fix them.

00:01:24.540 Knowing where we went wrong leads us to where we need to be.

00:01:30.780 Luckily, we're not the first ones to ever write software.

00:01:37.260 Smarter people than me have identified common places where we often go wrong.

00:01:42.690 They've put names to those common problems, and every common design problem has a specific refactoring technique to address it.

00:01:49.050 Once you know what problem you have, you can easily search and find the recommended refactor.

00:01:54.330 Now, this talk is not going to focus specifically on how to correct your code, but more on that very first step of identifying what problem we have.

00:02:02.330 For a great talk about how you can use design principle violations to find out exactly how to fix them, I recommend "Any Messes? Get a Whiff of This."

00:02:07.980 This talk goes into detail about matching up code smells with their prescribed refactoring methods.

00:02:14.300 But here, I am focusing on the first step: how to find the design problem.

00:02:19.569 So that brings us to the SOLID principles. The SOLID principles are five design principles for object-oriented development laid out by Robert Martin, and are intended to make software more flexible and maintainable.

00:02:27.400 Let's run through what these principles are.

00:02:34.090 The first is the Single Responsibility Principle: a class should have only one reason to change.

00:02:40.510 The next one is the Open-Closed Principle: a class should be open for extension but closed for modification.

00:02:46.840 Then comes the Liskov Substitution Principle: every subclass or derived class should be substitutable for their base or parent class.

00:02:53.260 Next is the Interface Segregation Principle: no client should be forced to depend on methods it does not use.

00:02:58.419 Finally, the Dependency Inversion Principle: entities must depend on abstractions, not concretions.

00:03:16.150 If you follow these principles, your code will be easier to understand and change. Instead of being a tangled mess where one change cascades and causes a change everywhere else, your code will be flexible and maintainable.

00:03:30.909 It will allow you to adapt it however you need to meet the changing needs of the business.

00:03:37.810 So, I've told you what SOLID is, what each of the principles are, and why you should follow them.

00:03:43.479 I also mentioned earlier that once you can identify what the problem is, there are already cures for what ails you.

00:03:49.209 Easy, right? Well, maybe not quite.

00:03:54.819 I said before that identifying our mistakes leads us to fixing them.

00:04:00.069 But just because you heard these rules doesn't mean you thoroughly understand them or that you can always identify when you're in violation of them.

00:04:06.639 I'd venture to guess that some of you in this audience have heard these design rules before, and yet you’re still here.

00:04:12.280 Why is that? Probably because you've struggled to recognize these patterns in real life.

00:04:18.579 That's what we're here to talk about today because I've had the same problem.

00:04:24.820 Some examples in this talk will be using Git on the command line, while others will be PRs on GitHub.

00:04:36.480 These are, of course, two separate tools, but I often use them together in conjunction to help me achieve my refactoring goals.

00:04:42.120 But these tools aren't usually thought of as the first things one might look at to help with refactoring.

00:04:56.520 You're probably wondering how or why I use them.

00:05:03.270 As I mentioned in the beginning, I work at GitHub.

00:05:09.590 Since this talk focuses a lot on GitHub, many people will assume that I formed this talk as a result of working there.

00:05:16.020 But I didn't. The beginnings of this talk started to form for me while working at Thoughtbot.

00:05:23.100 Thoughtbot is a software consultancy well known for helping companies write clean code.

00:05:28.800 They have a whole course on refactoring that they teach, and this is one of the primary things companies come to Thoughtbot for.

00:05:34.440 I started at Thoughtbot as an apprentice, and I was very green.

00:05:41.220 I had only worked at one company previously for about a year.

00:05:46.830 Although I had done all my homework, reading a ton of books on refactoring and clean code, I didn't have much experience putting that theory into practice.

00:05:53.430 I was writing code that I thought was clean, and it didn’t seem to me that I was violating any of the numerous object-oriented design rules I had memorized.

00:06:00.840 Yet, other developers were much more able to recognize patterns in the code that I just didn't see.

00:06:06.570 I would submit my code and think it looked great, or I'd review others’ code and say it was awesome.

00:06:13.050 Then later down the road, everything would grow into this mess.

00:06:19.680 Eventually, I realized that by only looking at the actual lines of code in my editor, I was doing myself a disservice.

00:06:25.470 You can't recognize design patterns in isolation; you need context.

00:06:30.630 It's like studying for a multiple-choice quiz when suddenly you're faced with essay questions.

00:06:37.750 You need to know not just what the rules are but the context in which they are expected to apply.

00:06:43.930 For me, Git is where all of my code lives; it provides that context.

00:06:49.930 Instead of looking directly at the code to identify potential design violations or areas that might need refactoring, I turned to Git.

00:06:56.410 I started looking for simple ways that Git might tell me when a class was in need of refactoring.

00:07:02.740 That's why the tips I shared today aren't some new crazy Git or GitHub feature that you've never seen before.

00:07:10.360 They involve looking at the tools we work with every day, like Git logs, PRs, and comments, but applying them differently.

00:07:17.560 Let's get back to the SOLID principles and see how we can use clues from Git to better understand them.

00:07:23.500 I'll explain each of the principles briefly as we go through.

00:07:29.590 But I want to clarify that the goal of this talk isn't to provide an exhaustive exploration of SOLID.

00:07:36.730 My aim is to use SOLID principles to exemplify how we might use Git in general to find design principle violations and refactoring opportunities.

00:07:43.810 Hopefully, this will inspire you to think about how you can use Git as a lens to view whatever software design principles are important to you.

00:07:53.890 As I mentioned before, the first SOLID principle is the Single Responsibility Principle, which means that a class should have only one reason to change.

00:08:00.460 When I hear this, I think, wait, I'm confused. I don't know about you, but my classes change for lots of reasons.

00:08:08.890 Maybe there's a new feature to add or I want to improve some bit of code.

00:08:15.070 When I was really trying to figure this out, I thought hard about why my class had changed recently.

00:08:21.130 Ideally, you would never write any code that violated the Single Responsibility Principle.

00:08:26.440 But if you're like me, you probably thought everything was going great when you wrote it.

00:08:31.810 Then, down the line, something starts to feel off, and you can't really pinpoint what it is.

00:08:38.469 Your code gets harder to change, and you start looking for ways to refactor.

00:08:44.260 This question is a great place to start: you have to really know the history of your classes to understand what might have gone wrong.

00:08:51.160 Fortunately, there’s a place specifically suited for that: the Git log for that class.

00:08:57.220 This is where I start when I want to understand if any of my classes might be violating some design rule.

00:09:02.920 Looking at the commit history for my engine class from my Tesla Motors company, something seems reasonable.

00:09:08.290 You don't have to squint and try to read it, but with a cursory look, it seems reasonable to me.

00:09:13.480 There are lots of small, well-named methods that do one thing.

00:09:18.880 Methods like 'engine_start' take a true or false input as to whether we're starting the engine remotely.

00:09:24.940 There's even a method for delaying the start so you can set a time, like 9:00 a.m. during winter.

00:09:30.940 Finally, an idle method decreases the power output of the engine while a hill access increases it.

00:09:36.700 Then we define some devices that are allowed to start the engine remotely, and at a glance, this all seems reasonable.

00:09:43.360 However, this class has been feeling a little off to me; I sense it needs to be refactored.

00:09:49.680 I can't quite pinpoint what's wrong with it.

00:09:56.680 Looking just at the code sometimes misses issues.

00:10:03.640 So, I will look again at the commit history for the class.

00:10:10.930 It seems like many changes were made relating to remote start functionality.

00:10:16.510 Upon reflection, I notice that there are two logical groups of commits.

00:10:22.600 The first group relates to the power output of the engine, such as increasing power or adjusting frequency.

00:10:30.070 The second group relates to remote start functionality.

00:10:35.470 Initially, I thought there were several reasons the class had changed.

00:10:41.500 Identifying one was impossible.

00:10:46.510 However, when I grouped these changes into larger categories, I realized the class has two reasons for change: the power output and the remote start.

00:10:53.920 Now, moving forward with refactoring seems much easier.

00:10:59.110 I know that I probably need to create a separate class for remote start, allowing me to keep responsibilities distinct.

00:11:05.500 Now, I won't have to worry about changes to remote start affecting how power output is determined by the engine.

00:11:11.140 For this approach to work, you need to write commit messages of a particular type.

00:11:18.279 You need well-written commit messages that document the significant changes in your code and why they occurred.

00:11:24.279 If you have numerous trivial commits, such as adding whitespace, that won't help.

00:11:30.880 Trivial changes add nothing to the overall story of your code.

00:11:36.640 You can write commits like this as you work, but be sure to rebase or squash merge your commits to keep your commit messages clear.

00:11:43.830 Embrace the full power of Git to help you refactor your code by using it in a way that makes things easy.

00:11:50.350 I've heard arguments from those who don't care much for nice comments.

00:11:56.079 They often say that they use PRs more as their documentation.

00:12:02.740 But there's no easy way to see all the PRs that touch a specific class.

00:12:08.380 With your PRs, you can't quickly gain context locally or in your editor like you can with the Git log.

00:12:14.410 In my opinion, that’s reason enough to use clear and concise commit messages.

00:12:20.829 The next SOLID principle is the Open-Closed Principle: a class should be open for extension but closed for modification.

00:12:26.350 Initially, this principle sounds simple, but I found it confusing.

00:12:31.240 How could you extend something without modifying it?

00:12:37.540 I read many books, blog posts, and articles on this, but it was difficult to apply.

00:12:43.840 I’d review my coworkers' PRs and try to guess if a class would need to change in the future.

00:12:52.070 However, I struggled to fully grasp the principle.

00:12:57.800 Let's take a look at some example code and PRs in GitHub to clarify this concept.

00:13:01.240 You don't have to read all this code, but here's a class that checks for engine health.

00:13:06.760 Its main concern is checking for any electrical shorts in the engine.

00:13:12.380 This PR gets submitted by one of my teammates, and it's my job to review it.

00:13:17.959 I think, 'Ship it! It's great!'

00:13:25.450 Then, a new requirement comes in: they want the engine to check that all the spark plugs are connected.

00:13:30.200 Someone submits a new PR that includes this functionality.

00:13:35.450 Next, another feature request comes in: they want to check the alternator.

00:13:42.560 My coworker adds their next feature in another PR.

00:13:49.850 Without needing to read the tiny code, we can notice something in the diffs.

00:13:55.960 This PR has more red and green than a Christmas tree.

00:14:03.980 We're adding features, and although the new lines are green, we also see a lot of red lines.

00:14:09.280 The phrase ‘a class should be open for extension but closed for modification’ means you shouldn’t have to modify existing code to add new functionality.

00:14:16.880 In GitHub context, this means that when adding features, there should be no red in the diffs.

00:14:24.180 If every time you're adding functionality, you also need to modify existing code, then you're violating the Open-Closed Principle.

00:14:31.939 Recognizing this was a mind-blowing moment for me.

00:14:38.430 Learning to look for this one indicator in my Git diffs helped me better understand the principle.

00:14:45.540 If you see red in your diffs when adding a feature, you may be violating the Open-Closed Principle.

00:14:51.629 Now that we know this, fixing the class to make it truly open for extension, but closed for modification becomes crucial for maintainability.

00:14:57.540 All that red in the diff indicates extra work someone must do to shoehorn the new feature in.

00:15:04.330 After recognizing the problem as an Open-Closed violation, you have the chance to fix that class.

00:15:10.560 This allows you to ensure that nobody has to do that extra work again in the future.

00:15:18.200 In the previous segment, I mentioned that clear commit messages are important.

00:15:24.240 Here's a caveat: consistent style matters too.

00:15:30.900 I used to think that having a set style guide was unnecessary, but I now appreciate its significance.

00:15:38.890 When I recognized how powerful Git can be for identifying refactoring opportunities, maintaining style became more important.

00:15:44.790 If, in addition to adding a feature, you're changing syntax or rearranging methods, it could hinder the review process.

00:15:51.640 This might complicate reviewing your code and make keeping track of more complex refactoring efforts difficult.

00:15:58.700 By keeping things easy and ensuring consistent style, you will make programming and reviewing code simpler.

00:16:03.900 The next principle we’ll cover is the Liskov Substitution Principle.

00:16:10.630 Every subclass or derived class should be substitutable for their base or parent class.

00:16:16.500 In other words, you should be able to call the same method on a subclass as you could on a parent and expect a compatible result.

00:16:23.880 If the superclass returns a string, the subclass should also handle that method and return a string.

00:16:31.360 In many languages, this is less of a concern because the language does some checking for you.

00:16:38.200 However, in Ruby, there are no such guarantees.

00:16:44.469 Despite our best intentions, I or someone else on my team could write code like this.

00:16:50.350 For instance, here we have a Viewable class with a method called turn_on_lights.

00:16:56.470 In its Car subclass, we only activate the headlights when this method is called.

00:17:03.540 For a Bike, we don't have any lights, so we raise a 'not implemented' error.

00:17:09.790 This triggers an LSP violation, since you can't call turn_on_lights on Bike and get a compatible result.

00:17:15.320 People forget these principles, and despite our best efforts, nothing is perfect.

00:17:22.150 I might end up pushing this code up to GitHub and opening a PR.

00:17:28.070 My code will be reviewed by a teammate, but they too are not perfect.

00:17:34.930 I don't want to rely on them to catch these issues.

00:17:40.760 Thanks to GitHub's webhooks, I can set up checks.

00:17:46.150 When opening the PR, I can receive an automatic comment from our style bot warning me about a potential LSP violation.

00:17:53.080 Automatic checks are fantastic for ensuring that clean code remains clean.

00:17:59.420 Consider writing your own style bot to check for design principles important to you.

00:18:06.130 There are tools like linters that can check for more general style guide violations.

00:18:14.150 Every day, more powerful tools emerge to analyze your code.

00:18:19.350 If you've never considered writing custom webhooks, I highly encourage it.

00:18:26.310 Internally at GitHub, we have checks for changes in database files to warn us of potential issues.

00:18:32.250 The possibilities are endless, and this can be a fantastic tool for refactoring.

00:18:38.440 Moreover, there are some things you could check for using these methods.

00:18:45.320 Although not every instance may be a clear LSP violation, they could indicate a code smell.

00:18:50.130 This is particularly true in dynamically typed languages like Ruby.

00:18:56.680 Invoking a method that relies on the type of an object can lead to violations.

00:19:03.360 Having a bot call attention to this can motivate the developer to rethink their implementation.

00:19:09.620 Reviewers would also appreciate this attention and offer suggestions.

00:19:16.000 Don’t depend solely on reviewers to spot such issues; they might not mention them.

00:19:24.000 Let your bots do the nagging; they can catch things that we miss.

00:19:30.120 When developing a bot, focus on helping improve your code.

00:19:36.440 As much as I would like to say I can always focus on code quality, we all have distractions.

00:19:42.790 Automate as much as you can to avoid LSP violations.

00:19:50.000 Doing so will facilitate code adaptation and growth.

00:19:56.310 The next principle is the Interface Segregation Principle.

00:20:02.250 No client should be forced to depend on methods it does not use.

00:20:06.630 The classic example of this is in typed languages like Java.

00:20:12.900 Let's switch to Java to illustrate this concept.

00:20:18.090 For example, let’s consider the Aircraft interface.

00:20:24.370 This specifies that anything implementing it must define certain methods.

00:20:29.990 An example would include takeoff and retracting landing gear.

00:20:36.160 Here we have an AirTrafficControl class, and it takes anything implementing the Aircraft interface.

00:20:42.310 It directs the aircraft to takeoff.

00:20:49.230 Now, imagine we have a Helicopter as another type of Aircraft.

00:20:55.790 While it needs to take off, it doesn't need to retract landing gear.

00:21:02.050 This situation illustrates an Interface Segregation violation.

00:21:07.850 AirTrafficControl relies on Aircrafts having all methods defined under the interface.

00:21:14.020 Many small, client-specific interfaces are preferred over one large general-purpose interface.

00:21:20.120 Let's see if we can spot ISP violations in non-typed languages like Ruby.

00:21:26.210 Taking another look at the AirTrafficControl class in Ruby, we initialize it by passing in all the aircrafts to deal with.

00:21:32.010 There's a method called today's aircraft that checks the flight dates.

00:21:38.830 However, our list of aircraft is required to be an ActiveRecord relation.

00:21:45.480 This implies AirTrafficControl relies on the ActiveRecord interface.

00:21:52.580 Suppose we wanted to pass in aircrafts from a CSV instead; the ActiveRecord dependency would restrict that.

00:21:58.680 That assumption can escalate quickly.

00:22:05.050 You would have to re-implement all ActiveRecord functionality in the CSV aircrafts class.

00:22:12.620 This is a violation of the spirit of ISP.

00:22:19.320 While using both small client-specific interfaces and large general-purpose interfaces can offer flexibility, balance is key.

00:22:25.920 Use Git to find that balance.

00:22:32.330 I would use Git logs to assess how often classes are changing.

00:22:38.360 What we're looking for here is whether the changes in our class correspond to any ISP violations.

00:22:44.270 Next, we move on to the Dependency Inversion Principle.

00:22:51.300 Entities must depend on abstractions, not concretions.

00:22:58.170 Here we have a Car class with methods like play_radio and silent.

00:23:04.550 Within it, we call FM radio.play, making FM radio a direct dependency.

00:23:10.230 This ties the car’s existence to FM radio.

00:23:17.050 To honor the Dependency Inversion Principle, we practice a concept called Dependency Injection.

00:23:23.200 Instead of calling FM radio directly, we inject it when initializing the class.

00:23:30.400 This allows us to decouple our code.

00:23:36.640 With a style bot, you could call attention to understanding this code.

00:23:42.600 It’s more reliable than relying on yourself or your coworkers to spot violations.

00:23:48.920 Automating checks ensures that at the very least, you must think through your code.

00:23:57.030 If you have numerous points in your code that couple classes to dependencies, utilize history and blame to prioritize fixes.

00:24:03.280 Adapting your code to inject dependencies makes it more flexible and maintainable.

00:24:09.320 You can now introduce cars equipped with all different types of radios, as long as they respond to the play method.

00:24:15.040 Recapping, here's how Git can help you refactor.

00:24:22.570 Look at Git logs to group commits into themes, recognizing when a class is addressing multiple responsibilities.

00:24:29.490 Keep an eye out for red in diffs when adding functionality, as this could signify violations of the Open-Closed Principle.

00:24:35.840 Integrate tools like automatic style linters with Git to facilitate code quality.

00:24:42.300 Make use of the Git log and blame to discover how frequently classes or methods change, identifying the most significant offenders.

00:24:48.500 By adhering to these tips, you can refactor and reach SOLID code.

00:24:55.050 Your code will be more flexible and maintainable, allowing for easier adaptations.

00:25:03.200 For all of you familiar with the SOLID principles but struggling to apply them, these tips will help you identify violations and see their context in your work.

00:25:11.050 Thank you!

Git Driven Refactoring

Key Points Discussed:

Examples and Illustrations:

Conclusion and Takeaways: