Branch In Time

by Tekin Suleyman

The video titled "Branch In Time" presented by Tekin Suleyman at RubyConf 2018 explores the significance of maintaining a useful revision history in software development, particularly in the context of Ruby programming. The key theme is how a well-crafted history of code changes can enhance maintainability, paralleling the importance of good coding practices.

Key points covered include:

- Importance of Revision History: A clear revision history is crucial for understanding code changes over time, making it easier for developers to grasp not just 'what' changed but 'why' it changed. This helps in navigating legacy code and ensuring the software remains adaptable.
- Case Study - Seema's Task: The speaker details Seema's challenge at Doc's R Us, where she tries to understand why an in-memory sort was implemented in a decade-old application. Using Git commands like 'git blame' and 'git log', Seema navigates the commit history to uncover the reasoning behind the code, highlighting how revision history aids in understanding the codebase.
- Historical Developer Perspectives: The video contrasts Seema's modern struggles with the earlier decisions made by Josie in a startup environment. Josie's quick fixes and commit decisions under time pressure exemplify how the evolution of code often leads to messy histories that can confuse future maintainers.
- Best Practices for Commit Histories: Tekin outlines strategies for creating effective commit histories. These include writing detailed commit messages focusing on the 'why' behind changes, ensuring commits are atomic and context-rich, and utilizing Git's interactive rebase feature to refine commit history.
- Collaborative Learning: The speaker emphasizes the importance of mentoring others on good Git practices, encouraging experienced developers to assist less knowledgeable peers in crafting thorough and useful commit histories.

In conclusion, the talk asserts that while writing good code is essential, understanding the deeper context of software changes through well-maintained revision histories is equally important for long-lasting maintainability. This video serves as a guide for developers to enhance their Git fluency and improve collaboration within their teams.

00:00:15.500 Meet Seema. Seema is an engineer at Doc's R Us, a company that provides appointment booking software for medical professionals. Doc's R Us was founded in 2008 and has grown from a tiny startup to a company employing nearly 100 people. The app itself is a decade-old majestic monolith, with code dating back to Rails 1.0 still to be found in its ancient hull. The team has changed considerably over the years, with only the CTO remaining from those who wrote the early lines of code. Seema has been in the job for a couple of months now and feels like she has started to get her bearings around the codebase.

00:00:44.270 Today, she has started a new task: updating a page in the app where doctors can view the patients they previously had appointments with. Currently, the patients are sorted alphabetically. Seema's task is to update the page so that doctors can also sort the patients by their last appointment date. This way, they can see who hasn't been in for a checkup for a while. It sounds straightforward enough, but while looking at the code, something catches Seema's eye. There’s a method called 'sorted_patients' that returns the patients sorted by their names. What she finds surprising is that it’s doing so using Ruby's sort method in memory, rather than as part of the database query. This doesn't seem very efficient.

00:01:20.290 Seema makes a quick change to try sorting the patients in the database instead and runs the tests to see if it breaks anything. The tests come back green, suggesting that the in-memory sort is probably unnecessary, at least assuming the test coverage is good. Eager as she is to remove the in-memory sort, Seema first wants to understand why someone might have implemented it, to ensure there aren't any unintended consequences. She starts by checking the model, looking for clues. The patients association is loaded through the appointments association, and everything else looks as she'd expect.

00:01:44.880 Satisfied she's explored the code enough, she knows exactly what to do next. In her last job, Seema was fortunate enough to work with a couple of wizened old developers who taught her the mystic and ancient arts of Git-fu. She learned powerful techniques for searching through revision histories to discover how code got to be the way it is. She starts with a basic Git-fu technique: 'git blame'. Git blame will reveal the revisions and authors that last edited each line of code. Seema runs the command and picks out the line she’s interested in. Git is telling her that the author was someone called Josie, who she thinks doesn’t work here anymore. If she did, Seema might have asked Josie directly why she used an in-memory sort. However, given the change was made almost a decade ago, Josie probably wouldn’t remember.

00:02:12.959 She takes the revision SHA for the line she's interested in and passes it to 'git log' so she can see the commit message. She also includes the patch option so that Git will show her the full diff for the change, along with the commit message, providing maximum context. Looking forward to finding out the reason for the in-memory sort, Seema runs the command and examines the output. However, she is left rather bemused: it turns out that Josie, being the fastidious type, had merely corrected a typo in that commit.
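The two steps Seema has just taken can be sketched as follows. This is a minimal illustration rather than the commands shown on the speaker's slides: the file name, file contents, and commit messages are made up, and the script builds a throwaway repository so it can be run anywhere.

```shell
set -eu

# Throwaway repository with a hypothetical controller file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"
git config user.email "josie@example.com"

# First commit introduces the method (with a typo in it).
printf 'def sorted_patients\n  patients.sort_by(&:nmae)\nend\n' > patients_controller.rb
git add patients_controller.rb
git commit -q -m "Fix ordering bug"

# Second commit only corrects the typo on line 2.
printf 'def sorted_patients\n  patients.sort_by(&:name)\nend\n' > patients_controller.rb
git commit -q -am "Fix typo"

# git blame: which revision last touched line 2?
# --porcelain prints the full SHA as the first field of the first line.
sha=$(git blame -L 2,2 --porcelain patients_controller.rb | head -n 1 | cut -d ' ' -f 1)

# git log -p: show that commit's message together with its full diff.
git log -p -1 "$sha"
```

As in the story, blame points at the most recent edit of the line (the typo fix), not at the commit that explains why the in-memory sort exists.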

00:02:39.780 As for the mystery of the in-memory sort, she’s none the wiser. Unperturbed, Seema cracks her knuckles and prepares to use a more advanced Git-fu technique. It's time to break out the 'pickaxe'. 'git log -S', also known as the pickaxe, allows us to search through commit histories and find all the commits that added or removed a particular snippet of code. Seema is going to use the pickaxe to find the very first commit that introduced the 'sorted_patients' method. She calls the command with the method name as a search parameter, again including the patch option so she can see the full diff. She also includes the reverse option so that Git will return the commits oldest first rather than in its default reverse chronological order, as she hopes the first commit that used 'sorted_patients' will be right at the top.

00:03:14.230 Hopeful this will finally solve the mystery, she runs the command and inspects the output. Unfortunately, it looks like Josie had a change of heart about the method name. I guess 'sorted_patients' is probably more intention-revealing than 'load_patients', but Seema is still no closer to solving the mystery. Not a problem, thinks Seema; she can rerun the search but this time using the original method name. So she calls the pickaxe again, this time searching for 'load_patients'. Perhaps this will finally solve the mystery.
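A small sketch of the pickaxe searches described above, again in a throwaway repository with invented file contents and commit messages. It assumes the rename happened in its own commit, as in the story.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"
git config user.email "josie@example.com"

# Commit 1: the method is introduced under its original name.
printf 'def load_patients\n  patients.sort_by(&:name)\nend\n' > patients_controller.rb
git add patients_controller.rb
git commit -q -m "Fix ordering bug"

# Commit 2: the method is renamed.
printf 'def sorted_patients\n  patients.sort_by(&:name)\nend\n' > patients_controller.rb
git commit -q -am "Rename method"

# -S lists commits that change the number of occurrences of the string.
# --reverse prints the oldest match first; add -p to see each full diff.
echo "Commits touching 'sorted_patients':"
git log -S'sorted_patients' --reverse --format='%s'

echo "Commits touching 'load_patients':"
git log -S'load_patients' --reverse --format='%s'
```

The first search only finds the rename, which is why Seema has to repeat it with the original method name to reach the commit that introduced the in-memory sort.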

00:03:40.500 As she continues her search, it feels like she's getting somewhere. Up until this point, the code was performing a sort in the database query, and this is the very first commit that used an in-memory sort. The commit message mentions something about an ordering bug, so it looks like it was an intentional choice. Still, there are no clues as to what caused the bug or why the in-memory sort would have fixed it. Seema decides it's time to switch tactics and goes looking for the original pull request for the change. She finds the commit on GitHub and clicks the unassuming link that will take her to the pull request for the commit.

00:04:04.330 Seema's hope had been that the description would give her a bit more context for the change, but all she sees is a link to Pivotal Tracker. She wasn’t aware that the company used Pivotal Tracker, so she asks a colleague how she might access the project. Unfortunately, it turns out the company archived the project when they switched to Trello, and when the subscription lapsed, so did access to the project. Back on GitHub, Seema scrolls through the rest of the diff looking for more clues.

00:04:41.620 She finds the commit that adds some tests, and sure enough, they're verifying that the patients are displayed in alphabetical order. However, there's nothing else in the diff giving her any indication of why the in-memory sort was used. By this point, Seema is fresh out of ideas. Her search to discover the reason for the in-memory sort has come up dry. Although the test suite is giving her some confidence that removing it is probably fine, she’s still feeling a little uneasy. She decides to proceed with caution while she continues her work and look for more clues.

00:05:06.960 Okay, let’s find out how we got here. Meet Josie. Josie is an engineer at Doc's R Us, a startup that’s building appointment booking software for medical professionals. Josie was pretty much one of the first hires by the company after they secured funding and loves the fast pace of startup life. However, today she’s having a bad day. She got up late and managed to spill her carefully crafted single-origin pour-over all over herself when she rushed out the door. Not only is she late and covered in coffee, but she’s also severely under-caffeinated.

00:05:31.740 The day before, she had been working on a strange bug where patient records were being displayed in the wrong order. They needed a fix pretty urgently because she had a big demo coming up with a potential new client. After a bit of digging, Josie figured out what the problem was: the patients association was being loaded through the appointments association. However, the appointments association had a default ordering on it, ordering the appointments by date.

00:06:00.980 What this meant was every time the patients association was called, it would automatically inherit that ordering from the appointments, and any additional order calls would get appended to the end. So instead of returning the patients ordered by their names, they were, in fact, being ordered by their appointment date and then their name. The obvious fix would have been to remove this default ordering. But when Josie tried that, there were a heap of test failures.

00:06:22.100 It turned out quite a lot of the code was relying on the appointments being returned in date order. Realizing it was going to take a while to unpick all the failures, she decided to put together a quick fix for the bug and come back to remove the default ordering later when she had more time. So she introduced a method in the controller to sort the records after they’ve been loaded. She later had a change of heart about the method name and also added some tests.

00:06:47.180 Here’s how the commit history looks. This morning, her plan had been to tidy up the history before creating a pull request, but today she’s feeling cranky and just wants to see the back of this bug so she can move on to something more interesting. So instead, she throws caution to the wind, pushes the code to GitHub, and creates a pull request. She uses a link to the Pivotal Tracker story for the bug description because all the details are there already and doesn’t see much point in repeating the information.

00:07:07.850 A few moments later, she gets a notification from the CI server. It looks like the build is broken on her branch. She discovers she failed to update an integration test that also needed updating. So, she adds an additional commit to fix the test, and then another when a co-worker points out a typo. Normally, her co-worker would have pulled her up on such a messy commit history, but having seen the unhappy state she was in that morning, perhaps they thought it would be kinder to let it slide this time.

00:07:39.980 With the build green, the pull request is approved and the bug fix is shipped just in time for the big demo. Happy that she squashed another bug, Josie moves on to the next task, but not before adding something to the backlog to remove the default ordering so that it doesn’t catch anybody else out in the future. Okay, so the eagle-eyed among you may be wondering at this point why Josie didn't just use the reorder method instead, as that would have replaced the existing order clause.

00:08:03.420 Well, that’s because I’ve contrived Josie’s timeline so it happened in 2010, which conveniently for me is before Rails’ ActiveRecord added that functionality. Now, let's get back to the story. Meet Josie again. Josie is an engineer at Doc's R Us, a startup building appointment booking software for medical professionals. She was one of the first hires by the company when they secured funding, and she loves the fast pace of startup life.

00:08:35.070 Today’s got off to a great start. Josie got up early to beat the rush hour and enjoyed reading a really interesting blog post about revision histories while sipping her delicious, lovingly crafted single-origin pour over. Yesterday, Josie had been working on a bug where patient records were being displayed in the wrong order, and by the end of the day, she had put together a fix. However, the commit history was a bit of a mess and her plan had been to tidy up this morning.

00:08:48.880 Hence her choice of reading on the way in. She decides that the commit where she renamed the method isn’t going to be much use to anyone in the future and at best would prove a distraction for someone trying to understand the nature of the change. She also decides the history would be more focused if the bug fix and the tests for the bug fix were part of the same commit.

00:09:06.170 To tidy up the history, Josie plans to use Git's interactive rebase tool. Interactive rebase makes it possible to revise our commit histories by letting us edit, squash, reorder, and reword commits. She tells Git she wants to interactively rebase the last three commits, and when Git presents her with those three commits, she marks the rename method commit and the add tests commit to be fixed up, which essentially means squashing them into the first commit.

00:09:24.380 Additionally, she marks the first commit as reword so she can write a more detailed commit message for it. In the commit message itself, she makes sure to explain the nature of the bug and why she chose to fix it that way. At the bottom, she includes a little note about the work planned to remove the default ordering. This commit message also serves as a perfect title and description for the pull request, saving her the trouble of having to write a new one.
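Josie's rebase can be sketched as a script. In a real session she would simply run 'git rebase -i HEAD~3' and edit the todo list and the commit message by hand; here two small stand-ins do that editing so the example runs unattended. The file names, commit messages, and both stand-in editors are assumptions made for illustration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"
git config user.email "josie@example.com"

# A base commit plus the three messy commits from the story.
git commit -q --allow-empty -m "Initial commit"
echo "sort in memory" > fix.rb
git add fix.rb
git commit -q -m "Fix ordering bug"
echo "renamed" >> fix.rb
git commit -q -am "Rename method"
echo "test" > fix_test.rb
git add fix_test.rb
git commit -q -m "Add tests"

# Stand-in for hand-editing the todo list:
# line 1 becomes "reword", lines 2 and 3 become "fixup".
todo_editor=$(mktemp)
cat > "$todo_editor" <<'EOF'
sed -e '1s/^pick/reword/' -e '2,3s/^pick/fixup/' "$1" > "$1.new" && mv "$1.new" "$1"
EOF

# Stand-in for typing the new, more detailed commit message.
message=$(mktemp)
printf 'Sort patients in memory to fix ordering bug\n\nExplain the bug and why it was fixed this way.\n' > "$message"

GIT_SEQUENCE_EDITOR="sh $todo_editor" GIT_EDITOR="cp $message" \
  git rebase -q -i HEAD~3

# One focused commit now sits on top of the base commit.
git log --format='%s'
```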

00:09:44.259 A short while later, she receives a notification from CI, telling her the build is broken on her branch. It appears she missed an integration test that also needed updating, so she updates the test and makes sure it passes before staging the change. The change is ready to be committed, but when she runs 'git commit', she includes the amend option. Instead of creating a brand-new commit, Git will amend the existing one, keeping all the changes related to the bug fix in a single commit.

00:10:01.800 Since she’s happy with the commit message and doesn’t need to make any changes to it, she uses the no-edit option, and Git amends the commit without prompting her. However, because she’s changed a local commit that’s already on GitHub, she needs to force push to overwrite what’s there. To be safe, she does so using force-with-lease; that way Git will refuse the push in the unlikely event that somebody else has made a change to the branch in the meantime.
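The amend-and-push sequence might look like this. A local bare repository stands in for GitHub, and the file names and commit message are invented, so treat it as a sketch of the technique rather than Josie's exact commands.

```shell
set -eu

work=$(mktemp -d)
remote="$work/remote.git"   # local bare repository standing in for GitHub
repo="$work/local"
git init -q --bare "$remote"
git init -q "$repo"
cd "$repo"
git config user.name "Josie"
git config user.email "josie@example.com"
git remote add origin "$remote"

echo "sort in memory" > fix.rb
git add fix.rb
git commit -q -m "Sort patients in memory to fix ordering bug"
git push -q -u origin HEAD

# Update the integration test that was missed, then stage it.
echo "updated expectation" > integration_test.rb
git add integration_test.rb

# Fold the staged change into the previous commit, keeping its message.
git commit -q --amend --no-edit

# The amended commit has a new SHA, so a plain push would be rejected.
# --force-with-lease overwrites the remote branch only if it still points
# at the commit we last saw there.
git push -q --force-with-lease origin HEAD
```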

00:10:27.750 With the build green and the typo corrected, the pull request is approved and the bug fix is shipped just in time for the big demo. Happy to have squashed another bug, Josie moves on to the next task but not before adding something to the backlog to remove the default ordering so it doesn't catch anybody else out in the future. Meanwhile, back in the present day, Seema has started a new task and is puzzling over why some code is sorting patients in memory rather than as part of the database query.

00:10:52.290 She wants to know why and decides to use some Git-fu. She runs 'git blame' to identify the revision for the line she's interested in and passes it to 'git log' along with the patch option so she can see the full diff as well as the commit message. Upon reading the message, Seema notes that this was a workaround for the default ordering on the appointments association.

00:11:24.370 She also notices the commit message mentions something about removing the default ordering altogether, and sure enough, when she checks the model, it's gone. Seema speculates that whoever removed it must have forgotten about the in-memory sort. Oh well, these things do happen. With the mystery solved, Seema feels confident that she can remove the in-memory sort and carry on with her work.

00:11:53.480 Now, as developers, we do many things to try and keep our code maintainable. We carefully think about the names for our objects and methods; we write and maintain automated tests; we try to create good abstractions; and we refactor. We make deliberate efforts because we want to keep our code easy to understand and easy to change. But here's the thing: our software is so much more than just code. At its best, code can clearly articulate what our software is doing, but if we want to understand the deeper why of our software, it's often key that we can understand how the code got to be where it is today.

00:12:26.830 We write modern software iteratively. Startups pivot, requirements change, bugs are found and hopefully squashed. We're constantly course correcting the code to keep up with our ever-growing and shifting understanding of what our software needs to do. Along the way, we make decisions and trade-offs, the consequences of which are felt long after they're made. And as we do so, we build up a kind of institutional knowledge that defines our software in a way that the code can't express.

00:12:53.460 Our ability to grasp this knowledge is as important to the maintainability of our code as keeping the code itself in good shape. Peter Naur spoke to this in a paper he wrote in 1985 called 'Programming as Theory Building'. In it, he proposes that programming isn't actually about the production of executable code, but is the process by which programmers build up their mental model, their theory of how the software needs to work.

00:13:15.540 So, for Peter, the code itself was merely a secondary artifact. He goes even further and states that for the software to remain viable and maintainable, the programmers that hold the knowledge in their heads need to be around. The program effectively dies when the team disbands. Now, it’s rare that an entire team is disbanded, but just like our software, our teams are not static either. New team members join and quickly need to get up to speed with this knowledge to become effective, while long-standing members leave, taking their hard-won knowledge with them.

00:13:34.540 The power of the revision history is that it gives us a way to capture this knowledge right there alongside the code as we change it, in a way that’s both searchable and won’t go out of date. With a carefully constructed revision history, every line of code is documented; every change is explained. As the codebase grows and ages, the value of this revision history grows with it, but only if we take the time to shape a useful history in the first place.

00:14:10.700 Now, I imagine there are many of you in this audience who find everything I’ve said today self-evident. You’re confident reshaping your histories and regularly write novel-length commit messages. Equally, I imagine there will be quite a few of you in this audience for whom, as great as this sounds in theory, shaping your histories can feel like a bit of an intimidating prospect. Let's face it: it's called Git for a reason.

00:14:39.620 Whilst Git is extremely powerful and versatile as a tool, its primary interface, with its many commands and esoteric option flags, is not particularly user-friendly. Sites like Git WTF exist for a reason. Unfortunately, there aren’t really any silver bullets here. Just like writing good automated tests, doing this stuff well takes patience and practice. My aim with this talk was to convince you that the effort is worth it, and I’d like to finish by sharing a few simple tips that helped me on my journey to creating more useful histories.

00:15:09.060 Now, there's only so much I can cover in the time I have today, but I’ve published a blog post with links to more in-depth resources, so if you want to learn more about this subject, go there.

00:15:31.890 So first up, make sure you’re set up for writing good commit messages. For me, this meant getting out of the habit of committing with -m. The command-line environment doesn't encourage writing detailed messages. Instead, I recommend configuring Git so it knows your editor of choice. That way, you'll find yourself in a friendly and familiar environment, and you’re much more likely to put some detail in your commit messages.

00:15:52.480 I’d also recommend turning on Git’s verbose mode. With verbose mode on, you’ll get to see the full diff for the change right there in your editor as you write the message. This is a great opportunity to review the change and remind yourself as you write the message. This is also where I'll often spot something in the diff that I think actually belongs in another commit, which gives me a chance to back out, restage the changes, and go again.
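The two settings mentioned above can be sketched like this. The editor name is only a placeholder, and the example sets the options in a throwaway repository; adding '--global' to each 'git config' call would apply them to every repository instead.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Open this editor for commit messages instead of relying on -m.
# "vim" is a placeholder; substitute your own editor command.
git config core.editor "vim"

# Show the full diff of the change in the commit message template.
# The diff is only there for reference and is not saved in the commit.
git config commit.verbose true

# Print the values back to confirm they were stored.
git config --get core.editor
git config --get commit.verbose
```

For a one-off, 'git commit -v' has the same effect as the verbose setting for a single commit.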

00:16:14.660 Or perhaps, in the process of actually explaining the change, I'm looking at the code and I’ll say: actually, there’s another approach I could take here. When it comes to the commit messages themselves, focus on capturing the why and not just the what. Hopefully most of the what should be clear from looking at the diff. Instead, use your message to capture the kind of context that will be lost otherwise once the change is made.

00:16:41.530 I’ve got a simple example here based on an idea I have. When writing a commit message, I like to put myself in the shoes of a future developer trying to understand why I’ve done what I’ve done. I answer the questions they might have right there in the message. It’s not even like some hypothetical person; if you do code reviews, it’s literally the person that’s going to be reviewing your PR. This is a simple example to illustrate the point.

00:17:06.160 I was doing some work on a project recently, refactoring some partials, and I came across one that didn’t appear to be referenced anywhere in the app, except for this one spec that was testing the PDF rendering code. I wanted to know why it was used there, so I could be sure it’d be okay to remove it. Luckily, the developer who wrote the test was around, and I could ask them. They confirmed that they chose it because it was basically plain HTML and made the test easier to set up. That’s an example of something you could capture right there in the message. Then if that person had left, the information would still be available.

00:17:34.440 Now, this is a useful example to illustrate the point, but it's not a great one. The information we're capturing here is fairly low value; it relates to the specific mechanics of the test. I could have reasonably argued that it would be safe to remove that partial. Far more interesting is capturing information that relates to the business domain for your project—the sort of stuff that can't easily be figured out by looking at the code alone. Capturing that sort of information in your commit message is like burying little knowledge nuggets of treasure for future developers.

00:18:03.540 Another little mind hack I like to use is if I find myself in a situation where I wonder if I really want to write a comment in the code, and there's not an obvious way to refactor things to make it clear without a comment, then maybe that comment belongs in the commit message instead.

00:18:20.060 Thirdly, think carefully about the shape of each commit. In the story for Josie, this meant collapsing everything down into one commit. But the message you should take from that is not that everything should be one big commit. Instead, focus on creating small, focused, and atomic commits that just do one thing.

00:18:44.180 Joel Chippindale gave a great talk at a local meetup back in the UK called 'Telling Stories Through Your Commits'. In that talk, he discusses this idea of a minimum viable commit. If you find yourself using the word 'and' in a commit message, perhaps there's actually another commit trying to break out. I think of each commit as almost like a mini pull request that gradually builds on the work you're trying to deliver. Thinking about the shape of your commits as you go will make your life a lot easier.

00:19:08.670 Instead of rebasing everything right at the end, your best friend here is the good old patch option. When you pass it to 'git add' to stage your changes, it’s like a little mini text adventure for staging changes. It offers you each chunk of the diff in turn and asks you whether you want to stage it or not. Everywhere in this talk where I’ve said the patch option, by the way, you can use '-p' for short.

00:19:32.790 Another little mind hack I like to use is to apply a great quote from Kent Beck to revision history: if I'm working on some code and adding new functionality and decide there's some refactoring work that's going to make that easier, then I’ll try and break that down into two separate stages. I'll first make a commit that does the refactoring without changing any behavior, and then a second commit that actually changes the behavior.

00:19:52.560 This is going to make each commit more focused and is also going to be much more helpful for the person reviewing your code. Next up, get used to treating your commits as mutable—things that, until the point they’re merged into your main branch, you're free to chop, change, and reorganize as you see fit. Of course, if you're collaborating with others on a branch, you've got to coordinate any rebasing carefully so as not to cause conflicts. A great place to start is the amend option, which allows you to edit the most recent commit, adding changes, removing changes, or correcting typos.

00:20:23.420 If you want to take things a level further and you want to start tweaking older commits, look up using the 'fixup' option to create fixup commits and then automatically rebase them into place with autosquash. Don’t fear the rebase. It can be intimidating at first, but with practice, it becomes an invaluable tool for revising your histories. Honestly, once you get good at it, you’ll feel like a superhero using it, and if you find yourself in a pickle, you can always back out with 'git rebase --abort'.
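The fixup and autosquash workflow could look something like this. The repository, file names, and commit messages are made up, and 'GIT_SEQUENCE_EDITOR=true' is used only so the todo list is accepted without opening an editor.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"
git config user.email "josie@example.com"

git commit -q --allow-empty -m "Initial commit"
echo "sort in memory" > fix.rb
git add fix.rb
git commit -q -m "Fix ordering bug"
target=$(git rev-parse HEAD)      # the older commit we will want to tweak
echo "docs" > README.md
git add README.md
git commit -q -m "Add README"

# Later: a correction that really belongs in the older commit.
echo "sort in memory, corrected" > fix.rb
git add fix.rb
git commit -q --fixup "$target"   # creates "fixup! Fix ordering bug"

# --autosquash moves the fixup commit next to its target in the todo
# list and marks it as a fixup, so it is folded in during the rebase.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format='%s'
```

If a rebase like this stops on a conflict and you would rather start over, 'git rebase --abort' returns the branch to where it was before the rebase began.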

00:20:53.170 Finally, spending time searching through your revision histories is a fantastic way to build the instincts for the sorts of histories that will be useful for other developers. This might sound slightly controversial, but I’m going to recommend trying to use 'git blame' a bit less. It’s a limited tool as it will only ever show you the most recent revision for any given line.

00:21:19.160 Instead, get used to using the pickaxe. It’s far more powerful, as it can show you the full history of a particular snippet of code, not only across multiple commits but also across multiple files. And if you do find yourself really needing to identify the most recent revision on a line, you might consider using 'git annotate' instead. It’s basically the same thing but has slightly less accusatory language.

00:21:45.020 As I said before, unfortunately, there aren’t really any silver bullets, but hopefully those tips will help you on your journey to creating better constructed revision histories. Everyone in this room exists somewhere on a spectrum of Git fluency, but we all start in the same place.

00:22:05.620 This history I put together back in 2012 isn’t going to be much help to somebody trying to understand the nature of the work I was doing at the time. Take a minute to admire them; it gets worse but less anonymous the further down you get! You know, switcheroo: I bet nobody has ever gotten that into a commit message since.

00:22:31.470 Since then, I’ve been fortunate enough to work with some fantastic developers who’ve helped me understand the benefits of putting together good revision histories and also helped me learn the skills to do so in the first place. It’s because of their help and patience that I’m able to stand here sharing this with you today.

00:23:04.510 So, my final tip is for those of you at the other end of the Git fluency spectrum. This stuff can be intimidating. If you work with someone whose commits suggest that maybe they don’t fully appreciate the value in constructing a useful history or maybe they don’t have the skills to do so, help them. And I don’t mean leave snarky comments on their pull requests; I mean actually sit down with them, pair with them, show them how they can revise their histories into something more useful and teach them the benefits of doing so.

00:23:31.750 If everyone in this room, those who’ve already mastered this stuff, helps their co-workers do the same, we’d all be better off. Thank you.