A Branch in Time (a story about revision histories)
Summarized using AI

by Tekin Suleyman

In the talk titled "A Branch in Time," Tekin Suleyman explores the crucial role of revision histories in software maintainability, drawing parallels to the film "Sliding Doors" by highlighting two distinct realities shaped by code management practices. The presentation is anchored around the case of Seema, an engineer at Doc's R Us, who faces challenges while working with a complex and outdated codebase. Through her experience of updating a sorting feature, the talk delves into the following key points:

  • Importance of Revision History: Revision history plays a vital role in facilitating communication and understanding within development teams. It captures the evolution of code while providing insights into design decisions and trade-offs made throughout the development process.

  • Seema's Journey: Seema's attempt to understand a specific piece of code leads her through various Git techniques, illustrating the practical implications of having a thorough revision history, as she uncovers the motivations behind existing code.

  • Technical Techniques Explored: The talk motivates developers to leverage Git commands like 'git blame' and 'git log -S' to gain insights into historical code changes, emphasizing the significance of discussing past decisions to aid current development work.

  • Josie's Backstory: The talk juxtaposes Seema's experience with that of Josie, a past developer who made quick decisions under pressure, resulting in a workaround that complicated future maintenance. This narrative showcases the consequences of not thoroughly documenting choices and reinforces the benefits of a detailed revision history.

  • Long-Term Value and Best Practices: The speaker emphasizes the long-term advantages of maintaining a clear and detailed commit history. Best practices discussed include writing detailed commit messages, integrating Git with familiar editors, and utilizing verbose mode to encourage thoughtful documentation during coding.

  • Cultural Understanding: The talk also highlights the importance of institutional knowledge within teams and how it can be supported through well-maintained revision histories, illustrating that code alone cannot encapsulate the knowledge needed for software evolution.

In conclusion, Tekin's discussion advocates for investing time and effort into mastering version control practices, as they support not only daily development tasks but also the future maintainability of software projects, ensuring that historical insights remain accessible and beneficial to current and future developers.

00:00:00.000 I wanted to introduce our next speaker for the afternoon. Tekin is going to be talking to us about revision histories, specifically 'A Branch in Time.' Tekin is based in Manchester and runs the North West Ruby User Group for the surrounding areas of the UK. He has been a freelancer and contractor for most of his career, enjoying the collaborative process of working with teams to build applications that meet real needs. His talk will focus broadly on software maintainability from a development perspective and discuss the importance of revision history in facilitating communication within teams. This is his third time giving this talk, having previously presented at Brighton Ruby and RubyConf in Los Angeles, and it is also his third trip to Australia. He recently adopted a cat named Cooper, whose adorable, chubby face you can see on his Instagram and possibly his Twitter. Please welcome Cooper's dad, Tekin!
00:01:10.000 The lights don’t really help for this talk. Meet Seema, an engineer working at Doc’s R Us, a company that provides appointment booking software for medical professionals. Founded ten years ago, Doc’s R Us has grown from a tiny startup to a successful enterprise employing nearly 100 people. The application itself is a decade-old monolith, with code dating back to Rails 1 still found in its structure. The team has changed quite a bit over the years, with only the CTO remaining from those who initially wrote the early lines of code. Seema, who has been in her role for a few months, feels like she is starting to grasp the codebase. Today, she is tasked with updating a page in the app that allows doctors to view the patients they have had appointments with. Currently, the patients are sorted alphabetically, but Seema’s task is to implement a feature that allows sorting by appointment date as well.
00:02:46.000 This task sounds straightforward enough, but as Seema reviews the code, something catches her eye. She finds a method in the app that returns the patients sorted by their names. What surprises her is that instead of performing the sort as part of a database query, it is done using Ruby's sort method in memory. This seems inefficient, so Seema makes a quick change to perform the sort using an ActiveRecord query instead. After running the test suite to see if anything breaks, all tests come back green, suggesting that the in-memory sort was likely unnecessary—assuming the test coverage is good. Seema is eager to remove the in-memory sort as part of her work. However, before she does so, she wants to understand why it was added in the first place to avoid any unintended consequences.
00:03:38.000 To investigate, she starts by checking the models for clues. The patients' association is loaded via the appointments association, which all looks as she expects. Satisfied that she has explored the code enough, Seema knows what to do next. In her last job, she was fortunate to work with a couple of experienced developers who taught her powerful techniques for searching through revision histories to uncover the reasons behind the code's evolution. She starts with a fundamental Git technique: using 'git blame,' which reveals the revisions and authors that last modified each line of code. Seema identifies the line she is interested in and discovers it was last edited by someone named Josie, who she suspects no longer works there.
00:04:18.000 Had Josie still been around, Seema might have asked her directly why the in-memory sort was used. However, given that the change was made almost a decade ago, Josie likely wouldn't remember the details. Seema takes the revision SHA for the line and passes it to 'git show' to view the commit message and the full diff for the change. As she examines the output, she sees that this commit merely corrected a typo. Regarding the mystery of the in-memory sort, Seema is still in the dark. Undeterred, she prepares to try a more advanced Git technique using 'git log -S,' also known as the pickaxe command, which allows her to search through the revision history for every commit that added or removed a specific snippet of code.
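The blame-then-show step can be sketched roughly as follows. This is an illustration in a throwaway repository, not the code from the talk: the file name `doctor.rb`, its contents, and the commit messages are all assumptions made up for the example.

```shell
set -eu

# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"                 # hypothetical author
git config user.email "josie@example.com"    # hypothetical address

# First commit introduces the method, with a typo in the comment on line 2.
printf 'def sorted_patients\n  # Sort pateints by name\n  patients.sort_by(&:name)\nend\n' > doctor.rb
git add doctor.rb
git commit -q -m "Sort patients in memory"

# Second commit only corrects the typo on line 2.
printf 'def sorted_patients\n  # Sort patients by name\n  patients.sort_by(&:name)\nend\n' > doctor.rb
git commit -q -am "Fix typo"

# git blame: which revision and author last touched line 2?
git blame -L 2,2 doctor.rb

# Pull the full SHA out of the porcelain output and hand it to git show,
# which prints the commit message and the full diff for that revision.
blamed_sha=$(git blame -L 2,2 --porcelain doctor.rb | head -n 1 | cut -d ' ' -f 1)
git show "$blamed_sha"
```

As in Seema's case, blame only reports the most recent change to a line, so here it surfaces the typo fix rather than the commit that introduced the sort.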
00:05:13.000 Using the pickaxe, Seema aims to find the first commit that introduced the sorted patients method. She runs the command with the method name as the search string, including the patch option so that Git will show her the diff along with the commit message. Additionally, she adds the reverse option, which instructs Git to reverse its default newest-first output so that the oldest matching commit, the one that introduced the method, appears right at the top. Hopeful that this will solve the mystery, she executes the command and inspects the output. To her dismay, it leads to another dead end. It appears that Josie had a change of heart regarding the method name: the commit simply renames a method originally called ‘load_patients’ to its current name, which seems more intentional. Seema feels no closer to solving the mystery.
00:06:41.000 Not discouraged, Seema resolves to rerun the search, this time with the original method name. She calls the pickaxe again, searching for 'load_patients,' hopeful that this will unravel the mystery. As she progresses, she feels like she's getting somewhere; it appears that sorting was originally done in ActiveRecord, and the very first commit that introduced the in-memory sort mentions a bug regarding ordering. However, the commit offers no clues about the nature of the bug or why the new method would have resolved it. Seema decides it is time to pivot her approach and looks for the original pull request associated with the change.
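The two pickaxe searches can be sketched like this. It is a minimal reconstruction in a throwaway repository: `doctor.rb`, the method bodies, and both commit messages are hypothetical, and the current method name is assumed to be `sorted_patients`.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"                 # hypothetical author
git config user.email "josie@example.com"    # hypothetical address

# The in-memory sort first appears under the original method name.
printf 'def load_patients\n  patients.sort_by(&:name)\nend\n' > doctor.rb
git add doctor.rb
git commit -q -m "Fix bug with ordering"

# A later commit only renames the method.
printf 'def sorted_patients\n  patients.sort_by(&:name)\nend\n' > doctor.rb
git commit -q -am "Rename method"

# Pickaxe: list commits that changed the number of occurrences of the string.
# --patch shows each diff; --reverse puts the oldest matching commit first.
git log -S 'sorted_patients' --patch --reverse

# Searching for the current name stops at the rename...
first_for_new_name=$(git log -S 'sorted_patients' --reverse --format=%s | head -n 1)
# ...while searching for the original name reaches the real starting point.
first_for_old_name=$(git log -S 'load_patients' --reverse --format=%s | head -n 1)

echo "current name first appears in:  $first_for_new_name"
echo "original name first appears in: $first_for_old_name"
```

Because `-S` matches on changes in how often the string occurs, a rename shows up as the commit that "introduced" the new name, which is why the second search is needed.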
00:07:41.000 She finds the commit on GitHub and clicks the unassuming link that takes her to the pull request. She had hoped the pull request's description would provide additional context, but all she finds is a link to a Pivotal Tracker story. Unfamiliar with Pivotal Tracker, she asks a co-worker how she might access the project, only to discover that the Pivotal project was archived when the company transitioned to Trello. Thus, when the subscription lapsed, so did access to the project. Back on GitHub, Seema meticulously scrolls through the rest of the pull request and finds a commit that adds some tests. Indeed, these tests verify that the patients are being listed alphabetically, but once again, there is nothing in the pull request to explain why the in-memory sort was employed.
00:09:21.000 At this point, Seema feels like she’s out of ideas; her search for the reason behind the in-memory sort has proved fruitless. While the test suite provides some assurance that removing the in-memory sort is likely safe, she still feels a little uneasy. Deciding to proceed with caution, she keeps her eyes peeled for any additional clues as she continues her work.
00:10:00.000 Now, let’s rewind a bit and meet Josie. Josie is also an engineer at Doc's R Us, one of the inaugural engineers brought on when the company secured funding. She adores the fast-paced life of a startup. However, this particular day has not begun well for her; she overslept and managed to spill her carefully crafted single-origin pour-over coffee all over herself while rushing out the door. To make matters worse, she is now late and severely under-caffeinated. The day before, Josie had been tackling a bug where patient records were listed in the wrong order. She needed a quick fix since a significant presentation with a potential new client was looming.
00:11:03.000 Upon examining the controller code, she believed the patient records should have been returned correctly. After a bit of digging, she uncovered the issue: the patients' association was being loaded through the appointments association, which already had a default ordering based on appointment dates. This meant whenever the patients' association was invoked, the result would inherit the order clause from appointments, causing them to return ordered by appointment date instead of their names.
00:12:00.000 The obvious solution would have been to remove the default ordering altogether, but after trying that, Josie faced a flurry of test failures. Much of the app’s code depended on the appointments being ordered by date. Realizing it would take substantial time to untangle all those failures, and needing a swift fix for her impending demo, Josie made a quick decision: she would implement a workaround that allowed patient sorting after they had been loaded, promising to revisit and address the default ordering later.
00:13:34.000 Thus, she introduced a method to sort patients post-load, then changed her mind about the method name, and finally added a commit that included tests. Her intention was to tidy up the commit history before creating a pull request. However, feeling frustrated, she simply pushed her code to GitHub, using a link to the Pivotal Tracker story as the PR description since all the relevant details were there.
00:14:41.000 Shortly thereafter, she received a notification indicating that the build was broken on her branch. It turned out there was an integration test she had forgotten to update. After addressing that with another commit, she resolved a few typos before co-workers approved the PR, enabling her bug fix to be deployed just in time for the demo. Satisfied that she had tackled another bug, Josie moved on to her next task but not before noting to herself to return and remove the default ordering in the future to avoid any potential issues.
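One way the tidy-up Josie intended could look is Git's fixup and autosquash workflow. The talk does not say which technique she had in mind, so treat this as an assumed approach; the files, commit messages, and the `nmae` typo are invented for the illustration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Josie"                 # hypothetical author
git config user.email "josie@example.com"    # hypothetical address

printf 'class Doctor\nend\n' > doctor.rb
git add doctor.rb
git commit -q -m "Add Doctor model"

# The workaround commit, containing a typo (nmae) that gets noticed later.
printf 'class Doctor\n  def sorted_patients\n    patients.sort_by(&:nmae)\n  end\nend\n' > doctor.rb
git commit -q -am "Sort patients in memory to work around default ordering"
workaround_sha=$(git rev-parse HEAD)

printf '# tests for sorted_patients\n' > doctor_test.rb
git add doctor_test.rb
git commit -q -m "Test that patients are listed alphabetically"

# Record the typo fix as a fixup of the commit it belongs to.
printf 'class Doctor\n  def sorted_patients\n    patients.sort_by(&:name)\n  end\nend\n' > doctor.rb
git commit -q -a --fixup="$workaround_sha"

# Fold the fixup into its target before opening the pull request.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list without prompting.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "${workaround_sha}^"

git log --oneline
commit_count=$(git rev-list --count HEAD)
```

The result is three commits, each telling one part of the story, with the typo fix absorbed into the commit that introduced the typo. Rewriting history this way is only appropriate before the branch has been shared.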
00:15:56.000 You might be wondering why Josie didn't use the reorder method, which would have replaced the existing order clause defined on the relation. That's because I've constructed Josie's timeline to occur in 2010, conveniently before 'reorder' was introduced to ActiveRecord. Let's return to the present, where Seema is puzzling over why some code is sorting patients in memory rather than as part of the database query again. Eager to learn the rationale behind this design decision, she uses Git tools and discovers that the original implementation was a workaround to circumvent a default ordering on the appointments association.
00:17:15.000 Reading the commit messages, Seema realizes that the in-memory sort was a temporary fix until they could remove the default ordering altogether. Fortunately, after checking, she finds that the default ordering has indeed since been removed. She can now confidently remove the in-memory sort and continue working on her task.
00:18:18.000 As developers, we implement many strategies to maintain our code effectively. We meticulously consider our objects' and methods' names, maintain automated tests, and create solid abstractions. We also engage in refactoring, striving to keep our code easy to understand and modify. However, it's crucial to remember that software encompasses so much more than mere code. At its best, code articulates clearly what our software does, yet understanding the deeper reasons behind its behavior depends on our ability to consult the revision history.
00:19:09.000 We build software iteratively, revising our understanding of the requirements and fixing newly discovered bugs as we go. While making these countless decisions and trade-offs, we build an institutional knowledge that defines our software uniquely, something the code itself might not express. Grasping this ever-evolving insight is paramount not only for maintainability but also for the longevity of good code.
00:20:37.000 In a paper published in 1985, Peter Naur described programming as not just producing executable code but as building a mental model of how software ought to function. He asserted that the code produced becomes merely a secondary artifact. For software to remain maintainable, the team that holds this understanding must remain intact. Yet, it's rare for an entire team to disband outright; like the code, our teams are dynamic. New members will join, requiring them to gain knowledge quickly while seasoned members will leave, taking their hard-won insights with them.
00:21:46.000 Revision history is a powerful mechanism for capturing this knowledge and embedding it alongside the code changes, providing a searchable resource that won’t go stale over time. With a well-maintained revision history, every code change is documented with an explanation of why it was made, and the history grows more valuable as the codebase ages. A poorly maintained history, by contrast, represents a burden that is virtually impossible to clear, whereas a solid history can yield returns on investment far into the future.
00:23:08.000 For many in this audience, much of what has been discussed today will be apparent. You've likely developed confidence in revising and creating useful revision histories. However, I also recognize that for some, this area might seem daunting. This challenge derives from Git's powerful yet complex command-line interface, which isn't particularly user-friendly. Unfortunately, there are no silver bullets; mastering Git and constructing effective revision histories requires patience and practice. My objective through this talk has been to reinforce the value of investing in this skill.
00:25:00.000 In closing, I want to share several simple tips that can aid you in crafting improved revision histories. First, ensure you are set up for success with good commit messages. For me, that meant moving away from the habit of committing directly in the command line. Instead, I configured my editor of choice to integrate with Git, providing a familiar environment conducive to writing detailed commit messages. I also suggest enabling Git's verbose mode so that the full diff of the changes is displayed in your editor, allowing for thorough review as you construct your commit messages.
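A minimal sketch of that setup, using Git's `core.editor` and `commit.verbose` settings. The editor value `vim` is only a placeholder for whichever editor you prefer, and the settings are applied to a throwaway repository here so the example does not touch your real configuration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# core.editor: the editor Git opens for commit messages ("vim" is a placeholder).
git config core.editor "vim"

# commit.verbose: show the full diff of the staged changes in the editor,
# below the commit message template, while you write the message.
git config commit.verbose true

# Confirm what was set.
git config --get core.editor
git config --get commit.verbose
```

Adding `--global` to the two `git config` commands applies the settings to every repository, and `git commit -v` gives the same verbose view for a single commit.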
Explore all talks recorded at RubyConf AU 2019