RailsConf 2022

Git your PR accepted. Rebase your changes like a pro

by Daniel Magliola

In his talk at RailsConf 2022, Daniel Magliola introduces strategies for effectively managing Git pull requests (PRs) using the rebase function to create a clear and coherent commit history. This is particularly useful when contributing to open-source projects where messy commits can complicate the review process. Magliola emphasizes the importance of structuring commits to tell a story, making it easier for reviewers to understand the thought process and changes made during development.

Key points discussed in the video include:
- The Importance of Commit Cleanliness: Large PRs, while sometimes necessary, can deter maintainers if not well-organized. By breaking down commits into manageable, meaningful units, the review process becomes significantly easier.
- Understanding Git Internals: Magliola explains how Git tracks changes through commits and branches, allowing for flexibility in manipulating commits. He highlights that commits are essentially differences (diffs) from the previous state, which can be reapplied in different contexts.
- Using Interactive Rebase: He explains how interactive rebase allows developers to reorder, squash, and edit commits, which helps clean up commit history. Specific commands like "pick", "drop", "squash", and "fixup" offer fine control over the final structure of the commit history.
- Handling Mistakes During Rebase: Magliola details strategies to recover from common pitfalls in the rebasing process, such as using `git reflog` to return to a previous commit state or running `git rebase --abort` when conflicts arise.
- Commit Messages Matter: The significance of writing clear and informative commit messages is stressed, as they provide context for reviewers and future developers alike.

Magliola concludes with the message that mastering Git and its rebase functionality not only helps in getting PRs accepted but also maintains a cleaner codebase for future contributors. He encourages developers to invest time in mastering these techniques, as they lead to more maintainable code and foster better collaboration in open-source communities.

00:00:12.540 I would like to talk about my first real introduction to open source, specifically how I navigated a challenging pull request (PR). There was a small gem we needed to use, but to make it work, we essentially had to rewrite all of it. What we ended up with was a large PR, which is never ideal. However, in this case, it was necessary because all parts had to be changed together.
00:00:25.800 You've probably had to review PRs like this, and I can imagine your inner turmoil just seeing such a large number of commits. However, if you were to review this PR, you'd find it easy to follow because it was constructed from many commits. Each commit was designed to accomplish one specific task and included an explanation in the commit messages detailing what we did and why. By reading these messages one by one, you would be able to piece together the story and understand the changes easily.
00:00:38.399 Even though this was a very substantial change, the way it was organized made it reviewable. This organization was crucial in how we managed to submit it. If we hadn’t done this, we probably would have had to fork the gem, since a change of this magnitude would be too challenging to review in a chaotic form. I don't recommend writing PRs this large; this was an extreme case. However, even smaller PRs may require several changes at once. I strongly suggest cleaning up your commits to create a narrative that is easy to follow step by step.
00:01:13.439 The alternative is to leave the commits as they happened, which generally looks disorganized and is much harder to review. When this occurs, the reviewer is forced to read through all files in alphabetical order, which does not make sense.
00:01:19.799 Imagine trying to explain your changes to a colleague. You wouldn't go through the files alphabetically; instead, you'd tell the narrative of the changes, starting with the basics and building a complete picture as you go.
00:01:27.780 When you submit to an open-source project, you don’t get the chance to sit down with the reviewer, so you must do all of that explaining through your commits, starting from the very basic changes and explaining why you made those changes as you develop the fuller story in subsequent commits. Bad commits also pose a significant challenge for anyone in the future trying to understand the reasoning behind the code.
00:02:01.020 For instance, I found this commit the other day that left me scratching my head—was that intentional, or is it a bug? I immediately looked at the commit message to see if it would clarify what led to that point. Unfortunately, what I found was not helpful.
00:02:24.540 What you want is what Eileen showed us yesterday: a code snippet followed by a clear explanation of what happened and why the code is structured that way. Ideally, the commit should also give you a heads up, indicating that more changes may be necessary in the future.
00:02:42.180 As developers, we invest a tremendous amount of time and energy into ensuring that our code is maintainable. However, we must also remember that the understanding of how we got to a certain point in the code's history is equally significant for maintaining that system. Therefore, when composing your commit history, aim to tell a story. Not only will this make your reviewers' lives easier, but it will also increase the likelihood of your changes being accepted and leave a cleaner trail for future reference.
00:03:11.760 Of course, I didn’t just churn out these perfect commits magically on my first attempt. This PR was months of work involving many changes, tweaks, and experiments. I undoubtedly went through hundreds of subpar commits, but you wouldn't see any of that in the final result. The end product is a pristine story devoid of all the messy rewrites and failed experiments.
00:03:54.060 To achieve that polished narrative, you must edit your commits. You don't want to showcase the complicated and winding path you took to arrive at your desired result. Instead, your commits should narrate a straightforward story that's coherent and easy to follow. For that, you will need to use interactive rebase.
00:04:26.699 Now, rebase is a rather peculiar tool and often carries a bad reputation. Many people have tried using rebase once, encountered weird results or lost some code, and decided to avoid it altogether, opting instead for merging permanently. This distrust is understandable, as the user experience can be quite opaque, with cryptic and frightening error messages. It can feel like one tiny mistake can obliterate hours of work.
00:05:03.720 Rebase is challenging to grasp at first, especially without some background information. When starting out with Git, it’s crucial to understand its underlying mechanics because having this knowledge will significantly ease your journey. Today, we're going to explore how Git branches function and how to use rebase to refine your commit history, ensuring your contributions are cleaner and more manageable.
00:05:43.740 There are three important concepts to remember. Firstly, as you develop your code, you create a series of commits for your changes. Each commit allows you to see the state of your code at that moment in time. However, it's important to note that commits don't store the entire state of your code. Instead, each commit relies on its parent— the previous commit—which means that each commit only represents the changes made to files relative to its parent.
00:06:06.180 So, if you were to check out a specific commit, you'd actually retrieve the code by applying the changes, or diffs, from that commit to its parent. The interesting thing about this is that since each commit is just a diff, you can move them around independently. For example, you can have separate branches with different lines of development happening simultaneously.
00:06:43.500 In Git, you've probably used the `git checkout` command to jump to any commit and see the code at that point. Each commit has a unique SHA that serves as a handle for referencing it. Instead of having to type out long SHAs every time, Git allows you to use references, which are essentially labels attached to commits. There are two types of references: tags and heads, and while they function similarly, they serve different purposes.
00:07:19.080 Tags are used as historical markers, often utilized to track which commit corresponds to a given version of your code. Meanwhile, heads indicate the current branch you are working on, moving forward as new commits are made. Essentially, branches serve as pointers to commits and are designed to move forward as you add new changes.
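The difference between the two kinds of reference can be seen in a throwaway repository. The following is a minimal sketch, not something shown in the talk: the temporary repository, the author identity, and the tag name `v1.0` are all made up for illustration.

```shell
set -eu
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Author" && git config user.email "author@example.com"

git commit -q --allow-empty -m "First commit"
first=$(git rev-parse HEAD)
git tag v1.0                            # a tag: a label that stays on this commit

git commit -q --allow-empty -m "Second commit"
second=$(git rev-parse HEAD)

tag_target=$(git rev-parse v1.0)        # the tag still points at the first commit
branch_target=$(git rev-parse main)     # the branch (a head) moved forward with the new commit

git show-ref                            # lists refs/heads/main and refs/tags/v1.0 with their SHAs
```

Both references are just names for SHAs; the only difference here is that the branch followed the new commit and the tag did not.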
00:07:51.540 Now, let's discuss merges briefly. In Git, you generally have a linear set of commits, where each commit has one parent and one child. However, special cases exist. A commit can have multiple child commits, allowing for branching. This is how you can have your main branch alongside separate feature branches. In contrast, some commits might have two parents when merging branches together.
00:08:41.220 When you merge a branch, Git determines where the branches diverged by taking all the changes from the commits in that branch since the split and collapsing them into a single diff that's applied to the branch you're merging into. It produces a merge commit that has two parents, representing the histories of both branches.
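As a rough illustration of a merge commit having two parents, here is a small sketch in a scratch repository. The file names, branch names, and commit messages are assumptions chosen for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

# Two lines of development that diverge from the base commit.
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature file"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Add main file"

# Merging creates one new commit on main that joins both histories.
git merge -q --no-edit feature

# rev-list --parents prints the commit followed by its parents; subtract the commit itself.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "The merge commit has $parent_count parents"
git log --oneline --graph
```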
00:09:18.300 Rebasing functions similarly, but instead of merging, it re-applies commits one by one. When you rebase onto another branch, Git will still identify where the branches diverged but won't collapse the commits. Instead, it will apply them sequentially to the target branch you're rebasing onto. This preserves the commit history while incorporating the latest changes from the target branch.
00:10:11.280 Now, it's key to remember that rebasing results in orphaned commits. These are the commits from your original branch that no longer have a reference pointing to them after the rebase. Though you can't see them in normal command-line output or a graphical client, these commits still exist and can be accessed.
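The two points above (commits are re-applied one by one, and the originals are left behind without a reference) can be sketched in a scratch repository. This is an illustrative setup with invented file and branch names, not the example used in the talk.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature
echo "one" > one.txt && git add one.txt && git commit -q -m "Feature step one"
echo "two" > two.txt && git add two.txt && git commit -q -m "Feature step two"
old_tip=$(git rev-parse feature)        # remember the original tip of the branch

git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main moves on"

git checkout -q feature
git rebase -q main                      # re-apply both feature commits on top of main

git log --oneline main..feature         # still two separate commits, now based on main
new_tip=$(git rev-parse feature)

# The original commit is no longer on any branch, but it can still be read by SHA.
old_type=$(git cat-file -t "$old_tip")
holders=$(git branch --contains "$old_tip")
git log -1 --oneline "$old_tip"
```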
00:10:55.620 Next, let's address interactive rebase—how to edit your commits and create a refined version of your commit history. Interactive rebase allows you to apply your commits one by one while giving you control over this process. With interactive rebase, you can change the order of commits, add new commits in the middle of the history, split commits that do too much into separate ones, or squash related commits into one.
00:11:38.880 For instance, imagine we have multiple commits in a branch that we are going to rebase against the main branch. Collectively, they represent a feature we’re developing. However, we might also have some experimental commits that we no longer need, or commits with subpar messages that need rewording. When submitting to an open-source project, unrelated changes would typically need their own PR, but in your own codebase you have the freedom to leave them in.
00:12:23.520 With interactive rebase, we can clean up all these changes. We would drop unnecessary experimental commits, revise commit messages to be more descriptive, move unrelated changes to the top, and combine related commits into one. Each of these steps corresponds to specific commands within interactive rebase.
00:13:17.579 So, when we initiate an interactive rebase using the command `git rebase main --interactive`, we'll be presented with a list of all the commits. Within this list, we can instruct Git on how to process each commit—defaulting to picking each one to apply. If we want to drop a commit, we specify that. If we identify a commit with a poor message, we can choose to reword it. For unrelated changes, we can cut and paste, adjusting their order.
00:14:00.540 Once we have set the command we want for each commit, we save the list and exit our editor. Git will then apply the commits in the sequence we specified. For squashed commits, Git will prompt us to edit the combined commit message before moving forward. The end result is an organized and coherent commit history.
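To make the flow concrete, here is a hedged sketch that scripts an interactive rebase instead of opening an editor. It relies on the `GIT_SEQUENCE_EDITOR` and `GIT_EDITOR` environment variables to supply the edited list and accept the default squash message; in normal use you would edit the list by hand. The commits, file names, and the `plan` file are invented for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

# A messy feature branch: a typo, an abandoned experiment, and a late fix.
git checkout -q -b feature
echo "parse inptu" > parser.txt && git add parser.txt     && git commit -q -m "Add parser"
echo "scratch" > experiment.txt && git add experiment.txt && git commit -q -m "wip experiment"
echo "format" > formatter.txt   && git add formatter.txt  && git commit -q -m "Add formatter"
echo "parse input" > parser.txt && git commit -q -am "Fix typo in parser"

c1=$(git rev-parse HEAD~3); c2=$(git rev-parse HEAD~2)
c3=$(git rev-parse HEAD~1); c4=$(git rev-parse HEAD)

# The list we would otherwise type in the editor: oldest first,
# with the fix moved next to the commit it belongs to.
plan=$(mktemp)
cat > "$plan" <<EOF
pick $c1 Add parser
squash $c4 Fix typo in parser
drop $c2 wip experiment
pick $c3 Add formatter
EOF

# Stand-in editors: copy our plan over Git's list, and accept the squash message as-is.
GIT_SEQUENCE_EDITOR="cp '$plan'" GIT_EDITOR=true git rebase -i main

git log --oneline main..feature         # two commits remain: the parser and the formatter
```

The experiment is gone, the typo fix is folded into the commit that introduced the typo, and the remaining commits each do one thing.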
00:14:50.660 Now, I would like to point out how Git presents commits in the context of the interactive rebase. Unlike what we usually see—the newest commits at the top—in interactive mode, the earliest commits are listed at the top, with the latest ones at the bottom. This ordering mirrors how GitHub displays them, making it intuitive for editing.
00:15:37.620 When we complete a rebase, we can see that we started with several commits and reshaped them within our process. The commit order is now as we specified. For example, we might have reordered commits, squashed some together, and opted to drop any that were unnecessary, ultimately leading to a refined series of commits that communicate our changes clearly.
00:16:24.060 This capability for editing, reordering, and squashing is incredibly powerful. By mastering these techniques, you’ll be able to create concise and informative commit histories that effectively communicate your development progress. Remember to be ruthless; every remaining commit should stand on its own, tell a story, and have insightful messages.
00:17:09.000 Let’s also discuss the importance of rewording. Clear and informative commit messages are crucial for your reviewers and anyone looking at your code down the road. You should always review your commit messages to ensure they accurately describe the changes made. If a message isn't satisfactory, reword it.
00:18:05.700 Another command in interactive rebasing is dropping: it allows you to eliminate a commit completely, which is useful if you inadvertently included temporary code or additions that don't belong there.
00:18:45.600 The edit command is a powerful option, though its poor name can make it confusing. It applies a commit, pauses the rebase, and lets you make changes before continuing. Once you have committed your edits and told Git to continue, it proceeds with applying the other queued commits, incorporating your changes seamlessly.
00:19:39.420 For instance, if you choose to edit the second of four commits, Git will apply the first two, halt, and return you to the command line. At this point, you could create additional files or make changes, then continue the rebase to add those changes into your history.
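A small sketch of the edit flow, again scripted through `GIT_SEQUENCE_EDITOR` so it runs without an interactive editor. It uses three commits rather than four, and all file names and messages are hypothetical.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature
echo "model" > model.txt           && git add model.txt      && git commit -q -m "Add model"
echo "controller" > controller.txt && git add controller.txt && git commit -q -m "Add controller"
echo "view" > view.txt             && git add view.txt       && git commit -q -m "Add view"

c1=$(git rev-parse HEAD~2); c2=$(git rev-parse HEAD~1); c3=$(git rev-parse HEAD)

plan=$(mktemp)
cat > "$plan" <<EOF
pick $c1 Add model
edit $c2 Add controller
pick $c3 Add view
EOF

# Git applies "Add model" and "Add controller", then stops and hands control back.
GIT_SEQUENCE_EDITOR="cp '$plan'" git rebase -i main

# While stopped, add something to the commit that was just applied.
echo "controller test" > controller_test.txt
git add controller_test.txt
git commit -q --amend --no-edit

# Resume; Git applies the remaining commit on top of the amended one.
GIT_EDITOR=true git rebase --continue

edited_files=$(git show --name-only --format= HEAD~1)
echo "$edited_files"
```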
00:20:32.040 Combining splitting, reordering, and squashing your commits during a rebase provides ultimate control over your commit history. This combination can let you move code from one commit to another effectively across multiple rebases. Mastering these techniques can significantly enhance the quality of your pull requests.
00:21:36.420 As we wrap up discussing interactive rebase, it's essential to remember that while this tool is simple and subtle, it requires practice to gain proficiency. With time, you'll be able to keep your commit history clean and tidy.
00:22:15.420 If you attempt rebasing, you may encounter issues occasionally, but that's perfectly normal. We'll go over how to recover from potential problems in your Git workflow.
00:22:35.000 The first method of recovery is quite straightforward: use `git rebase --abort` when you’re in the middle of a rebase. This command will return you to the state before you initiated the rebase, no matter what modifications you've made. I frequently use this when I'm surprised by a conflict.
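Here is a minimal sketch of backing out of a conflicted rebase. The conflict is manufactured by changing the same line on both branches; the file and branch names are assumptions for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"
echo "hello" > greeting.txt && git add greeting.txt && git commit -q -m "Add greeting"

# Both branches change the same line, so replaying the feature commit will conflict.
git checkout -q -b feature
echo "hello from feature" > greeting.txt && git commit -q -am "Feature greeting"
git checkout -q main
echo "hello from main" > greeting.txt && git commit -q -am "Main greeting"

git checkout -q feature
before=$(git rev-parse HEAD)

if ! git rebase main >/dev/null 2>&1; then
  git status --short                    # shows the conflicted file
  git rebase --abort                    # back to where we were before the rebase started
fi

after=$(git rev-parse HEAD)
current_branch=$(git rev-parse --abbrev-ref HEAD)
```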
00:23:27.900 If a rebase completed successfully but you later want to undo it, a simple safeguard is to push your branch to GitHub before performing the rebase. If everything looks good after the rebase, you can push it again. If not, resetting to the origin branch restores your previous state.
00:24:16.920 Occasionally, you may push your branch to GitHub only to realize there has been a disaster post-rebase. This is when to leverage the reflog. The reflog records every movement of your branches and HEAD, serving as a powerful history of changes. You can identify where HEAD was before the rebase and reset to that point, restoring your previous commit state.
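The push-first safety net might look like the following sketch. A local bare repository stands in for GitHub here, which is an assumption made so the example is self-contained; the branch and file names are also invented.

```shell
set -eu
remote=$(mktemp -d)
git init -q --bare "$remote"            # local stand-in for the GitHub remote

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"
git remote add origin "$remote"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature
echo "one" > one.txt && git add one.txt && git commit -q -m "Feature step one"

git push -q origin feature              # safety copy before rewriting anything
pushed=$(git rev-parse origin/feature)

git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main moves on"
git checkout -q feature
git rebase -q main                      # history is rewritten locally
rebased=$(git rev-parse HEAD)

# Suppose the result is not what we wanted: go back to the pushed copy.
git reset -q --hard origin/feature
restored=$(git rev-parse HEAD)
```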
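One way to do this, sketched in a scratch repository, is to read the branch's own reflog, where `feature@{1}` names the position the branch held before its most recent move. This is an assumption about the simplest case (nothing else has moved the branch since the rebase); in practice you would read the `git reflog` output and pick the right entry. Names are hypothetical.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author" && git config user.email "author@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature
echo "one" > one.txt && git add one.txt && git commit -q -m "Feature step one"
before=$(git rev-parse feature)

git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main moves on"
git checkout -q feature
git rebase -q main

git reflog show feature                 # every position the feature branch has held
previous=$(git rev-parse "feature@{1}") # where the branch pointed before the rebase finished

git reset -q --hard "feature@{1}"       # put the branch back on the pre-rebase commit
restored=$(git rev-parse HEAD)
```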
00:24:38.520 In summary, be sure to remember: Each commit is applied one by one when you're rebasing. If you encounter a conflict, assess the current changes and consider how that commit's diff integrates into the code. Being aware of these elements will help you troubleshoot any issues effectively.
00:25:33.240 It's also important to note that being adept at Git is one of a developer's most valuable skills. Pro Git is a great resource, revealing the inner workings of Git, and can significantly enhance your understanding of this powerful tool.
00:26:14.320 In closing, I hope this discussion on making your PRs cleaner and more manageable inspires you. You now have the skills to handle your commits like a pro and will leave a positive impression on reviewers, including open source maintainers.
00:27:06.420 Before I wrap up, I’d like to say that we're hiring at Indeed Flex, and we’re expanding rapidly across Europe. If you’re interested, feel free to check out the link provided. Also, if you'd like some stickers, I have those with me. Thank you for attending, and I look forward to seeing you in the hallway track!