Goldilocks And The Three Code Reviews

by Vaidehi Joshi

In her talk "Goldilocks And The Three Code Reviews" at RailsConf 2017, Vaidehi Joshi explores the timing and effectiveness of code reviews in software development. Drawing parallels with the fairy tale of Goldilocks, she seeks to find a code review process that is 'just right'—not too long, not too short, but effective. The presentation highlights the evolution and history of code reviews, their importance in fostering quality code, and the various methods that can enhance the review process.

Key Points:

Fairy Tale Retelling: Joshi introduces a modern rendition of Goldilocks as a Rails developer who navigates through different job experiences with varying code review processes.
Importance of Code Reviews: She emphasizes that structured and thoughtful code reviews are crucial for improving code quality and helping developers grow.
Types of Code Reviews: The talk outlines three types of code reviews:
- Inspections: In-depth focus on defect detection, requiring clear roles and checklists.
- Short Reviews: Brief reviews that can significantly reduce error rates in code submissions.
- Walkthroughs: Exchange-oriented meetings designed for learning and questioning between developers.
Research Findings: Joshi presents findings from a survey she conducted, revealing the effectiveness of code reviews and highlighting that the quality of a team’s code review process is essential for developer satisfaction.
Principal Takeaways: Through the analysis of the survey results, Joshi concludes with several recommendations:
- Review everyone equally—emphasizing collaboration and the importance of feedback for all team members.
- Provide concise context in reviews to facilitate better understanding and communication.
- Utilize tools like linters to streamline the process and focus reviews on substantial content rather than syntax.
- Maintain an iterative process for continuously improving the code review system.

In a closing reflection, Joshi encourages teams to engage in conversations about their code review practices to ensure they are effective and valuable. She reinforces that while theory is vital, the practice of code reviews ultimately hinges on how well teams conduct them. Engaging openly in dialogue around these processes can lead to richer collaboration and better outcomes in software development.

00:00:12.960 Hello, my name is Vaidehi Joshi. Just to get a sense of the room, how many of you grew up hearing fairy tales when you were kids? Nice, okay cool, me too. I wanted to make sure of that, because this talk is not going to be fun if you hadn't raised your hands. So, one of my favorite fairy tales growing up was the story of Goldilocks and the Three Bears. How many of you are familiar with that fairy tale?

00:00:31.380 Cool! I see a couple of people who didn't raise their hands, so very quickly, Goldilocks and the Three Bears is a fairy tale where a girl named Goldilocks breaks into a house and it happens to be the house of three bears. The entire story follows her as she rummages through all their stuff. She lies down in their beds and samples their porridge. Papa bear's porridge is too hot, Mama bear's porridge is too cold, but it turns out that Baby bear's porridge is just right. The whole story revolves around her trying to find that 'just right' thing.

00:01:00.480 I did a little research on Goldilocks and the Three Bears, and it turns out the version we're most familiar with was written in 1837. That's like a 150-year-old story! Looking back on it as an adult, I think it’s strange to teach children to break into the house of bears. I don’t know why it wasn’t something cozier like bunny rabbits or otters. Bears aren’t very friendly, but that’s okay. While researching this, it made me wonder what this fairy tale would be like if we did a modern-day retelling of it. So that's what we're going to do today.

00:01:47.490 In this modern rendition, Goldilocks is over the idea of breaking into people's houses—specifically, the houses of bears. Instead, this modern-day Goldilocks is a Rails developer, as you might expect. Goldilocks has just started her first developer job, and it’s incredibly exciting for her. She didn’t used to be a programmer, and she’s still amazed that people pay her to write code. It’s kind of an unfathomable concept for her. In her first job, she loves what she does because she gets to write amazing code, and she believes it’s amazing. She also gets to work with a fantastic group of people, and she’s learning a lot.

00:02:39.610 However, as we all know, things change very quickly in the life of a programmer. After about six months, she’s writing code, but she isn’t sure if she’s getting better at it or building any technical skills. So, she starts talking to some of her programmer friends and learns that a lot of other junior developers are working at companies with formal code review processes. This makes her feel a bit odd, as she doesn’t have that at her company. Her company is very small, and if she wants feedback, she usually has to seek it out from a senior engineer.

00:03:01.930 The more she converses with others, the more she realizes she doesn’t have any written documentation of whether she’s improving or not. Because of the absence of a code review process in her company, she feels this is probably not great. Eventually, she becomes really frustrated by this. So, she decides it might be time to move on. Goldilocks begins her job hunt and finds her second developer job working in both Rails and Angular—though that’s a different story. She’s having an amazing time at this new company, and part of the reason for that is because everyone participates in code reviews. Everyone reviews code and is also reviewed. In fact, during her first week on the job, she reviews the CTO's code! She thinks, "What? Me? Really? That’s really cool! People think my opinion matters and I can actually have input on technical decisions!"

00:04:24.909 This is also a very small startup company, so they have a basic rule of thumb: as long as one person approves the code, it can be merged in. This system works well; code gets merged, and the team is highly productive. However, Goldilocks notices some drawbacks to this approach. Sometimes, the code reviews evolve into long strings of comments, and they find themselves discussing things like hash rockets and formatting choices—topics she feels maybe shouldn’t be in code reviews. To address this, she decides to implement Robocop, a tool to ensure such discussions don’t dominate code reviews. It’s not perfect, but it's a start, and so far, nothing terrible has happened.

00:06:10.389 That is until one day when Goldilocks is working on an amazing feature she’s been dedicated to for two weeks. Finally, she creates this massive pull request, which gets approved by two other developers on the team and merges in. She breathes a sigh of relief, thinking, "Okay, I’m going to go get coffee; I’ll be back." She comes back 20 minutes later to find that there’s a significant problem. When I say a problem, I mean it’s like a fire—because the entire staging environment has gone down. Everyone is asking, "What happened, Goldilocks? What did you do?" She’s puzzled; they looked at the review and approved her pull request!

00:07:14.440 Goldilocks, not to be deterred, decides to fight this fire. She reviews her massive pull request, and it turns out she made a mistake with some bad copy-pasting. She has two files named the same—one is a blank controller file intended for deletion, but it didn’t get deleted. Now, the entire app has come crashing down because she has conflicting files. The issue is, no one caught this during the review process, which leads her to think, "Oh no, we really thought we had this code review thing down as a team. Just kidding—we didn’t! Clearly, we are doing something wrong." This gets Goldilocks thinking: there must be a better way of conducting code reviews.

00:08:01.290 So, Goldilocks, being pretty audacious, does what any developer would do in this situation: she conducts some research. She starts with one of her favorite books, "Code Complete," an ancient text dating back to 1993, written by Steve McConnell. She’s actually reading the second edition, released in 2004, but that’s still acceptable. From Steve McConnell's work, she learns that code reviews are not a new phenomenon; there’s actually a lot of theory behind them. Thus, she decides to delve into this theory to try to understand what good code reviews should entail.

00:08:54.990 While reading his book, she finds a passage stating that one part of managing software engineering is catching problems at the lowest value stage, which is at the moment when the least investment has been made and which costs the least to correct. She thinks to herself, "I buy this; in theory, it makes sense, and I’m on board!" She continues to read and discovers that code reviews are often associated with quality gates—considering only code that improves their project and codebase is allowed through the gates, and everything else is filtered out. But she realizes that her team’s quality gates don’t really function as gates; they seem more like formalities. "How did that bug get through code review?" she wonders.

00:10:34.029 As she continues her reading, she comes across the concept of collective ownership of action. This idea appears to be the cornerstone of why code reviews originated. Studies have shown that this collective ownership has several benefits. The first is that having multiple sets of eyes on a codebase tends to lead to better quality code. The second is that if someone leaves the team, the impact is lessened since more people are familiar with the code. Finally, the cycles of defect correction speed up because it’s not just one person who knows what the bug is; several people have worked with this code.

00:11:18.820 It turns out there are three types of code reviews, which Steve McConnell’s book explains in depth, and these three types are still used in software development practices today. Some of you might already be using them on your teams. The first type is called inspections, which can be traced back to a team at IBM led by Michael Fagan. He began conducting inspections with his team and, upon seeing their effectiveness, shared this approach with the wider community. Inspections focus on defect detection rather than correction, which is crucial; if you correct code while reviewing, you miss scanning for potential defects. Inspections involve a checklist for what to look for and designated roles for everyone involved.

00:12:55.040 What the team at IBM did was ensure there were at least three people present at every code review: a moderator, the author of the code, and a reviewer. Each person understood their role and came prepared with a checklist of what to look for in the code review process. After each review, any insightful data—what worked and what didn’t—would be fed back into the next code review cycle. This indicates they were continuously iterating on their code review process. The implementers of inspections found that 60% of defects could be caught, and one team observed a 20-30% decrease in the number of defects per 1,000 lines of code after adopting inspections.

00:14:24.210 However, not everyone has an hour to spend on reviewing five or ten lines of code, which leads us to the second type of code review: shorter reviews. The main idea here is that every single line of code should be reviewed. One team that implemented short reviews experienced a drop in their error rate from 55% to just 2%. An even more astonishing observation was that another team, which had an 86% correctness rate, noticed it increased to 99.6% after they began incorporating short code reviews regularly.

00:14:57.580 The third type of code review is something called a walkthrough, which typically involves a working meeting lasting 30 to 60 minutes. In a walkthrough, a senior developer and a junior developer discuss what may happen if the program runs and look for potential bugs together. The senior developer guides the discussion, while the junior developer is encouraged to ask questions and challenge any methodologies proposed by the senior developer. Goldilocks realizes that this reflects her experience at her first company, where they had no formalization or plan for their discussions; they simply talked through important points as and when they could.

00:15:44.490 When she examines the peer review sophistication scale, Goldilocks begins to feel disappointed. She realizes that across the software teams she has worked with, she doesn't think they ever really operated beyond a level two on this scale. She wonders, "Am I the only one who feels this way?" While theory can be fascinating, the reality of being a programmer means she must debug her team's code review process. In this modern tale, as a proactive Goldilocks developer, she does what any well-meaning developer would do: she has Twitter, of course, because all good things come through Twitter. She creates a survey and shares it with all her colleagues to see if they share her feelings of disappointment with their code reviews.

00:17:43.740 There’s a little disclaimer: Goldilocks is by no means a data scientist; she probably could have conducted that survey better. One major flaw in her data is that it’s limited to those who self-selected into the survey. She recognizes she could work on improving future iterations, but it's a start. What she discovers is that of the approximately 500 people who responded, a majority worked in Java, Ruby on Rails, and JavaScript. So, what do they think about code reviews? This diverse set of responses is promising, but she wonders about their thoughts.

00:19:11.900 She asked how strongly they agree with the statement, "Code reviews have made me personally a better developer." Everyone seemed to agree, presenting an average score of 8.6 out of 10. Interestingly, she found that Swift developers were the most likely to find code reviews beneficial, while Ruby developers followed closely behind, averaging 9.2 out of 10. Goldilocks also wondered whether she was the only person who had to request code reviews in her previous experiences or if that was common practice. It turns out, most developers who took her survey had their pull requests reviewed. However, about 10% of respondents indicated they always had to ask for peer review, revealing a lack of structured, formalized review processes.

00:20:20.240 This led Goldilocks to conclude that an effective code review process isn't determined by the programming language or framework used, but rather depends on the team dynamics and how they implement their reviews. She also discovered that most developers felt only one person needed to review their code before it could be merged. Additionally, people reported spending an average of 5 to 20 minutes reviewing a single pull request but sometimes waited from an hour up to a week for their code to get reviewed and merged into the codebase. Despite a lot of people being engaged with conducting reviews, many expressed dissatisfaction with how those reviews were executed.

00:21:45.200 One respondent mentioned, "We have one developer who blindly thumbs up every pull request and rarely leaves comments. They are attempting to game the rule of requiring at least two approvals, and it’s easy to tell because they approve two to three pull requests in under a minute." Another individual said, "The second and third reviewers are more likely to rubber-stamp after seeing one person's approvals," leading Goldilocks to wonder whether they were merely going through the motions of reviewing without genuinely engaging. For those who felt satisfied with the review process, she sought to gather insights about what was working and how their experiences could inspire improvements for her team.

00:23:27.350 After reviewing over 500 responses, Goldilocks identified that, almost universally, the success of a code review hinges on two key factors: energy and substance. When we talk about energy, we’re discussing the individual conducting the review and the amount of time they're dedicating to it. She discovered that it isn’t just about completing a code review; it also hinges on who is doing the reviewing and how much weight they carry on the team. One respondent wrote, "Everyone on the team should receive equal review. It’s common for senior people to get feedback, leading people to assume they can do no wrong, but they can, and they may appreciate feedback. In contrast, junior developers often face excessive nitpicking that can harm their self-esteem, which is a very real concern.

00:24:51.490 Goldilocks reflected on this. One person pointed out that code reviews should be recognized as a first-class activity on a development team that requires time and effort. The overarching themes she found were that large pull requests without context made for poor code review experiences, as did unequal reviews. If junior developers only receive reviews and rarely get to review others’ work, they often leave feeling disheartened. Furthermore, developers generally believed that those unhappy with their code review process attributed their dissatisfaction to management or team leaders who failed to appreciate the importance of code reviews.

00:26:40.130 Now, when we discuss substance, we mean what reviewers do while reviewing and how they conduct that process. Even if team members are engaging in code reviews, the quality of the review itself matters significantly. Goldilocks observed that many respondents used the term 'nitpick' in their comments. She investigated this word's usage and found that over 5% of people specifically associated 'nitpicks' with their code review processes. One individual observed, "You should review the code, not the person. Let tools handle styling changes to avoid wasting time on trivial matters." Another mentioned that if they find themselves in a back-and-forth over comments, they’ll just ping the other person to pair.

00:27:46.010 A recurring theme in their comments indicated that many code reviews devolved into prolonged nitpicky discussions. These discussions—focused on minor details—could probably have been resolved through conversation. Developers voiced frustration that decisions regarding style and syntax often became considerable obstacles in code reviews. One respondent lamented, "Most of our code reviews turn into extensive discussions about variable names, shifting the focus from defect detection to correction, rather than maintaining the original purpose of code reviews.

00:29:15.390 The biggest issue, however, seemed to be code reviews that lacked empathy, either from the person doing the reviewing or the one being reviewed. After analyzing all this feedback, Goldilocks realized she wasn’t the only one disappointed by the code review processes; others shared similar concerns. This reflection led her to consider past practices dating back to the 1980s. Back then, code reviews were not just checklists; they also emphasized iterative processes meant to assess and refine the effectiveness of the review system.

00:30:40.520 She understood that the crisis in her team's code review process stemmed from a lack of discussion. They had stopped evaluating whether their review workflow was beneficial or practical. So, while stories, grapes, and fairy tales are delightful, we often need to extract a moral or bring the narrative back to reality. However, the twist in this tale is that I told you this fairy tale was not really a fairy tale, as it closely mirrors my own journey.

00:32:00.400 These days, I no longer work at those two companies. Instead, I’m at a company called Tilde, where we build Skylight, your favorite Rails profiler. In the reality of my current role, I'm unsure if we’ve perfected our code review process. Nonetheless, we continue discussing it, and every engineer on our team feels capable of bringing it up for conversation, contributing to gradual improvements that benefit our teamwork.

00:32:43.890 If I’ve learned anything from this project of understanding code reviews and learning from my peers, it’s these five lessons: First, to err is human, and we are all human, so we should strive to review everyone and ensure everyone is reviewed. Second, helping your teammates is immensely beneficial if you can provide concise feedback within a contextual framework. Some colleagues have begun using screenshots in their pull requests to illustrate beforehand what the behavior was and how it changes afterward, which can help communicate issues effectively.

00:33:32.570 Third is kindness; there’s no better approach than leaving egos aside and remembering that we're collaboratively constructing something larger than ourselves, as a team. Fourth, make life easier; leverage linters and avoid discussions about tabs versus spaces, as they can easily ruffle feathers. Linters help maintain standards, and GitHub features templates to guide you through pull request submissions, making checks easier.

00:34:56.710 Finally, remember the importance of iteration. Keep discussing your code review processes and ensure they remain adaptable, since they are constantly evolving. A great quotation emerged from my survey responses: "I love code reviews in theory, but in practice, they are only as effective as the group responsible for conducting them in the right manner." Ultimately, it doesn’t matter what programming language or framework you’re using—the onus is on you as a team.

00:36:10.220 All of you collectively enhance this exciting creation, so ensure that your quality gates truly reflect quality. I hope that you leave here today, if you haven't started the conversation already, wanting to initiate it with your team. If you find that code reviews are not working for you, don’t hesitate to discuss it and share these insights. If you wish to check out the data I collected, you can visit Better Code Reviews; while I’m not a data scientist, I did manage to collect a lot of data, much more than I could present in this 35-minute talk. If you want to connect further, I encourage you to fill out the survey, and if you are a data scientist, perhaps you can assist me in analyzing the information. You can contribute by discussing it further on your team and initiating the conversation, because while fairy tales might be lovely, this one is just beginning.