Keynote: Code Quality Lessons Learned

Bryan Helmkamp

We started Code Climate with a simple hypothesis: static analysis can help developers ship better code, faster. Five years later, we analyze over 70,000 repositories each day spanning a wide variety of programming languages, and along the way we've learned a lot about code quality itself: what it means, why you want it, how you get it, and more. This talk will cover some of the more surprising insights, including what makes a code metric valuable, when unmaintainable code may be preferable, and the number one thing that prevents most developers from maintaining quality code over time.

GoRuCo 2016

00:00:16.450 Thank you very much! I'm really excited to be here. This is my 10th GoRuCo. I've been at all of them, so I'm especially honored to be speaking at this year's edition. I'm also really honored to be keynoting; that was a bit of a surprise.

00:00:21.940 Then I found out that keynoting actually meant I would be the first one up in the morning, so there were trade-offs, but I made it. We're going to talk a little bit about code quality lessons learned. As Luke mentioned, I started a company called Code Climate about five years ago.

00:00:34.660 How many of you have heard of Code Climate? Alright, anyone who doesn't have their hands up, come see me afterwards. I mention this because Code Climate provides automated code review, using static analysis to help teams achieve higher quality and better outcomes.

00:00:55.870 We have a little bit of experience working with this. We analyze over 80,000 repositories daily from more than 75,000 developers who utilize our tools across 1,500 organizations. This is where a lot of the experience we'll discuss today is drawn from.

00:01:19.300 We thought it would be interesting to explore six questions related to code quality that might have surprising or non-obvious answers. The first question is: What is code quality? This question arises almost immediately when we start talking with an organization about their goals.

00:01:56.369 The 'code' part is clear; we all know what code is. But what does 'quality' mean? That is not as straightforward to answer. One way to think about code quality is to consider its opposite: legacy code. Many of you are probably familiar with the concept, which is usually defined as code written by someone else, or even code written by me more than two weeks ago.

00:02:47.790 Consequently, anything that isn't legacy can be considered quality. In practice, though, code quality means many different things. When we talk with developers about the quality of their code, they use adjectives such as 'simple': they want code that accomplishes its task as directly as possible. They also want it to be well tested.

00:03:07.040 They also want it to be bug-free, meaning the code should perform its intended function without defects. Additionally, clarity is a key factor: Can someone read and understand it? And can the team maintain it over time?

00:03:24.560 Some teams will discuss whether the code has been refactored — was it merely written once and committed, or did they iterate and improve the design according to best practices and team conventions? Teams may want code to be documented, both internally and through supplementary documentation. Extensibility also matters: Can the code evolve to meet future requirements? Finally, performance is critical; it would be difficult to claim your code is high quality if it takes, for example, 300 milliseconds to process a trade when it should be under a millisecond.

00:04:03.040 So the takeaway here is to ensure that your team is aligned on your goals. If you wish to improve code quality and different individuals on your team have varying understandings of what it is or how to achieve it, you risk expending time, energy, and possibly frustration moving in different directions.

00:04:39.880 A quote that I feel sums this up well concerns the design of code: someone once said that any code less decomposed than theirs is a mess, while any code more decomposed than theirs is over-engineered. You've probably encountered this feeling: each of us treats our own level of decomposition as the right one and judges everyone else's code against it.

00:05:12.730 Thus, the takeaway is to initiate that conversation with your team. The second question I wanted to tackle is: What is the best way to measure code complexity? Suppose we've established that maintainability and low complexity are important goals. How can we measure these aspects of our codebase?

00:05:30.360 The oldest and most recognized measurement of code complexity is 'WTFs per minute.' Picture a code review on the left where the reviewers are swearing just a little, and on the right a review going far less well; the code on the right is clearly of lower quality. In truth, many metrics exist for evaluating the complexity of a given unit of code.

00:06:22.670 Some of you may know about cyclomatic complexity, also known as McCabe's complexity, which is the number of linearly independent paths through a block of code. There's also the ABC metric, a slightly less trusted measure of complexity based on the assignments, branches, and conditions within a method or class.

00:06:55.610 Finally, the classic measure of code complexity that everyone recognizes is the number of lines of code.
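To make the cyclomatic complexity count concrete, here is a minimal Ruby sketch that approximates it as one base path plus one for every decision point. The lists of branching keywords and operators, and the method name, are assumptions made for this illustration; real analyzers work on the parsed syntax tree and handle many more cases.

```ruby
require "ripper"

# Keywords and operators that each open an additional path through the code.
# These lists are an assumption made for this sketch, not an official definition.
BRANCH_KEYWORDS  = %w[if elsif unless while until for when rescue and or].freeze
BRANCH_OPERATORS = %w[&& || ?].freeze

# Approximate cyclomatic complexity: one base path plus one per decision point.
def approximate_cyclomatic_complexity(source)
  decision_points = Ripper.lex(source).count do |_position, type, token|
    (type == :on_kw && BRANCH_KEYWORDS.include?(token)) ||
      (type == :on_op && BRANCH_OPERATORS.include?(token))
  end
  decision_points + 1
end

GRADE_SOURCE = <<~RUBY
  def grade(score)
    if score > 90 && score <= 100
      "A"
    elsif score > 80
      "B"
    else
      "C"
    end
  end
RUBY

# `if`, `&&`, and `elsif` are the three decision points, so this prints 4.
puts approximate_cyclomatic_complexity(GRADE_SOURCE)
```

Counting tokens rather than walking the syntax tree keeps the sketch short, at the cost of miscounting edge cases such as a symbol named `:if`.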

00:07:06.020 To compute the ABC metric, you count the assignments, branches, and conditions in the code, square each of the three counts, sum the squares, and then take the square root to produce a single number. Like cyclomatic complexity, this metric accounts for code paths and conditional logic, but it also includes elements like assignments that add to the cognitive load.
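The calculation just described can be sketched in a few lines of Ruby. The method name and the counts below are hypothetical, chosen only so the arithmetic is easy to follow; counting the assignments, branches, and conditions in real code is the harder part and is not shown here.

```ruby
# ABC magnitude: the square root of the sum of the three squared counts.
def abc_score(assignments:, branches:, conditions:)
  Math.sqrt(assignments**2 + branches**2 + conditions**2)
end

# Hypothetical counts for one method, before and after extracting a helper.
before_extraction = abc_score(assignments: 12, branches: 20, conditions: 9)
after_extraction  = abc_score(assignments: 6, branches: 12, conditions: 4)

puts before_extraction # sqrt(144 + 400 + 81) = 25.0
puts after_extraction  # sqrt(36 + 144 + 16)  = 14.0
```

Because the counts are squared before they are summed, the largest of the three dominates the score.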

00:07:43.570 However, if you tell a developer that the method they wrote has an ABC score of 47.3, it may be difficult for them to interpret. They might wonder whether this is a low or high score and what they could do to reduce it. They might infer that moving some code out of the method would lower one of the variables and thus the overall score, which is true but somewhat indirect.

00:08:55.340 At Code Climate, we support both cyclomatic complexity and ABC metrics in many cases. Through discussions with various teams, we found that the most crucial aspect of selecting a metric to track is choosing one that resonates with your team. If your team is accustomed to discussing cyclomatic or McCabe complexity, that could be suitable.

00:09:39.990 However, if those concepts are unfamiliar, it may be more prudent to opt for something simpler. When a team approaches us expressing the desire to track a metric related to complexity or maintainability, we often recommend simply using lines of code. This recommendation may sound naive, but there's a reason behind it.

00:10:07.750 Historically, developers' productivity was evaluated by the number of lines of code produced, where more lines equated to better developers. But if we assess lines of code for a class or method, it correlates remarkably well with how comprehensible that unit of code is.

00:11:05.600 For instance, consider a 'user.rb' file that is 1,300 lines long. If it’s growing weekly with new features, chances are that code is challenging to maintain. By using lines of code as a metric, the improvement action is clear: that code needs to be refactored or split up.
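As a rough illustration of how little machinery a lines-of-code check needs, here is a Ruby sketch. It counts lines that are neither blank nor whole-line `#` comments; the 250-line limit and the method names are assumptions for the example, and `=begin`/`=end` comment blocks are not handled.

```ruby
# Count lines that are neither blank nor whole-line comments.
def lines_of_code(source)
  source.each_line.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?("#")
  end
end

# Hypothetical limit; each team would pick its own.
MAX_LINES_PER_FILE = 250

def too_long?(source, limit: MAX_LINES_PER_FILE)
  lines_of_code(source) > limit
end

SAMPLE_SOURCE = <<~RUBY
  # A user of the system.
  class User

    def name
      "Ada"
    end
  end
RUBY

# The comment and the blank line are skipped, leaving 5 lines.
puts lines_of_code(SAMPLE_SOURCE)
```

To check a real file, one could pass in `File.read("user.rb")` and compare the result against whatever limit the team has agreed on.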

00:11:52.930 Utilizing lines of code can also simplify conversations around metrics. Many teams waste significant time in internal debates about the relative value and precision of metrics like cyclomatic complexity and ABC metrics, resulting in diminishing returns.

00:12:22.790 This leads us to the third question: Why are older projects harder to maintain? Everyone enjoys working on greenfield projects; the name evokes images of blue skies and sunny fields. However, over time, these fields often attract 'cows'—sloppy code. As developers introduce sloppy code, it becomes self-reinforcing.

00:12:51.570 Sloppy code is usually introduced under pressure: a critical business priority or a looming deadline pushes the team to cut corners. The developers may have good intentions, but the sloppy code they leave behind slows down future work on the codebase.

00:13:41.030 Ultimately, the project might become late as new sloppiness accumulates. When you inform business stakeholders that a project will be delayed, the cycle starts over, creating further pressure.

00:14:09.110 Another reason older projects face these challenges is that code quality is a moving target. The code stays static while the business evolves and requires new features, and the gap between the two is technical debt. The greater the technical debt, the harder the project becomes to maintain.

00:15:04.070 Organizations might try to tackle this with rewrites or redesigns. We often recommend a more iterative design approach. Regularly aligning code with the evolving understanding of the business can keep technical debt in check.

00:15:40.440 This principle aligns with the Boy Scout rule: always leave your code better than you found it. It's about progressively improving the codebase.

00:16:05.890 The fourth question is: What is the optimal size for a pull request? Pull requests can range from a single-line change to large contributions containing thousands of lines. Generally, the larger the pull request, the harder it is to review.

00:16:49.089 A study conducted by SmartBear Software analyzed about 2,500 code reviews and examined the size of changes relative to defect density. Surprisingly, larger pull requests tend to have fewer issues reported, not because of inherent quality, but due to review fatigue.

00:17:40.150 Statistically, about 400 lines of code emerged as the point up to which reviewers can still identify defects efficiently. Anything beyond that compromises their ability to find issues. Consider prompting your team to break up pull requests that exceed 400 lines when possible.
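A team could automate that prompt with a small check on the size of a diff. The Ruby sketch below counts added and removed lines in unified diff text and compares the total with the 400-line figure from the study; the method names are hypothetical, and treating "lines of code" as added plus removed lines is an assumption of this example.

```ruby
# The ceiling for a reviewable change suggested in the talk.
REVIEW_LIMIT = 400

# Count added and removed lines in a unified diff, skipping the
# "+++" and "---" file header lines.
def changed_lines(diff_text)
  diff_text.each_line.count do |line|
    (line.start_with?("+") && !line.start_with?("+++")) ||
      (line.start_with?("-") && !line.start_with?("---"))
  end
end

def needs_splitting?(diff_text, limit: REVIEW_LIMIT)
  changed_lines(diff_text) > limit
end

SAMPLE_DIFF = <<~DIFF
  --- a/user.rb
  +++ b/user.rb
  @@ -1,3 +1,4 @@
   class User
  -  def name; end
  +  def name
  +    "Ada"
  +  end
   end
DIFF

# One removed line plus three added lines gives 4.
puts changed_lines(SAMPLE_DIFF)
```

The header check is deliberately naive: a changed line whose own content begins with `--` or `++` would be skipped, which is acceptable for a size estimate but not for exact accounting.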

00:18:09.940 Interestingly, Brandon Keepers from GitHub observed that we seem more receptive to feedback from automated systems than from humans. Thus, integrating static analysis tools or linters can help in situations where human reviewers struggle.

00:18:48.300 The fifth question is: When might sloppy code not be a problem? Low-quality code can be acceptable in three scenarios. The first is when you are proving a hypothesis: if the business is uncertain about the value of the project, it might not be practical to prioritize quality.

00:19:25.130 As my friend Patrick McKenzie once tweeted, there are two kinds of bootstrapped startups: those with integration tests and those with revenues. While amusingly overstated, it captures the point that delivering value comes first.

00:20:36.860 The second is when you are building something that will be thrown away. Sometimes you need to write a version of the code just to learn and understand the problem better before discarding it; this is a 'spike.' Be cautious, though: spikes should genuinely be discarded.

00:21:33.640 Lastly, there's the 'omega mess'—code with only inbound dependencies that does not change. If a unit of code interacts with a legacy external system that hasn’t evolved, its lower quality might be acceptable as long as the module exposes a clean interface.
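As a small illustration of a mess hidden behind a clean interface, consider the following Ruby sketch. Everything in it is hypothetical: the `LegacyAccounts` module, the fixed-width record format, and the field offsets are invented for the example. Callers only ever see `parse`; the magic offsets stay private, and as long as the legacy format never changes, nobody needs to touch them.

```ruby
# Hypothetical adapter around a legacy system that emits fixed-width records.
module LegacyAccounts
  # Clean interface: callers get a plain hash and never see the record format.
  def self.parse(record)
    {
      id: field(record, 0, 6).to_i,
      name: field(record, 6, 20),
      balance_cents: field(record, 26, 9).to_i
    }
  end

  # Messy detail kept private: offsets dictated by the (invented) legacy format.
  def self.field(record, start, length)
    record[start, length].to_s.strip
  end
  private_class_method :field
end

# Build a sample record: 6-digit id, 20-character name, 9-digit balance.
sample_record = "000042" + "Ada Lovelace".ljust(20) + "000012345"
p LegacyAccounts.parse(sample_record)
```

The result is a hash with the id 42, the name "Ada Lovelace", and a balance of 12345 cents.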

00:22:27.390 The final question is: What is the greatest enemy of high-quality code? It isn't apathy toward producing quality work; I think most developers genuinely want to create excellent code.

00:23:15.290 It's also not about ability. Even the most seasoned teams fall victim to the pressures and cycles that cause code quality to erode over time.

00:24:05.670 Changing requirements contribute, as business stakeholders often shift priorities, but the main issue is fear. The fear of introducing errors often leads developers to make decisions that result in less maintainable code.

00:24:51.650 Each step toward reducing the likelihood of introducing bugs may actually create more complexity. This fear ultimately results in code that is harder to understand and manage.

00:25:55.620 To improve quality, the most effective recommendation is reducing fear surrounding code changes. Automated testing, operational metrics, code reviews, and static analysis can all contribute to alleviating this fear.

00:26:36.220 Pair programming—having two developers engaged in the process—can also help enhance confidence. Ultimately, if you’re facing quality issues, recognize that hope isn’t a viable plan.

00:27:25.090 Reflect on what code quality means for your team. Define it, agree on it, and then take actionable steps to mitigate the fear associated with changes. This approach allows you to achieve the results you aspire to.

00:28:11.960 Thank you very much! I really appreciate your time!