Engineering Culture

Summarized using AI

Why Good Software Goes Bad

Rein Henrichs • March 31, 2016 • Earth

The video, titled 'Why Good Software Goes Bad', presented by Rein Henrichs at Ruby on Ales 2016, explores the frequent failures in software development and aims to uncover the underlying reasons for these failures. Henrichs emphasizes the pivotal role of process and culture in software quality, defining culture as a set of shared beliefs, behaviors, and values, while quality is seen as the value of software to users. He highlights the significance of understanding failure as the gap between expectations and reality, and introduces various patterns of software processes that organizations may follow. Key points discussed include:

  • Oblivious Pattern: Initial stages where individuals are unaware of their process, often leading to quick solutions without structured approaches.
  • Variable Pattern: Poor adherence to plans and processes makes outcomes depend on the capability of individual performers, which often leads to blame shifting.
  • Routine Pattern: Emphasizes planning and management but struggles with adapting to changing software requirements, often leading to rigid adherence to plans that can fail in crisis situations.
  • Steering Pattern: Moves towards control by utilizing feedback to optimize processes, enabling teams to tackle more complex problems. The manager's understanding of why certain processes are in place plays a crucial role here.

The talk also elaborates on the impact of feedback and adjustment in managing software processes, using GPS navigation systems as an analogy to illustrate the need for route recalibration. Henrichs discusses non-linear systems and their complexities in software development, highlighting issues like change blindness and the need for continuous feedback to manage known risks effectively. He underscores the necessity for a culture that prioritizes understanding people and their interactions within systems to improve software quality. The talk concludes with a call to recognize the cultural influences on processes and ultimately, the software quality, urging organizations to invest more effort in understanding human behavior within their teams.

Software fails a lot. We spend a lot of time trying to fix failing software... and sometimes we fail at that too. How can we get better at it? Here are some questions that this talk will try -- and probably fail -- to answer:

Why do we so often do things wrong when we know how to do things right? How can we create a supportive environment that is tolerant of failure? Why do we keep repeating the same failures? What do other companies do that makes them better at responding to failure? What does a company's culture have to do with their ability to respond to failure?

How do we observe and reason about failure? Why is it so hard to estimate things that can fail and what can we do to get better at it? Can we measure how efficiently we resolve failures? What sort of failure response strategies could we use and what are their tradeoffs? What can system complexity and human psychology teach us about failure?

From systems thinking to project management to the interaction between culture and process, if you've ever wondered why you keep experiencing the same problems writing and shipping software then this talk is for you.

Ruby on Ales 2016

00:00:13.400 Let's keep it going for John Scheffler. I got it nailed in one! Before we start, I want to say I'm so happy to see all of you wonderful people here. How many of you are new? Wow, this is your first time! That is more than half, which is amazing. Can we have a round of applause for the new people?
00:00:21.570 And I also want to say, weren't all of the talks great this year? So good! I'll try not to screw that up now. And in the spirit of gratuitous pandering, I want to show this thematic picture of John in his construction earring onesie.
00:00:39.180 Now, to the matter at hand. In the course of my career, as it is, I have made a number of observations about software. Based on these observations, it is my conclusion that software fails a lot. Therefore, I have written this talk, which I am now giving to you. The subject is, 'But Why?' Like all good talks do, I will begin by defining some terms.
00:00:56.520 We're going to be talking about how our software is defined by the processes we use, and how the processes we utilize to build our software are governed by our culture. So, I want to start by defining culture because it's a word that has been used and misused a lot. I want to be clear that I don't mean bro fests and beers after work, nor do I mean preferring to hire people who look like you.
00:01:11.880 What I mean by culture is the set of behaviors, beliefs, and values that we share as a group of people. Since this talk is about the quality of software, I also want to address quality itself. Quality is, generally speaking, not an ethical problem. Software is neither good nor bad; it is merely valuable to someone.
00:01:27.890 And since we're talking about failure, I want to define failure as well. I define failure as the difference between what we expect and what we observe. So, failure is a particular kind of change, specifically the kind we don't like.
00:01:45.110 Before I proceed, I'm going to show you a picture that Amy lovingly drew for me. I want you to take a look at it for 30 seconds, trying to remember as many significant details as you can about this picture. You will literally be quizzed on it later, so pay attention to significant details, like how adorable it is!
00:02:02.090 What I want to start talking to you about today is a concept that I call process patterns, which Gerald Weinberg referred to as sub-cultural patterns. These are ideas we share within our culture about how software should be built, what we value when it comes to building software, and how we define our processes.
00:02:20.650 There is no single process followed by all organizations, and that's fine. Different organizations face different trade-offs and challenges. However, we can identify some common patterns and start to discuss them. If you need to increase your capability as an organization, you might have to reach for new patterns.
00:02:35.360 Gerald Weinberg wrote a book titled 'How Software is Built,' where he talks about these software sub-cultural patterns, and I recommend you read it. The first pattern, or rather, the zeroth pattern, we're going to discuss is the oblivious pattern. This is where we don't actually know we're performing a process.
00:02:51.769 How many of you have written some small program, perhaps a shell script on your own machine? You didn't check it into version control, you didn't create a ticket system for it—you just wrote it, used it, and were done. You were following an oblivious process—the null process.
00:03:10.060 It’s not that following an oblivious process is bad; it's good for certain things but not so good for others. If you need a more capable system, you will naturally evolve out of this into a new pattern. To know when this oblivious pattern can be successful, you need a few things.
00:03:25.900 You need to be capable of solving your own problem because that problem isn't too big for you to handle. You also need to know exactly what you want to build—you're the source of the requirements, you're the one building, and you're the one using it.
00:03:43.020 The next pattern (I can never remember whether it's the first or the second) is called the variable pattern. Phil Crosby, a management consultant, wrote about his own version of these sub-cultural patterns, which started with the pattern I'm about to show you; Gerald Weinberg later added the zeroth pattern.
00:04:03.416 The variable pattern is characterized by poor adherence to plans and processes and is dependent on the performance of individuals. It’s the first pattern where people are aware that they're performing a process. This pattern separates the developer from the user, and it’s also the first time blaming appears as a significant software development activity.
00:04:20.340 This pattern is especially common among small, young companies that build software products for microprocessors. To be successful in this pattern, you need to have a good enough relationship with your customer so that the requirements process isn't poor.
00:04:35.840 You also need a group of highly competent professionals: because outcomes depend on individual performance, you have to be able to count on that performance. Lastly, the problem you're solving shouldn't require any additional effort.
00:04:51.310 There’s a myth among variable organizations when they try to figure out how to be successful, and it’s the Rockstar Programmer myth. They post ads for rock star programmers because they believe hiring one is the only hope they have of shipping a successful product.
00:05:12.910 They believe that because software quality depends on their rock stars, it is as variable as human performance (spoiler alert: that's quite variable). These organizations try to find rock stars, yet managers often aren't capable of picking the right candidates from their pool of applicants.
00:05:31.250 The other issue is that if they have processes, they only follow them when it's convenient. During a crisis, these processes are often disregarded entirely. As a result, the best indicator of success for a software project is which programmer happens to be working on it.
00:05:47.150 In this context, programmers receive all the credit but also bear all the blame. The next pattern is what Jerry calls the routine pattern. There’s a lot of emphasis on creating a plan—because if we can just come up with a robust plan, it will solve all our problems.
00:06:03.590 The first step in achieving statistical control is to attain predictability of schedules and costs. For organizations where the variable pattern isn't achieving this anymore, teams that need to coordinate with others start moving into the routine pattern.
00:06:21.500 It’s no longer sufficient to just lock programmers away and wait to see what they do. Routine organizations depend on plans and managers to execute those plans, but they struggle to deal with the actual changing requirements of software.
00:06:36.220 If a routine organization is to succeed, the problem cannot be bigger than what a small team can handle, and they must utilize their routine process to solve the problem. They also need to ensure that developers follow the process and avoid any extraordinary circumstances that could disrupt it.
00:06:52.370 These organizations have their own myth—the Rockstar Manager myth. This myth suggests that if the right manager can be hired, they will ensure developers follow the plan, and everything will work out perfectly.
00:07:06.410 Routine organizations can achieve a repeatable level of control and quality through rigorous management of commitments, costs, schedules, and changes. However, the catch is that they don't always know what they can do.
00:07:15.900 Usually, this is due to a lack of understanding of why they follow certain processes. Following processes blindly, especially in exceptional circumstances, is generally not successful. Routine managers tend to worsen situations during crises rather than improving them.
00:07:31.610 Routine organizations also believe in what anthropologists call named magic. Named magic is the belief that simply saying the name of a certain methodology—be it Agile, behavior-driven development, or lean—endows us with power.
00:07:46.370 We all know that there aren't silver bullets that can magically improve the quality of our software, so why do we keep investing in these named magic methodologies?
00:08:05.340 The next pattern is what Jerry calls the steering pattern. This is where you start to gain control over the quality of the software you're building. You choose the processes you want to use based on the results you’ve obtained from those processes.
00:08:22.080 Steering managers don’t depend on magic; they depend on understanding. While managers from routine organizations often have successful programming backgrounds, they lack the training, talent, or interest in management. Steering managers, conversely, usually possess some management training.
00:08:39.370 I personally moved from an individual-contributor programming position into a management role, and that taught me we can't simply assume that knowing how to write programs translates into knowing how to manage people.
00:08:50.520 In steering organizations, processes aren't always well defined, but there is a shared understanding of the goals and why those processes were selected. Even if a process doesn't work, they are capable of recovering from it due to this understanding.
00:09:08.580 This is also why steering managers and teams can tackle harder, more challenging, and riskier projects. If you find yourself in a routine organization and struggle to solve harder problems or ship software to demanding customers, leveling up to a steering organization may be what you need.
00:09:27.290 Steering organizations are suited for problems that are too difficult for a simple routine to handle. Steering organization managers must be able to negotiate with their bosses and vendors and maintain control over external factors while rejecting arbitrary schedules and constraints.
00:09:42.720 There are two more patterns Jerry discusses in his book, but I will not cover them. Here's why: there was a Department of Defense study conducted in the 70s that surveyed various organizations to rank them in what was then called a maturity model.
00:09:57.759 I'm not in favor of the term maturity model, as I don't think I was being immature when I wrote that shell script; I thought it was simply the solution to my problem at that time. This study found that eighty-five percent of projects were at the lowest maturity level, which we can call variable, while only fourteen percent were at level two (routine) and only one percent at level three.
00:10:11.640 No projects or organizations even existed at levels four or five, so I won't waste your time discussing unicorns, and we'll move on.
00:10:27.320 How does a steering organization work? What separates it from a routine organization? The answer lies in the concept of cybernetic control, which involves taking in feedback and using it to make informed decisions.
00:10:43.410 As we receive feedback from the system, we continuously make choices and refine our processes. How many of you remember Triple-A? Before there were Google Maps, you would go to Triple-A for travel directions. They would compile a spiral-bound book listing your route with landmarks along the way.
00:10:59.450 However, these directions lacked context; if you took a wrong turn, you wouldn't have the necessary context to recalibrate your route at all. Essentially, you had to backtrack to find your way.
00:11:15.720 Now, let's look at a more modern example of routing: GPS. A GPS provides a complete map that offers both context and understanding of the route, along with automated recalibration if you get off course.
00:11:31.030 This exemplifies a cybernetic system where your GPS recognizes you've veered off track and helps you make necessary course corrections.
00:11:46.680 To visualize this concept, consider a simple model without a controller, focusing on the inputs and outputs of a system. For a system producing software, outputs include not just software but also other effects like increased staff confidence and numerous failure reports.
00:12:01.230 Inputs include requirements, resources, and randomness.
00:12:20.530 When I refer to randomness, I'm speaking of the unpredictable elements that can derail your plans—essentially, the entropy in the system.
00:12:37.090 Variable organizations struggle with this model, as they don’t have a concept of a controller integrated into their systems, leaving them unable to exert control. In this case, requirements go in, and software comes out, but there's little understanding of that process.
00:12:54.790 For routine organizations, if we want to exert control over our system, we need to connect it to a controller so that the people involved can help keep the software on track.
00:13:10.360 However, this level of control is still limited, as the controller can only enforce the plan without a comprehensive view of how far off the plan the system actually is.
00:13:25.440 To evolve into a steering organization, we need to take in feedback from the system's state and its outputs, which allows us to adjust requirements and exert the necessary control over our systems.
00:13:43.700 An effective method to mitigate productivity losses due to sickness is to send staff home when they begin showing symptoms. However, this isn't possible unless you’re aware that an individual is sick.
00:14:03.630 Controllers that only follow the routine pattern lack the ability to observe these signs. A more adaptable and effective control model is a feedback model, which enables us to measure and analyze our system's performance.
00:14:21.260 Effective feedback control starts with a clear picture of how we want the system to behave. Assuming we can actually observe the real state of the system, we can then compare the desired outcome with the actual result.
00:14:37.290 If we have the means to act on the system and bring it into alignment with our intentions, we can initiate cybernetic control.
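As a rough illustration of the feedback model (a desired state, an observation of the actual state, a comparison, and an action), here is a minimal Python sketch. It is not from the talk: the project is reduced, hypothetically, to cumulative units of work per week, `disturbances` stands in for the randomness input, and the controller acts by revising the plan (adjusting requirements) by a fraction `gain` of the observed gap. With `gain=0.0` it behaves like a routine controller that enforces the plan as written.

```python
def run_project(disturbances, per_week=10.0, gain=0.0):
    """Return the plan-vs-actual gap observed at the end of each week.

    gain=0.0 is a routine controller: the plan is enforced as written and
    never revised, so the gap simply accumulates.
    gain>0 is a steering controller: it observes the actual state, compares
    it with the plan, and acts by revising the plan (adjusting requirements)
    by a fraction of the gap.
    """
    planned = actual = 0.0
    gaps = []
    for noise in disturbances:
        planned += per_week            # desired state: the plan expects full progress
        actual += per_week - noise     # observed state: randomness eats into progress
        gap = planned - actual         # compare desired with observed
        gaps.append(gap)
        planned -= gain * gap          # act: move the plan toward reality
    return gaps


if __name__ == "__main__":
    weekly_noise = [0, 3, 0, 2, 0]     # hypothetical disturbances
    print("routine: ", run_project(weekly_noise))
    print("steering:", run_project(weekly_noise, gain=0.5))
```

With these disturbances the routine controller reports gaps of `[0.0, 3.0, 3.0, 5.0, 5.0]`, while `gain=0.5` gives `[0.0, 3.0, 1.5, 2.75, 1.375]`: the gap does not vanish, but it stops piling up.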
00:14:55.520 Cybernetics is a fascinating field encompassing the science of control and communication in animals and machines. These systems can be observed throughout life, from our human bodies, which are also cybernetic systems, to organizations with interconnected processes.
00:15:10.560 One common challenge is that many cybernetic systems exhibit non-linear behavior. In a non-linear system, outputs are not proportional to inputs: doubling an input does not necessarily double the result.
00:15:25.370 That makes such systems hard to predict and manage. At the beginning of the talk, I showed you Amy's picture. Can anyone recall how many cats were in it? Don't shout it out; just silently remember.
00:15:40.430 Now, think about whether you were correct in your answer. Many of you probably noticed that this is not the same picture I initially showed, showcasing the concept of change blindness.
00:15:55.410 What we have, in essence, is akin to a spot-the-difference game—failure detection, in a nutshell. We have expectations on one side, and what we observe on the other. We're trying to identify mismatches.
00:16:11.290 If I asked you to spot the differences now and we graph how many differences you noted over time, the graph would not depict a straight line. You'd start noticing a lot of differences but that rate would taper off.
00:16:27.740 This illustrates a non-linear system. So why is it not linear? Two simple reasons: first, as you find some differences, fewer remain; second, you begin by spotting the easiest differences before tackling the harder ones.
00:16:42.590 When you mark your progress and say, 'I've found 10 differences in two minutes,' and base your estimates solely on that, predicting how long it will take to finish becomes inaccurate.
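The estimation trap can be sketched numerically. The model below is an assumption chosen for illustration, not something from the talk: there are 20 differences, and each minute you spot 30% of the ones still remaining. That captures only the first reason the curve bends (fewer differences remain); spotting the easy ones first would bend it further.

```python
def found_after(minutes: float, total: int = 20, p: float = 0.3) -> float:
    """Differences found after `minutes`, assuming that each minute you spot
    a fraction `p` of the differences that are still left."""
    return total * (1 - (1 - p) ** minutes)


def linear_estimate(total: int, found: float, minutes: float) -> float:
    """Naive forecast: extrapolate the early discovery rate in a straight line."""
    return total / (found / minutes)


def minutes_until_one_left(total: int = 20, p: float = 0.3) -> int:
    """First whole minute at which fewer than one difference remains."""
    t = 0
    while total * (1 - p) ** t >= 1:
        t += 1
    return t


if __name__ == "__main__":
    early = found_after(2)
    print(f"found after 2 min: {early:.1f}")
    print(f"linear forecast:   {linear_estimate(20, early, 2):.2f} min")
    print(f"under the model:   {minutes_until_one_left()} min")
```

After two minutes you have found about 10 differences, so the straight-line forecast says you will be done in under 4 minutes; under this model it takes 9 minutes before fewer than one difference is left.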
00:16:59.180 Humans excel in linear systems but struggle in non-linear ones. Compounding that problem is that we often fail to understand the factors causing this nonlinearity.
00:17:15.530 Think about it: when you scan for differences, you have no clear mechanism for prioritization. When fixing bugs, the natural bias is to address the easy ones first, which can let risk accumulate over time.
00:17:32.470 If you instead fix the riskiest bugs first, you steadily move toward a less risky system, which ultimately improves project outcomes.
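A small sketch of the two prioritization strategies, using a hypothetical backlog (the bug names, effort in days, and risk scores are made up). `risk_exposure` is an assumed metric, not one from the talk: each bug's risk multiplied by the day it gets fixed, summed, so risk that stays open longer counts for more.

```python
from typing import List, NamedTuple


class Bug(NamedTuple):
    name: str
    effort: int  # days to fix (hypothetical)
    risk: int    # hypothetical risk score; higher means more dangerous


def risk_exposure(order: List[Bug]) -> int:
    """Sum of risk * day fixed: risk left open longer counts for more."""
    day = total = 0
    for bug in order:
        day += bug.effort
        total += bug.risk * day
    return total


def easy_first(bugs: List[Bug]) -> List[Bug]:
    return sorted(bugs, key=lambda b: b.effort)


def riskiest_first(bugs: List[Bug]) -> List[Bug]:
    return sorted(bugs, key=lambda b: b.risk, reverse=True)


BACKLOG = [
    Bug("typo in footer", 1, 1),
    Bug("button misaligned", 1, 2),
    Bug("stale docs", 2, 1),
    Bug("auth bypass", 4, 30),
]

if __name__ == "__main__":
    print("easy first:    ", risk_exposure(easy_first(BACKLOG)))
    print("riskiest first:", risk_exposure(riskiest_first(BACKLOG)))
```

Easy-first closes two bugs in the first two days and looks productive, but leaves the auth bypass open until day 8; for this backlog its exposure is 249, versus 144 for riskiest-first.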
00:17:49.640 Everywhere you look—requirements gathering, feature development, and bug fixing—non-linear systems appear, yet managers persist with ineffective linear models, yielding disappointing results.
00:18:05.840 The situation worsens as these non-linear interactions add even more complexity to the overall system. We term this the composition fallacy: the idea that the composite system is not more complex than the sum of its parts.
00:18:20.260 Reality shows that the interactions between components lead to increased complexity. I should also highlight the decomposition fallacy, which suggests that if all subsystems work, the overall system must also function.
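One way to see why the composite is more than the sum of its parts is to count potential interactions. Counting only pairwise interactions is a simplifying assumption (real systems also have higher-order ones), but even that count grows quadratically, n(n-1)/2, while the number of parts grows linearly. The subsystem names below are hypothetical.

```python
from itertools import combinations


def pairwise_interactions(components):
    """Every unordered pair of components that could interact."""
    return list(combinations(components, 2))


# Hypothetical subsystems; each may work fine on its own.
SUBSYSTEMS = ["web", "api", "queue", "database"]

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        parts = [f"c{i}" for i in range(n)]
        print(n, "parts ->", len(pairwise_interactions(parts)), "pairwise interactions")
```

Four subsystems that each pass their own tests still leave six pairings that tests of the individual parts do not cover, which is the decomposition fallacy in miniature.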
00:18:35.160 Ignoring the interactions between these systems poses significant risks. Therefore, how can we navigate non-linear systems while avoiding pitfalls from disturbances or insufficiently understood elements? A few suggestions:
00:18:51.460 First, introduce governing actions in your feedback models. Prioritize addressing risky work first, for example. The system may remain non-linear, but you can regulate its behavior to prevent it from spiraling out of control.
00:19:07.010 Second, act promptly: small and often. The sooner you make a correction, the smaller it needs to be, and smaller, less risky changes allow for more accurate predictions.
00:19:24.840 The more frequently you implement modifications, the more opportunities you have to catch problems before they escalate. These small corrections fuel the engine of cybernetic control.
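A toy model of "small and often", under assumed conditions: the project drifts from plan by a fixed amount each week, and whenever the controller checks, it corrects the full gap at once. Checking more often does not reduce the total drift, but it keeps each individual correction small. The function name and drift values are hypothetical.

```python
def correction_sizes(drift_per_week, check_every):
    """Size of each correction when the controller only looks every
    `check_every` weeks. Assumes each check fully closes the gap; drift
    after the last check is not reported."""
    gap = 0
    sizes = []
    for week, drift in enumerate(drift_per_week, start=1):
        gap += drift                  # the system quietly drifts off plan
        if week % check_every == 0:   # feedback arrives only on check weeks
            sizes.append(gap)         # correction needed right now
            gap = 0
    return sizes


if __name__ == "__main__":
    drift = [1] * 6                   # hypothetical: one unit of drift per week
    for k in (1, 3, 6):
        print(f"check every {k} week(s):", correction_sizes(drift, k))
```

Checking weekly yields six corrections of size 1; checking every six weeks yields a single correction of size 6, which is the riskier change to make.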
00:19:39.340 However, to address issues, those problems must be visible. If you’re unaware of a problem, you won't be able to recognize or rectify it. Controllers require feedback to function effectively.
00:19:56.590 How many of you utilize the GitHub pull request model? Once you have something to share, you put it up, create a pull request, and receive feedback. It's a solid system and facilitates valuable conversations.
00:20:14.680 More crucially, GitHub makes visible work products that might otherwise stay hidden. Managers cannot solve problems with work they cannot observe, which limits their effectiveness in managing software production.
00:20:30.300 This leads us to the next point: if you cannot comprehend a problem or system, you can’t fix it. If your model lacks supporting detail for accurate predictions, your decision-making process will be blind and arbitrary.
00:20:49.100 Regarding models: all cultural patterns employ models guiding their thoughts. These models—implicit or explicit—support the need for understanding and effective communication. Explicit models complement verbal communication with pictorial representations that can illustrate complex interactions.
00:21:07.880 Frederick Brooks, who authored the classic book 'The Mythical Man-Month,' noted that more software projects fail due to lack of calendar time than for all other causes combined. However, he doesn’t mean that running out of time is the root cause.
00:21:26.730 What he does is offer a model that people frequently use to explain project failure, which often represents merely a symptom stemming from deeper issues in their process.
00:21:43.930 If we neglect to expose our models of a system, when someone inquires about why a project failed, we won’t comprehend its underlying issues adequately. Consequently, responses may lean towards 'we just didn’t have enough time.'
00:22:01.200 Managers rely on models to guide their decision-making, whether those models are acknowledged or not. By making models explicit and sharing them, we foster collective understanding and invite constructive feedback.
00:22:17.200 Let’s talk about an implicit model managers often hold. Managers tend to believe that if they input 10 units of pressure, 10 units of increased productivity will result. But allow me to present a more nuanced perspective.
00:22:33.530 It's not just about the linear relationship; there's burnout to consider. When individuals burn out, productivity doesn’t plateau; it plummets. Not just productivity but overall happiness as well.
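The two implicit models can be written as functions. The threshold and the rate of collapse are arbitrary assumptions for illustration; the talk gives no numbers beyond the 10-units-in, 10-units-out example.

```python
def linear_model(pressure: float) -> float:
    """The implicit manager model: productivity rises one-for-one with pressure."""
    return float(pressure)


def burnout_model(pressure: float, threshold: float = 10.0, collapse: float = 3.0) -> float:
    """Assumed non-linear model: matches the linear one up to `threshold`,
    then each extra unit of pressure costs `collapse` units of productivity
    (burnout), bottoming out at zero."""
    if pressure <= threshold:
        return float(pressure)
    return max(0.0, threshold - collapse * (pressure - threshold))


if __name__ == "__main__":
    for p in (5, 10, 12, 14):
        print(p, linear_model(p), burnout_model(p))
```

Below the threshold the two models agree, which is why the linear one survives: it looks right until pressure crosses the line and productivity falls instead of rising.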
00:22:51.110 Here's another example of an implicit model, particularly prominent in variable organizations. They often operate under the misconception that they can easily identify the best programmers. This belief is foundational to the premise of their success.
00:23:08.490 Yet, there's scant evidence supporting their ability to identify top talent. In conclusion, I want to highlight the importance of these feedback control systems—these cybernetic systems—at all organizational levels.
00:23:26.500 They should inform our approaches to testing, strategic product decisions, and our understanding of systems. With these models, we can begin to think about and observe our systems through the lens of non-linear interactions.
00:23:43.850 Ultimately, if software is governed by process, and process by culture, then culture is governed by people. If you truly want to improve software quality, you have to spend more time understanding people.
00:24:00.000 I have one last thing to share with you: I also have this GIF of Vader and a little girl. No, no, still no.
00:24:12.790 Okay, good talk, everyone!