Escalating Complexity: DevOps Learnings From Air France 447

In her presentation titled "Escalating Complexity: DevOps Learnings From Air France 447," Lindsay Holmwood discusses the tragic crash of Air France flight 447, which occurred on June 1, 2009, claiming the lives of all 228 passengers and crew. The talk critiques the mainstream narrative that attributes the crash solely to pilot error, arguing instead that this oversimplifies the events surrounding the incident. Holmwood emphasizes the importance of understanding the complexities of systems and that blaming individuals overlooks the systemic issues that can lead to failures.

Key points covered in the presentation include:

Mainstream Narratives: Hollywood discusses how reports frequently blame pilots for incidents, focusing on human error while neglecting the broader context of the complex systems in which they operate.
Pilot Experience: She refutes the notion that the pilots were inexperienced, detailing their extensive flying hours and qualifications, thus challenging the argument that poor training led to the crash.
Complex Systems: The presentation highlights the significance of viewing pilots as participants in a larger system rather than as isolated actors. Holmwood references the BEA report indicating that different crews under similar circumstances would likely act similarly.
Local vs. Global Rationality: Holmwood differentiates between local rationality, which reflects the pilots' decision-making in real-time, and global rationality, which is the benefit of hindsight that investigators enjoy after the fact.
Systems Feedback: She stresses the importance of clear feedback mechanisms within operational systems, using the Airbus A330's flight control modes as an example. The lack of effective feedback during the flight led to critical misunderstandings by the pilots.
Communication: Holmwood explores the challenges of communication in high-pressure environments, stating that tactile feedback and alert mechanisms need to be enhanced to prevent critical information from being overlooked.
Lessons for DevOps: Drawing parallels between the aviation incident and technology operations, she advocates for comprehensive incident response plans and effective communication channels in tech environments.

Holmwood concludes by reminding the audience that while system failures are inevitable, the approach to understanding these systems can transform how we respond to and mitigate risks. The final takeaway is to avoid an anthropocentric view of systems and to recognize the role of context in failures, emphasizing that a holistic perspective can lead to a better understanding of both aviation safety and operational effectiveness in technology.

This talk encourages the audience to consider how their own systems operate and how to improve upon them to avoid tragic outcomes in their fields.

Escalating Complexity: DevOps Learnings From Air France 447
Lindsay Holmwood • January 28, 2020 • Earth

Title: Escalating complexity: DevOps learnings from Air France 447
Presented by: Lindsay Holmwood

On June 1, 2009 Air France 447 crashed into the Atlantic ocean killing all 228 passengers and crew. The 15 minutes leading up to the impact were a terrifying demonstration of the how thick the fog of war is in complex systems.
Mainstream reports of the incident put the blame on the pilots - a common motif in incident reports that conveniently ignore a simple fact: people were just actors within a complex system, doing their best based on the information at hand.
While the systems you build and operate likely don't control the fate of people's lives, they share many of the same complexity characteristics. Dev and Ops can learn an abundance from how the feedback loops between these aviation systems are designed and how these systems are operated.
In this talk Lindsay will cover what happened on the flight, why the mainstream explanation doesn't add up, how design assumptions can impact people's ability to respond to rapidly developing situations, and how to improve your operational effectiveness when dealing with rapidly developing failure scenarios.

Help us caption & translate this video!

http://amara.org/v/FGbc/

MountainWest RubyConf 2013