GORUCO 2018: After Death by Sam Phippen
In his talk "After Death" at GORUCO 2018, Sam Phippen, a member of the RSpec core team and interim manager at DigitalOcean, explores the inevitability of failure in software systems and outlines practices for incident management and post-mortems. He emphasizes that every system, no matter how well designed, will eventually have problems, so being prepared for incidents is crucial.

Key points discussed include:

- **The Nature of Failures**: Phippen discusses the emotions engineers experience during incidents, such as frustration and sadness.
- **Incident Response Covenant**: He introduces a covenant he has with his team to minimize the frequency of incidents and empower team members to prevent them.
- **Importance of Post-Mortems**: Phippen explains why post-incident reviews (called post-mortems at DigitalOcean) matter as a way to learn from failures.
- **Severity Ratings in Incidents**: The organization uses a severity scale from 'Zero' (critical business impact) to 'Four' (bugs and defects to be addressed later), which guides the response and indicates the urgency of an issue.
- **The Incident Timeline**: He stresses the importance of documenting a timeline of events, focusing on facts rather than emotions, to understand when and how an incident occurred.
- **Root Cause Analysis**: Phippen cautions against oversimplifying root causes and encourages looking deeper into the systemic issues that contribute to incidents, such as inadequate staging environments and outdated practices.
- **Continuous Improvement**: He advocates for a culture of learning from incidents and stresses the value of ongoing effort to improve incident response processes.

Phippen shares specific examples from DigitalOcean's post-mortems to illustrate how the team documents incidents and analyzes root causes.
He concludes by highlighting the need for empathy and understanding in incident management, advocating for systems that support engineers during stressful situations while facilitating continuous improvement. His talk serves as a valuable resource for professionals looking to enhance their incident response practices and cultivate a more resilient software engineering culture.