The Overnight Failure

In "The Overnight Failure", presented at wroc_love.rb 2017, Sebastian Sogamoso argues that the software industry should embrace failure by discussing mistakes openly and learning from them. Sogamoso, who works for Cookpad, tells the story of a major incident at a previous job, which he calls Black Saturday, and uses it to illustrate the challenges he faced and the lessons he learned.

Key Points Discussed:
- Introduction and Context: Sogamoso introduces himself and his background, sharing his love for Poland and inviting attendees to an upcoming Ruby conference in Colombia.
- Cultural Observations: He humorously mentions his observations about Poland, including its unique drunk tests and the confusion around building floor designations.
- Story of "The Overnight Failure":
  - The talk centers on a critical failure in a carpooling app that Sogamoso was developing.
  - On a routine billing day, a system error created duplicate payment jobs, so users were accidentally charged multiple times. Users were outraged when their cards started being declined because of the excessive charges.
- Crisis Management: Sogamoso recounts waking up to a flood of complaints and the frantic efforts to contain the financial damage, which involved stopping the charge processing and reversing the erroneous charges.
- Root Causes Identified:
  - The errors were traced to flaws in both the job retrieval system and the payment processing logic, neither of which adequately checked for duplicates.
- Learning from Failure: Despite the devastation caused by the incident, Sogamoso emphasizes the significance of openly discussing failures within the tech community to mitigate feelings of imposter syndrome and foster a culture where mistakes can be addressed constructively.
- Post-Mortem Analysis: He outlines steps taken after the incident, including implementing better testing practices and monitoring systems to avoid similar failures in the future.
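
The root cause described above, payment jobs that were never checked for duplicates, can be illustrated with a small sketch. It is not the code from the talk: the names `ChargeJob`, `ChargeProcessor`, and `dedupe_jobs` are hypothetical, and the assumption that a user should be charged at most once per billing period is made only for illustration. The idea is to guard at both layers named in the summary, once when jobs are retrieved and again when a charge is processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeJob:
    """Hypothetical payment job; field names are assumptions for this sketch."""
    user_id: int
    billing_period: str
    amount_cents: int

    @property
    def idempotency_key(self) -> str:
        # Assumption: one charge per user per billing period.
        return f"{self.user_id}:{self.billing_period}"


def dedupe_jobs(jobs):
    """Guard at the retrieval layer: drop jobs whose key was already seen."""
    seen = set()
    unique = []
    for job in jobs:
        if job.idempotency_key not in seen:
            seen.add(job.idempotency_key)
            unique.append(job)
    return unique


class ChargeProcessor:
    """Guard at the processing layer: refuse to charge the same key twice."""

    def __init__(self):
        self.processed_keys = set()  # stands in for a persistent store
        self.charges = []            # stands in for calls to a payment provider

    def process(self, job):
        key = job.idempotency_key
        if key in self.processed_keys:
            return False  # duplicate: skip instead of charging again
        self.processed_keys.add(key)
        self.charges.append(job)
        return True


jobs = [
    ChargeJob(user_id=1, billing_period="2017-03", amount_cents=500),
    ChargeJob(user_id=1, billing_period="2017-03", amount_cents=500),  # duplicate
    ChargeJob(user_id=2, billing_period="2017-03", amount_cents=700),
]
processor = ChargeProcessor()
results = [processor.process(job) for job in jobs]
```

With these inputs, the second job is skipped, so only two charges are recorded. In a real system the in-memory set would not be enough; the same check would more plausibly rely on something durable, such as a unique database constraint on the key.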

Conclusions and Takeaways:
- Cultural Shift: Teams should create an environment where discussing failures is normalized rather than stigmatized, so that they can learn collaboratively.
- Importance of Testing: Robust testing and monitoring processes are necessary to catch issues before they escalate.
- Community Engagement: Attendees are encouraged to share their own failures under the hashtag #IBrokeThings as part of a collective learning experience, and are reminded not to equate their failures with their identity.
- Call to Reflection: Sogamoso urges developers to reflect on their own worst-case work scenarios and to learn how to manage the stress and repercussions when failures occur.