Too Big to Fail

The video "Too Big to Fail" presented by Chris Maddox at RailsConf 2014 discusses strategies for managing failure in software processes, specifically within a payroll system at ZenPayroll. Maddox emphasizes the importance of handling failure, predicting issues, and recovering from failures in a complex, high-stakes environment where user data is critical and where financial transactions are substantial. The presentation focuses on the following key areas:

Understanding Failure: Maddox highlights the inevitability of failure in software systems and stresses that instead of trying to become fault-tolerant, teams should focus on how they respond to failures when they happen.
Predicting and Avoiding Failure: The talk discusses how to proactively prevent errors through mechanisms like database validations. Running validations for every model in the database every few hours helped identify potential issues before they escalated into user-facing problems.
Embracing Failure: Accepting that failures occur and using them as learning opportunities is a key theme. Maddox argues that mistakes in development can lead to improvements and growth in understanding how systems operate.
Recovery Mechanisms: The presentation describes the development of a library named "Ultramarathon" to streamline running long processes while accounting for potential failures and enabling easier recovery. The approach includes breaking down tasks and managing state effectively, thereby preventing single points of failure during complex operations.
Philosophical Takeaway: Maddox concludes that adopting a mindset that tolerates failure can enhance team morale and system robustness. The approach involves prioritizing user experience, safeguarding sensitive user information, and maintaining operational integrity even when errors occur.
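The periodic validation sweep described under "Predicting and Avoiding Failure" can be sketched roughly as follows. This is a minimal illustration in plain Ruby, not code from the talk. The `Employee` struct, its two rules, and the `validation_sweep` helper are hypothetical stand-ins. In a Rails application the same loop would presumably run over ActiveRecord models on a schedule and call each record's own `valid?`.

```ruby
# Hypothetical stand-in for a model class: anything that responds to
# `id`, `valid?`, and `errors` works with the sweep below.
Employee = Struct.new(:id, :name, :salary_cents) do
  # Returns a list of human-readable problems (empty when the record is fine).
  def errors
    problems = []
    problems << "name can't be blank" if name.to_s.strip.empty?
    problems << "salary must be positive" unless salary_cents.to_i > 0
    problems
  end

  def valid?
    errors.empty?
  end
end

# Walks every record in every collection and collects the invalid ones,
# rather than stopping at the first bad record.
def validation_sweep(collections)
  report = []
  collections.each do |label, records|
    records.each do |record|
      next if record.valid?
      report << { model: label, id: record.id, errors: record.errors }
    end
  end
  report
end

# Illustrative data only (assumption, not from the talk).
EMPLOYEES = [
  Employee.new(1, "Ada", 500_000),
  Employee.new(2, "", 400_000),
  Employee.new(3, "Grace", 0)
]

SWEEP_REPORT = validation_sweep("Employee" => EMPLOYEES)
SWEEP_REPORT.each do |row|
  puts "#{row[:model]}##{row[:id]}: #{row[:errors].join(', ')}"
end
```

Collecting a report instead of raising keeps one corrupt record from hiding the others, so the team sees the full extent of bad data before a long-running process trips over it.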

Overall, the presentation is a call to action for software teams to rethink how they approach failure: rather than fearing errors, teams should recognize and learn from them, which is essential for sustainable growth in software development.

Chris Maddox • April 22, 2014 • Chicago, IL

It's 5am and a multi-million dollar process fails halfway through. Hours of nightmarish, manual brain surgery later, enough is enough.

What happens when background jobs grow as bloated as the MonoRail™ that begot them?

Rather than reach for the latest fad off of HackerNews, we'll use Ruby and Rails to automate error-recovery, concurrent processing, and catch corrupt data before it brings everything down.
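As a rough illustration of automated error recovery for a long process, the sketch below splits a job into named steps and remembers which ones have succeeded, so a rerun resumes at the failed step instead of starting over. It is plain Ruby under stated assumptions and does not reproduce the Ultramarathon API. `ResumableProcess`, the step names, and the `bank_available` flag are all hypothetical, and state is held in memory, whereas a production system would presumably persist it.

```ruby
# Hypothetical sketch of a resumable multi-step process.
# This is NOT the Ultramarathon API; all names here are illustrative.
class ResumableProcess
  attr_reader :completed, :failed_step, :error

  def initialize
    @steps = []        # ordered [name, callable] pairs
    @completed = []    # names of steps that have already succeeded
    @failed_step = nil
    @error = nil
  end

  # Registers a named step; returns self so calls can be chained.
  def step(name, &block)
    @steps << [name, block]
    self
  end

  # Runs every step that has not succeeded yet. On failure it records the
  # failing step and stops, so a later call to `run` resumes from there.
  def run
    @failed_step = nil
    @error = nil
    @steps.each do |name, block|
      next if @completed.include?(name)
      begin
        block.call
        @completed << name
      rescue StandardError => e
        @failed_step = name
        @error = e
        return false
      end
    end
    true
  end
end

LOG = []
bank_available = false # simulated outage (assumption for the example)

PAYROLL = ResumableProcess.new
PAYROLL.step(:calculate_taxes) { LOG << :calculate_taxes }
PAYROLL.step(:debit_employer) do
  raise "bank API unavailable" unless bank_available
  LOG << :debit_employer
end
PAYROLL.step(:pay_employees) { LOG << :pay_employees }

FIRST_RUN = PAYROLL.run        # stops at :debit_employer
FAILED_AT = PAYROLL.failed_step
bank_available = true          # the underlying problem gets fixed
SECOND_RUN = PAYROLL.run       # resumes without redoing :calculate_taxes
```

Because completed steps are skipped on the second run, work that already succeeded is not repeated, which matters when a step moves money.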

Typist, Philosopher at ZenPayroll. Humanist with a penchant for dystopian novels, St. George gin enthusiast, and wearer of colorful pants.

RailsConf 2014