00:00:22.240
All right, good morning everyone. It's the last day of RailsConf, and you have come to my talk, so thank you. I am James Thompson, a principal software engineer for Mavenlink. We build project management software primarily for professional services companies. We are hiring in both Salt Lake City and San Francisco, so if you're looking for work, come and talk to me! I would love to get you on our team.
00:00:27.599
Now, this is a talk that is roughly similar to what I did in Los Angeles at RubyConf, but I have added some new material. If you did happen to attend my talk on the same subject at RubyConf, you will be getting a little extra today. I've also tried to change the focus, but what we’re going to talk about today is failure and how to deal with it. We will explore how we can cope with the failures that happen in our systems.
00:00:45.280
I want to start with a very simple question: how many of you have ever written software that failed? Yes, it fails in all kinds of ways. Sometimes it's due to hardware issues, sometimes the code we wrote isn't quite perfect, or even close to it, or is just garbage. Sometimes our systems fail for things completely outside our control. I'm going to discuss various ways to handle these kinds of failures, particularly those that are difficult to foresee but can be mitigated.
00:01:03.039
The first thing we have to come to terms with is that failure happens. We have to accept that everything fails some of the time; it's unavoidable. We need to plan for it and have strategies in place to help us mitigate it. It's crucial to think about these things ahead of time as much as possible. While we can't foresee the future, we can plan for reasonable outcomes and reasonable failure modes.
00:01:22.480
Not everything I discuss today will be immediately applicable to your projects, especially if you're working predominantly in a monolith; the specific stories I'll share come from a microservice ecosystem. There won't be a perfect one-to-one correspondence, but I will present ideas that should have general applicability regardless of your environment or programming languages.
00:01:29.119
I want to start with what I believe is the most basic and fundamental practice, which is that we can't fix what we can't see. I hope no one here believes they possess such perfect omniscience that they can address issues they are unaware of.
00:01:35.440
As we think about our systems, we need to look for ways to gain visibility into failures. This visibility aids us in managing our failures by providing a window into the many facets that determine when, how, and to what degree our systems are failing. Beyond just error reporting and instrumentation, systems for metric capturing can provide rich context, helping you understand why and how your systems are failing.
00:01:54.560
To illustrate this, let me share a story. We know that low visibility is dangerous in many contexts, whether sailing, flying, or driving. Low visibility is a hazard in software as well, though usually not to the same life-threatening degree. I was working for a company in a microservice environment, focusing on a system written in Go. As I got up to speed, I found we had extensive logging output, which made us feel as though we understood our system.
00:02:13.200
However, we started noticing something strange in our staging environment: processes we thought should be completing were not actually finishing. While we could see data coming into the system and the logs indicated processing was occurring, we were not seeing the expected results on the other end. This experience revealed that despite our confidence in our visibility, we were clearly missing some part of the picture.
00:02:44.360
In conjunction with the service I was working on, I began rolling out additional tools: Bugsnag for error reporting and SignalFx for metric tracking. These two solutions gave us a much more context-aware view of the errors in our system. Our newfound visibility allowed us to understand how many jobs were starting, succeeding, and failing.
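To make that concrete, here is a minimal Ruby sketch of the kind of instrumentation this involves. The job wrapper, the metric names, and the tiny metrics stand-in are all illustrative; only Bugsnag.notify is the gem's real reporting call, and it assumes the bugsnag gem has been configured elsewhere in the app.

    require "bugsnag"  # error reporting; assumes Bugsnag is configured elsewhere

    # Stand-in for whatever metrics client you use (SignalFx, StatsD, etc.).
    class Metrics
      def self.increment(name)
        # A real client would emit a counter to your metrics backend here.
        puts "metric: #{name} +1"
      end
    end

    # Wrap each unit of work so every start, success, and failure is counted,
    # and every exception is reported with context rather than only logged.
    def run_job(name)
      Metrics.increment("#{name}.started")
      yield
      Metrics.increment("#{name}.succeeded")
    rescue StandardError => e
      Metrics.increment("#{name}.failed")
      Bugsnag.notify(e)
      raise
    end

    # run_job("nightly-import") { Importer.run }  # hypothetical caller

Even counting at this level is what makes the graphs I'll mention in a moment possible: you can't notice that successes have dropped to zero unless you were counting successes in the first place.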
00:03:00.879
Logging alone is often insufficient; I find most log files to be nearly useless in identifying system issues. They are typically noisy and offer a very low signal-to-noise ratio. Tools like Bugsnag and others, which we have here on the vendor floor, offer a clearer picture of what's going on, which in turn alters how we engage with our applications.
00:03:16.000
With increased visibility, we can determine what to focus on and when. However, simply knowing that there are thousands of errors in a system may not provide much insight. Even if we think there are high error rates, we need context to understand if those rates are normal or abnormal. As an example, one of our data sources—a credit bureau—had notoriously poor data quality, involving inconsistent formats.
00:03:31.520
Because of these inconsistencies, we were unsure how much of the data we should expect to fail. We knew failures were to be expected given the quality of the data we were handling, but we could not quantify them. This is why we brought in metrics tools to create clarifying graphs like the ones SignalFx provides. One graph in particular was alarming: the blue line represented how many jobs were starting, and the orange line depicted failures; there was no green line, revealing that none of the jobs were succeeding.
00:03:50.879
This visualization quickly alerted us that something had gone badly wrong, thankfully, in our staging environment before changes were rolled out to production. Without this context, we would not have realized the severity of our problem, risking the chance of addressing lesser issues while ignoring the true underlying cause.
00:04:11.760
This taught us that visibility provides greater context not only for identifying what is failing but also understanding why the failure is significant. There's also additional tooling I have come to love from a company called LogRocket, which shows user interactions that triggered errors in the system, connecting them with services like Bugsnag or Sentry. This layer of detail allows us to understand not just what broke but why it matters.
00:04:30.639
Visibility is just the starting point for dealing with errors. We must ensure we are raising the visibility of our errors and gathering as much context as we can about those errors to address them effectively. Picking the right tools allows us to monitor not just system health, but also the broader context surrounding those errors.
00:04:45.200
That kind of visibility leads to better prioritization when issues arise. Simply increasing visibility grants a significant advantage when apps go awry, and it further equips you to discern which errors are truly meaningful to your customers.
00:05:01.680
The more context you have, the better your actionable information becomes. This leads me to my next point: fix what’s valuable. How many of you have worked with compiled languages? Now, how many of you are familiar with the mantra that we should treat warnings as errors? This is something I first encountered in the 90s.
00:05:18.240
I thought it was a great idea without considering the implications. Treating every warning as an error sounds like a good practice because it should lead to better code. However, with dynamic languages like JavaScript, we might spend too much time focused on irrelevant warnings.
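Since this is a Ruby conference, here is a minimal sketch of what taking that mantra literally could look like in Ruby, using the Warning hook; it also hints at the downside, because one noisy deprecation anywhere now kills the process. This is illustrative, not a recommendation.

    # Take "treat warnings as errors" literally: override Ruby's Warning hook
    # so that anything routed through it raises instead of printing to stderr.
    module Warning
      def self.warn(message)
        raise message.to_s
      end
    end

    # Kernel#warn routes through this hook on modern Rubies, so a harmless
    # deprecation notice now becomes a crash:
    warn "this gem method is deprecated"  # => RuntimeError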
00:05:30.880
Even with a bug reporting system highlighting legitimately occurring bugs, not every issue requires an immediate response. This is why prioritizing efforts based on value is critical. We need to evaluate the value our systems bring to customers, collaborators, and consumers. Therefore, we should focus on addressing errors that are detrimental to that value.
00:05:46.200
A lot of what we categorize as technical debt, things like outdated dependencies or even some reported vulnerabilities, may not actually threaten that value. If an issue isn't depriving users of a valuable experience, it might not be as urgent as we think. We must discern whether a given error is genuinely worth fixing.
00:06:06.560
If you have product and customer service teams, it's best to consult them before pouring effort into a reported error; they might suggest alternative strategies. Perhaps customer service can help users work with the system in a way that avoids the error altogether.
00:06:23.920
Sometimes, it’s prudent to allow certain issues to persist, enabling us to concentrate resources on correcting things that truly create user value. Ultimately, focusing on value leads to greater satisfaction among customers who rely on our systems.
00:06:37.760
I’d now like to share some stories that delve into unusual error conditions. The first principle relevant here is called 'return what we can.’ At one company where I worked, we had a microservice that replaced a previous generation of a similar service, which stored and tracked data points about businesses over time.
00:06:56.960
As part of our migration strategy for moving several million data points from the old service to the new one, we aimed to preserve historical context while adapting to a new database structure. The migration itself went smoothly: we brought the data over successfully, and every cross-check passed.
00:07:14.560
However, once in production, it became evident that some of the historical data we migrated contained corrupt values. Those values had been serialized as YAML in the database, and when they failed to parse they triggered hard application errors.
00:07:31.200
As a result, our service returned a 500 error whenever it encountered such corrupt data. Not only did our service experience problems, but it also negatively impacted our collaborators, leading to a cascading failure that affected portions of our site critical for revenue generation.
00:07:45.440
We faced multiple issues that needed addressing, but the simplest solution was to rescue the parsing error. Having evaluated the corrupted data, we concluded that it was unrecoverable and held no practical value, so returning nil for any data we couldn't parse became our approach.
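Here is a minimal Ruby sketch of that approach; the class and attribute names are made up for illustration, but the idea is simply to rescue the YAML parse error and fall back to nil rather than letting it bubble up as a 500.

    require "yaml"

    # Hypothetical wrapper around a historical data point whose value
    # was stored in the database as serialized YAML.
    class DataPoint
      def initialize(raw_value)
        @raw_value = raw_value
      end

      # Return the deserialized value, or nil if the stored YAML is corrupt.
      # The corrupt rows were unrecoverable anyway, so a missing value is
      # more useful to callers than a hard application error.
      def value
        YAML.safe_load(@raw_value)
      rescue Psych::SyntaxError
        nil
      end
    end

    DataPoint.new("valid: true\n").value  # => { "valid" => true }
    DataPoint.new("{{ not yaml").value    # => nil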
00:08:07.919
This kept things moving: the system could continue functioning instead of throwing an error whenever it encountered a problem with one specific piece of data. In many application contexts, it is generally more effective to return something rather than nothing. We discovered that data rarely needs to be complete to retain some usefulness.
00:08:26.720
Asking how little we can return without entirely compromising value is a mentality we should adopt in our systems. That's why we should embrace the principle of returning what we can whenever possible.
00:08:42.560
Another related principle is the acceptance of varied input. While we need to be careful regarding what we return, we must also be generous in terms of what we accept. This means accepting as much data as collaborators can provide, even if it's faulty.
00:09:00.560
In one project, our system had many collaborators, each contributing only part of a business profile. Therefore, we designed our service to allow submission of a business profile without requiring all fields to be complete. Users could input one field or all fields depending on their data availability.
00:09:16.880
This flexibility ensured that our service became more resilient to the varying quality of data submitted by different sources. Even if one field was invalid, we would still accept valid inputs and communicate which fields needed attention later.
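As a rough Ruby sketch of that posture; the field names and validation rules here are invented, but the shape is the point: accept whatever a collaborator sends, keep the fields that pass validation, and report the rest back rather than rejecting the whole submission.

    # Hypothetical tolerant handling of a partial business-profile submission.
    ALLOWED_FIELDS = %i[name phone website employee_count].freeze

    VALIDATORS = {
      name:           ->(v) { v.is_a?(String) && !v.strip.empty? },
      phone:          ->(v) { v.to_s.count("0-9") >= 7 },
      website:        ->(v) { v.to_s.start_with?("http://", "https://") },
      employee_count: ->(v) { v.is_a?(Integer) && v >= 0 }
    }.freeze

    # Merge every valid field into the profile and collect the rest,
    # instead of failing the entire request over one bad value.
    def apply_profile_update(profile, submitted)
      accepted = {}
      needs_attention = {}

      submitted.slice(*ALLOWED_FIELDS).each do |field, value|
        if VALIDATORS[field].call(value)
          accepted[field] = value
        else
          needs_attention[field] = "could not be processed"
        end
      end

      [profile.merge(accepted), needs_attention]
    end

    updated, rejected = apply_profile_update(
      { name: "Acme Co" },
      { name: "Acme Corporation", website: "not-a-url", phone: "555-010-2020" }
    )
    # updated  => { name: "Acme Corporation", phone: "555-010-2020" }
    # rejected => { website: "could not be processed" }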
00:09:31.280
This forgiving approach greatly improved our ability to process data while simultaneously minimizing errors. By accepting what could be processed without enforcing strict data integrity, our systems became more robust.
00:09:48.000
The last principle I want to discuss revolves around trust. In any software system, trust is crucial because we depend on various services—either within our organization or third-party tools.
00:10:02.560
However, this dependency implies certain risks. When the reliability of those dependencies falters, their failures can quickly cascade into our own systems. At one company we had effectively built a distributed monolith, with services extending each other a great deal of trust, and that ultimately resulted in a significant outage affecting the user experience.
00:10:19.840
Our excessive reliance on other services resulted in failures propagating throughout our architecture. Therefore, we must be wary of who we trust—always preparing mitigation strategies for inevitable failures.
00:10:35.680
It’s essential to acknowledge that we might face outages in the services we depend on, be it a cloud provider or a third-party API. When we build our systems, we should lean towards lower dependency rather than a high level of faith in external systems.
00:10:50.800
It's critical to design our approaches so that the system will not fail catastrophically should any component start misbehaving. Many organizations may not be prepared for adopting microservices and the complexities involved.
00:11:07.440
Distribution in systems complicates our architectures, and many teams may not have the necessary skills or knowledge to navigate that complexity. This problem can occur in any collaborative environment where dependencies introduce risks.
00:11:25.680
This is not an argument against using dependencies; rather, it is a caution to trust carefully, analyzing who and what we rely on and actively preparing for when failures occur. Ensuring we provide adequate user experience even despite errors is essential.
00:11:42.320
To keep the experience a net positive, we should avoid surfacing a dependency's interruption as a raw error whenever possible. Consider the user experience, whether that user is a customer or a developer interacting with your service.
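As a hedged Ruby sketch of that idea; the endpoint, timeouts, and fallback shape are all invented for illustration: put tight timeouts around the call to a dependency, rescue the ways it can fail, and return a degraded but still useful response instead of letting the failure cascade.

    require "net/http"
    require "json"
    require "uri"

    # Hypothetical client for an external enrichment service we depend on.
    FALLBACK = { enrichment: nil, degraded: true }.freeze

    def fetch_enrichment(business_id)
      uri = URI("https://enrichment.example.com/businesses/#{business_id}")

      response = Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                                 open_timeout: 1, read_timeout: 2) do |http|
        http.get(uri.request_uri)
      end

      return FALLBACK unless response.is_a?(Net::HTTPSuccess)

      { enrichment: JSON.parse(response.body), degraded: false }
    rescue Net::OpenTimeout, Net::ReadTimeout, SocketError, JSON::ParserError
      # The dependency is slow, unreachable, or returning garbage; report it
      # to your error service and serve the degraded response instead of a 500.
      FALLBACK
    end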
00:12:00.160
The principal takeaway is to expect failure; that idea is at the heart of chaos engineering. We have to approach our software systems with this mindset, because we cannot escape failures.
00:12:19.440
The first fundamental step involves establishing good visibility across our systems—not just through logs or error services, but utilizing meaningful metrics that encompass a full view of the extent and scale of failures should they transpire.
00:12:36.560
This practice enables us to preserve and restore customer value when systems break down. Thank you for listening, and if anyone has questions or you're interested in work opportunities, I will be here afterward to chat.
00:12:51.760
You can also access my slides via the link provided. Thank you for coming out!