00:00:11.929
Hi folks, I'm Betsy Haibel, and welcome to Data Integrity in Living Systems. Now,
00:00:19.260
ordinarily, I'd like to launch right into the talk content here. We've got a lot to cover and that's what you all are here to see.
00:00:26.070
However, Marcos' keynote just now really hit home for me, and I wanted to follow up on that.
00:00:31.590
As a white woman, I recognize that my experiences are quite different from those of a Black man.
00:00:36.690
I don’t want to flatten those differences. Still, a lot of the ways he framed survival resonate with my own experience.
00:00:43.140
I often tell the story of my non-traditional journey into tech as a simple, happy-path narrative.
00:00:49.050
But let's get real for a second. A decade ago, as I was getting into tech, I was just a woman with no college degree, having washed out of an arts career due to a sudden onset chronic pain disorder.
00:00:56.879
Boot camps were not even an option yet, so I was learning everything on my own, which was terrifying.
00:01:04.260
I picked up some bad survival lessons during that time.
00:01:10.590
This context is relevant because throughout this talk, I’ll share moments when I was kind of a self-righteous jerk.
00:01:16.710
I want you all to remember that I was only able to grow out of that mindset after I wasn't the only woman in the room anymore.
00:01:22.640
Once I had company, I could move from surviving through arrogance toward nuance and kindness.
00:01:28.409
And that transition was possible only because I no longer had to carry the torch of being the sole representative of my gender.
00:01:33.570
Anyway, back to data integrity. This talk needs another name.
00:01:40.320
It used to be called 'Data Corruption: Stop the Evil Tribbles.' I didn't change it because the name was a bit hokey, though it was.
00:01:45.540
Changing it just because it was hokey would have implied caring about my personal dignity, which feels uncomfortable.
00:01:52.590
The tribbles are a humorous Star Trek example, and I'll keep coming back to them throughout the talk.
00:01:59.120
I changed the name because that title imposed a frame of bad data as an evil, invading force.
00:02:05.780
This framing is unproductive because it suggests that we're starting from pristine databases.
00:02:11.570
In reality, most of us are working with code bases that look very different from that.
00:02:17.330
Additionally, framing the issue adversarially encourages developers to fight against bad data.
00:02:26.870
This mindset can easily lead to seeing our users as the enemy, or, even worse, our teammates as the enemy.
00:02:35.330
It's easy to become self-righteous about data integrity issues, but that leads us away from solving the actual problems.
00:02:41.300
In my experience, the root causes of data integrity issues often stem from product changes or miscommunications between teams.
00:02:47.530
That's why I want to emphasize that data integrity is not just your responsibility!
00:02:53.840
If we're going to adopt a blameless approach, we need to look at root causes, and there's often more than one.
00:03:02.590
For instance, people often skip well-known data integrity patterns because the code base discourages using them properly.
00:03:08.000
We may run into the occasional, strange bug coming from the database due to unusual circumstances, but most issues stem from team dynamics.
00:03:16.130
We need to design our systems to be resilient against common problems like product changes or engineers making mistakes.
00:03:22.700
Focusing on resilience helps us catch and correct a wide array of data integrity bugs.
00:03:29.720
Let's consider a story. It's loosely based on my experience a few years ago at an e-commerce company.
00:03:36.500
The details aren’t exact, but they illustrate the problem effectively.
00:03:41.600
I was part of a team working on a module that processed returns and shipped products back to vendors.
00:03:46.669
Compared to the larger monolith it lived in, this module was refreshingly straightforward.
00:03:54.080
Our return-to-vendor model was simple; it cared about three things: what the product was worth, which vendor it was going back to, and its shipping status.
00:04:00.140
But of course, the world has its way of giving us partial data.
00:04:05.790
In our case, sometimes incoming data didn't include vendor information, making it very difficult to return products.
00:04:12.619
The return-to-vendor module placed a new requirement on data that had never needed vendor information before.
00:04:19.290
The upstream system's data had met our needs perfectly before; it was the product change that made it invalid.
00:04:24.590
Common data integrity issues often arise from product changes, but they can be addressed at the product level.
00:04:31.130
Instead of complicated measures like machine learning to find missing vendors, we decided to implement a simple UI change.
00:04:38.200
We updated the return display to hide units marked for return that didn't have vendor information.
00:04:45.000
This was a lot more efficient than creating a complicated technical solution.
00:04:52.260
The main takeaway from this experience: scope your data by whether it can actually progress through the workflow.
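To make that concrete, here's a rough sketch of the kind of scope I mean; the model and column names are made up for illustration, not real code from that system.

```ruby
# Hypothetical model, loosely based on the return-to-vendor unit I described.
class ReturnToVendorUnit < ActiveRecord::Base
  # Only units that know their vendor can actually move forward in the
  # return-to-vendor workflow, so the return display queries this scope
  # instead of the whole table.
  scope :returnable, -> { where.not(vendor_id: nil) }

  # Units missing vendor data aren't corrupt; they just can't progress yet,
  # so we surface them separately for manual follow-up.
  scope :awaiting_vendor, -> { where(vendor_id: nil) }
end
```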
00:04:59.300
Additionally, we should validate models contextually rather than globally, because complex operations need validation that knows where a record is in its lifecycle.
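And here's one way to make a validation context-aware instead of global; again, these names are invented for the example.

```ruby
class ReturnToVendorUnit < ActiveRecord::Base
  # Vendor information is only required once a unit is actually being
  # shipped back to its vendor, not at the moment a warehouse worker
  # first logs the return.
  validates :vendor_id, presence: true, if: :shipping_back?

  def shipping_back?
    status == "shipping_back"
  end
end
```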
00:05:05.540
Collaboration with our users can provide critical insights, so we don't have to overthink data recovery on our own.
00:05:12.210
In living software systems, users are as crucial as the code itself.
00:05:18.190
Approaching data integrity problems collaboratively with our team and user base can lead to solutions.
00:05:25.610
Earlier, I mentioned eventually finding a solution through scoping down data and creating a user-friendly interface.
00:05:32.899
However, we encountered some miscommunication before reaching that resolution.
00:05:38.690
My team had assumed that the upstream system would provide vendor data.
00:05:44.199
When we saw units without vendor information, we mistakenly believed it was a bug.
00:05:49.610
However, that was an oversight on our part: sometimes the upstream system simply doesn't have that data.
00:05:56.160
The data for the return-to-vendor process was mostly sourced from the returns handling module.
00:06:04.120
The warehouse workers logging returns often didn't have vendor information.
00:06:11.670
Picture this: the warehouse worker receives returns off a truck, processing a messy pile, with packages sometimes improperly labeled.
00:06:17.160
Despite this, most people are not jerks and do label their returns correctly.
00:06:25.960
However, there are many scenarios where boxes may be missing information.
00:06:31.160
Warehouse workers are not responsible for figuring out which vendor each returned product goes back to.
00:06:37.690
Their job, and the performance metrics they're measured on, is to log each return quickly and move on to the next one.
00:06:43.210
If we impose burdens that require them to delve deeply into every box, we are being unreasonable.
00:06:49.699
All of this was unknown to my team at the time. We were focused solely on the immediate needs of the return-to-vendor module.
00:06:56.250
Consequently, we ran migrations that inadvertently led to further complications and numerous production issues.
00:07:03.830
We operated as if the validations we'd added would catch every potential issue.
00:07:09.800
Ultimately, we assumed internal CI failures indicated new bugs.
00:07:15.041
The underlying issue, however, was our lack of humility and awareness of upstream weaknesses.
00:07:22.230
After discussing this experience in a cross-team retrospective, we learned that solutions came from communication.
00:07:29.440
If we had approached our upstream colleagues sooner, I believe we would have avoided this entire error.
00:07:35.390
It’s often said that all we need to do is talk to one another.
00:07:41.500
As the Agile Manifesto suggests, individuals and interactions are paramount.
00:07:46.660
But let’s not ignore the intricacies of communication.
00:07:54.630
Too often, the phrase 'just talk' is a way of dismissing both the complexity of processes and the effort they require.
00:08:01.300
As Camille Fournier has noted, coordinating people takes real effort, and we shouldn't pretend otherwise.
00:08:11.000
Team dynamics are the background noise that all of this communication has to cut through.
00:08:18.000
So when we prioritize individuals and interactions, we must remember that processes matter.
00:08:22.230
The goal is not to dismiss processes but to find the right balance.
00:08:27.589
Observing how the software actually behaves is more reliable than sticking strictly to documentation.
00:08:32.000
Identifying how communication unfolds is vital. For instance, when my team added validations, we could have explored the issue further.
00:08:38.150
Understanding our recent commits might have shed light on where we were going wrong.
00:08:44.390
Effective interactions also involve acknowledging the work culture that stifles communication.
00:08:50.550
Burdening teammates with loads of pressure hinders our collaborative spirit.
00:08:57.000
When the dominant attitude is that each person's own work is paramount, inter-team communication starts to look like waste.
00:09:02.600
Recognizing that cross-team interactions are always a valuable investment shifts our team's dynamics.
00:09:09.970
Further, let’s discuss the failure to leverage well-known data integrity patterns.
00:09:15.620
For example, patterns such as validations, callbacks, and transactions are often overlooked.
00:09:21.060
Neglecting these opportunities leads to moralizing and finger-pointing when issues arise.
00:09:28.150
We should focus on understanding why these errors occur within our systems.
00:09:35.200
This shifts the focus from individual blame toward systemic recovery.
00:09:41.590
My focus here is on building systems that manage recovery, rather than trying to prevent every error.
00:09:48.250
Recovery emphasizes moving forward when errors happen—a proactive stance.
00:09:53.860
Many older Rails projects accumulate considerable complexity in validations and callback mechanisms.
00:09:59.300
Consequently, serious engineering effort can be lost in trying to manage data integrity effectively.
00:10:06.090
Development teams often find themselves paralyzed by over-engineering.
00:10:12.610
Thus, the more we moralize about data integrity, the less we resolve actual problems.
00:10:18.550
Instead, we should cultivate an environment that encourages recovery and learning.
00:10:26.350
Finding balance between complex validation mechanisms and simplicity is key.
00:10:32.960
Lastly, let’s not overlook the trade-offs around where validations live: on the model, or out in a service object architecture.
00:10:40.080
Service objects can alleviate validation problems, but they increase cognitive load and can complicate the experience of everyone working in the code base.
00:10:46.969
Cognitive load goes up whenever we deviate from the Rails happy path that most developers default to.
00:10:54.420
Similarly, as we step away from familiar patterns, we might introduce risk.
00:10:59.470
So we must establish workflows that allow for discoverability and maintain consistency across the team.
00:11:06.870
Strategies may include pairing on code reviews and maintaining close communication.
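To make the service-object side of that trade-off concrete, here's a rough sketch of what I mean; the class and method names are made up, not code from a real system.

```ruby
# A workflow-specific service object: the "ship this unit back to its
# vendor" rules live here, in one discoverable place, instead of piling
# up as conditional validations on the model.
class ShipReturnToVendor
  def initialize(unit)
    @unit = unit
  end

  def call
    return failure("missing vendor") if @unit.vendor_id.nil?

    @unit.update!(status: "shipping_back")
    success
  end

  private

  def success
    { ok: true }
  end

  def failure(reason)
    { ok: false, reason: reason }
  end
end
```

The trade-off is exactly the one I just described: someone who expects a plain save to enforce everything won't find these rules unless the team's conventions point them here.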
00:11:13.199
To maintain data integrity effectively, we need to remember what Rails already gives us. Rails makes database transactions convenient to use.
00:11:21.930
Transactions help manage failures by rolling back when inconsistencies arise.
00:11:29.990
However, we still need to adhere to those practices in our code.
00:11:36.510
It's tempting to shortcut back to a series of imperative writes with no fallback if something fails partway through.
00:11:43.200
The consequence of that is data left in fragmented, half-updated states.
00:11:50.120
We need to wrap related database writes in transactions so that they succeed or fail together.
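A minimal sketch of what that wrapping looks like, with hypothetical record names:

```ruby
# Either both records change or neither does: the bang methods raise on
# failure, and ActiveRecord rolls back everything inside the block.
ActiveRecord::Base.transaction do
  return_unit.update!(status: "shipped_to_vendor")
  vendor_shipment.update!(shipped_at: Time.current)
end
```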
00:11:56.050
When dealing with third-party services, asynchronous requests have their own risks.
00:12:02.620
Transactions can't roll back an external, asynchronous call, which presents significant challenges.
00:12:08.630
Third-party services may not pass errors back cleanly, complicating debugging.
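One common way to keep a third-party call from getting tangled up in a transaction is to defer it until the transaction has actually committed, for instance by enqueueing a background job once the commit happens. Here's a rough sketch; the worker and column names are hypothetical.

```ruby
class ReturnToVendorUnit < ActiveRecord::Base
  # Don't call the carrier's API inside the transaction: the transaction
  # can roll back, but the HTTP request can't. Only reach out once the
  # database changes have committed.
  after_commit :enqueue_carrier_notification, on: :update, if: :shipped?

  def shipped?
    status == "shipped_to_vendor"
  end

  private

  def enqueue_carrier_notification
    NotifyCarrierWorker.perform_async(id) # hypothetical Sidekiq worker
  end
end
```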
00:12:15.619
Once we're coordinating with external services like this, we also have to reckon with the CAP theorem.
00:12:21.220
The CAP theorem says that a distributed system can't guarantee consistency, availability, and partition tolerance all at once.
00:12:28.230
In practice, you only get to pick two, and since network partitions are a fact of life, the real choice is between consistency and availability.
00:12:35.430
When a system prioritizes consistency, availability may take a hit and vice versa.
00:12:42.600
In conclusion: engineer applications that can adapt and recover, because that's what keeps our systems reliable.
00:12:49.780
Auditing processes at set intervals will help catch discrepancies before they spiral.
00:12:56.240
Run basic integrity checks every few hours to make sure everything still lines up.
00:13:02.740
If issues arise, escalate the problems to a human and deal with them directly.
00:13:09.410
Build a culture around addressing problems. Data integrity is about putting systems in place.
00:13:15.110
Establish regular meetings with the team to audit periodic checks and re-evaluate structures.
00:13:22.210
Tools like Sidekiq can facilitate scheduled audits that find discrepancies in your data.
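Here's roughly what one of those scheduled audit jobs might look like. The model and the query are invented for illustration; wire the scheduling and alerting into whatever your stack already has.

```ruby
class DataIntegrityAuditWorker
  include Sidekiq::Worker

  # Scheduled every few hours (cron, sidekiq-cron, clockwork, whatever).
  # It doesn't fix anything itself; it just surfaces records that can't
  # progress so a human can look at them before they pile up.
  def perform
    stuck = ReturnToVendorUnit.where(vendor_id: nil, status: "shipping_back")
    return if stuck.empty?

    Rails.logger.warn(
      "[audit] #{stuck.count} units are marked for shipping but have no vendor"
    )
    # Swap in whatever alerting you already use: email, Slack, PagerDuty...
  end
end
```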
00:13:29.990
If anomalies arise, log and assess them without immediate panic.
00:13:35.520
Address risks through continuous learning, and keep improving your frameworks as you go.
00:13:42.110
Finally, bringing support staff into the loop creates effective feedback loops, so programmers spend less time debugging.
00:13:51.800
Instead, they focus on resolving the core problems arising from data integrity.
00:13:57.280
Recognize software as a living system, embracing collaboration while evolving.
00:14:02.870
Understand that the complexity of dependencies requires a balanced perspective.
00:14:10.220
Avoid framing data integrity as developers versus users.
00:14:16.700
Acknowledge each development cycle evolves with changing business needs.
00:14:23.000
Maintain a posture of continuous improvement rather than fatalism.
00:14:29.240
Create inter-team channels to provide fertile ground for communication.
00:14:35.270
Now with time, lessons learned accumulate.
00:14:43.360
As we treat the data in our systems with care, we cultivate a more thoughtful approach overall.
00:14:49.430
The overarching theme is the necessity for collaboration and recovery, not merely prevention.
00:14:57.010
This helps reduce analysis paralysis and fosters a proactive team culture.
00:15:02.940
I am Betsy Haibel, known as betsythemuffin on Twitter. I primarily discuss programming, tech culture, and a bit of feminism.
00:15:10.390
You can find my talk content at betsyhaibel.com/talks. I work for a company called ReCertify in San Francisco.
00:15:20.130
We are currently not hiring but will soon open positions for senior developers.
00:15:25.830
If you wish to mentor juniors or mid-levels, I encourage applying.
00:15:32.500
I also co-organized 'Learn Ruby in DC', a casual office hours initiative for newer programmers.
00:15:39.090
If you're interested in doing something similar in your town, it’s not challenging. Just show up!
00:15:47.010
Now I'll take a few questions.
00:15:55.198
Q: Are there tools that can help us check our databases? A: This may sound simplistic, but Sidekiq is a great tool.
00:16:03.960
Use it to build auditing systems that send alerts if something’s wrong, scheduled for regular intervals.
00:16:13.676
Incorporate that into your workflow and adjust for complex calculations as necessary.
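If you want the schedule itself to live in code, a gem like sidekiq-cron can register the job at boot. Something along these lines, though check the gem's README for the exact API it currently exposes.

```ruby
# config/initializers/sidekiq_cron.rb -- assumes the sidekiq-cron gem
Sidekiq::Cron::Job.create(
  name:  "Data integrity audit - every 4 hours",
  cron:  "0 */4 * * *",
  class: "DataIntegrityAuditWorker"
)
```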
00:16:21.606
Q: What about using transaction scripts? A: A strong code review culture is vital.
00:16:28.000
Create a design whereby every object must inherit from a base class to ensure consistency.
00:16:36.500
This structure aids in catching oversights during code reviews.
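As a sketch of the kind of base class I mean; the names here are hypothetical, and the point is just that every service inherits a predictable shape.

```ruby
class ApplicationService
  # Every service object in the app inherits from this, so reviewers know
  # exactly what shape to expect, and a transaction helper comes along
  # by default.
  def self.call(*args)
    new(*args).call
  end

  private

  def with_transaction(&block)
    ActiveRecord::Base.transaction(&block)
  end
end
```

A service like the ShipReturnToVendor sketch from earlier would inherit from this, and a reviewer can flag anything that doesn't.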
00:16:43.500
Q: What about raw SQL bypassing Rails? A: Bypassing ActiveRecord can happen when you need the expediency, but you lose validations and callbacks when you do.
00:16:49.090
Consider implementing database checks for further integrity.
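For example, constraints that live in the database itself hold no matter which code path wrote the row. Here's a sketch of a migration along those lines, with hypothetical table and column names.

```ruby
class TightenReturnUnitConstraints < ActiveRecord::Migration
  def change
    # These rules hold even when raw SQL bypasses ActiveRecord validations
    # and callbacks.
    change_column_null :return_to_vendor_units, :status, false
    add_foreign_key :return_to_vendor_units, :vendors
  end
end
```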
00:16:55.930
Be mindful that pulling entities externally may complicate integrity oversight.
00:17:02.660
Lastly, make sure the team understands when to push validations into a service versus enforcing them in SQL.
00:17:10.000
My advice: use SQL when necessary but with consideration for loss of traceability.
00:17:17.000
I appreciate the opportunity to share my thoughts today. Thank you!