Description
http://www.rubyconf.org.au
Most of us have a “that day I broke the internet” story. Some are amusing and some are disastrous, but all of these stories change how we operate going forward. I’ll share the amusing stories behind why I always take a database backup, why feature flags are important, the importance of automation, and how having a team with varied backgrounds can save the day. Along the way I’ll talk about a data center fire, deleting a production database, and accidentally setting up a DDoS attack against our own site. I hope that by learning from my mistakes you won’t have to make them yourself.
Summary
In this talk titled "Datacenter Fires and Other 'Minor' Disasters," Aja Hammerly, a Developer Advocate at Google Cloud Platform, shares stories from her career about disasters that can occur in tech and how to learn from them to improve processes and resilience. Hammerly emphasizes the importance of having a backup strategy, the benefits of automation, and the value of team diversity for better problem-solving. Key points from her presentation include:

- **Emphasizing Correct Backup Protocols**: Hammerly recounts corrupting a production database late at night while performing a release without a backup. The lesson learned is to always automate backup processes and to have a safety mechanism in place, like a 'big red rollback button.'
- **Case Study of a Data Center Fire**: Hammerly describes an incident in which a data center fire at a credit card processor led to service disruption. This highlighted the need for systems that can isolate external dependencies to avoid complete outages.
- **Electrical Maintenance Incident**: Another incident involved an upgrade that resulted in complete power loss in their section of the facility. The lesson learned here was the importance of having recovery processes and of spreading hardware across different locations to enhance resilience.
- **Innovation Under Pressure**: Hammerly shares a story about a demo in Japan thwarted by incompatible phone adapters, which led the team to improvise with soldering irons so the demo could go ahead. This underscores the importance of verifying details rather than making assumptions.
- **Client-Side Issues Leading to Downtime**: She also recounts an experience in which malformed messages from clients overwhelmed their WebSocket application, an example of how assumptions can lead to significant disruptions.

Throughout the talk, Hammerly stresses the importance of trust, communication, collective knowledge, ownership, and learning from failures within tech teams. These experiences and the lessons drawn from them are aimed at encouraging developers to cultivate resilience and adaptability in technology environments. The concluding message highlights the value of diverse teams in enhancing problem-solving capabilities.