Title
Description
RubyConf 2019 - Elasticsearch 5 or Bust by Molly Struve

Breaking stuff is part of being a developer, but that never makes it any easier when it happens to you. The Elasticsearch outage of 2017 was the biggest outage our company has ever experienced. We drifted between full-blown downtime and degraded service for almost a week. However, it taught us a lot about how we can better prepare for and handle upgrades in the future. It also bonded our team together and highlighted the important role teamwork and leadership play in high-stress situations. The lessons learned are ones that we will not soon forget. In this talk, I will share those lessons and our story in hopes that others can learn from our experiences and be better prepared when they execute their next big upgrade. #rubyconf2019 #confreaks
Summary
In her talk 'Elasticsearch 5 or Bust' at RubyConf 2019, Molly Struve recounts the daunting experience her team faced during the Elasticsearch upgrade at Kenna Security in 2017. She emphasizes the importance of preparation and teamwork in navigating software upgrades. The narrative serves as a cautionary tale, illustrating the consequences of assuming smooth transitions based on past experiences. Struve provides a detailed account of their upgrade process, which involved a critical outage that lasted almost a week. During this time, the team encountered severe performance issues, crashes, and the daunting realization that rolling back the upgrade would be challenging.

Struve highlights several key lessons learned from the ordeal:

- **Have a Rollback Plan**: Preparation is essential; understanding how to revert upgrades can save valuable time and resources should problems arise.
- **Perform Thorough Performance Testing**: Assumptions about software stability can lead to dire consequences; comprehensive performance testing is necessary.
- **Don’t Ignore Small Warning Signs**: Early indicators of trouble should never be overlooked. Each warning should be investigated thoroughly.

In addition to technical lessons, she discusses the importance of leveraging community support, emphasizing that outreach for help can drastically cut down the time spent troubleshooting. The role of leadership in these high-stress scenarios was underscored as crucial, with Struve crediting their VP of Engineering for his unwavering support during the crisis. She reflects on the camaraderie developed within her engineering team, highlighting that strong character and team dynamics matter significantly during crises. Finally, she points to embracing mistakes as a way forward, advocating for companies to learn from outages rather than hide from them.
Struve hopes her experiences will help others avoid similar pitfalls, framing the talk as a guide to making future upgrades smoother and more manageable.