Migrating a Live Site Across The Country Without Downtime

In this presentation at MountainWest RubyConf 2013, Drew Blas of Chargify.com discusses the complexities and strategies involved in migrating their live site across the country without downtime. Chargify processes payments for businesses, so it commits to 24/7 uptime and has no planned maintenance windows.

Blas stresses that the migration had to be seamless because any Chargify downtime directly affects its customers' businesses. He outlines their selection of Amazon EC2 as the new hosting platform, citing its PCI Level 1 compliance and superior customer support as crucial factors in the decision.

The key points covered include:

- Automation with Chef: Chargify transitioned from manual processes to automated systems using Chef, enabling efficient configuration management and successful replication of their environment on AWS.

- Testing Protocols: Stress testing and process testing validated that systems functioned under load and behaved consistently across both data centers.

- Data Synchronization Challenges: Migrating without downtime required overcoming challenges such as data layer synchronization, DNS management, and routing traffic for MySQL and Redis. Strategies included establishing a site-to-site VPN for inter-datacenter communication and using HAProxy for efficient traffic routing.

- Utilizing Tungsten: Tungsten managed MySQL replication and protected data integrity, enabling smooth failovers and uninterrupted service during the migration.

- Execution Plan: The migration process was meticulously planned with a six-step procedure that included halting non-essential processes and managing DNS settings to transition smoothly to the new data center.
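To make the HAProxy routing point in the list above more concrete, here is a minimal Python sketch that renders an HAProxy `listen` section for a TCP service such as MySQL or Redis. It is an illustration under stated assumptions, not Chargify's actual configuration: the `Backend` class, the `render_listen_block` function, the server names, and the IP addresses are all hypothetical. Only the HAProxy keywords (`listen`, `bind`, `mode tcp`, `server`, `check`, `backup`) are standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """One upstream server (hypothetical helper type for this sketch)."""
    name: str             # label used in the HAProxy config
    host: str             # address assumed reachable over the site-to-site VPN
    port: int
    backup: bool = False  # backup servers receive traffic only if no primary is up


def render_listen_block(service: str, bind_port: int, backends: List[Backend]) -> str:
    """Render one HAProxy `listen` section that proxies raw TCP."""
    lines = [
        f"listen {service}",
        f"    bind 127.0.0.1:{bind_port}",  # apps connect to this local port
        "    mode tcp",                     # MySQL and Redis are not HTTP
    ]
    for b in backends:
        suffix = " check backup" if b.backup else " check"
        lines.append(f"    server {b.name} {b.host}:{b.port}{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses: the old datacenter is primary, the new one is backup.
    print(render_listen_block("mysql", 3306, [
        Backend("old_dc", "10.0.1.10", 3306),
        Backend("new_dc", "10.1.1.10", 3306, backup=True),
    ]))
```

The idea behind a layout like this is that applications always talk to a local port, so a cutover becomes a matter of regenerating the proxy configuration with the roles swapped rather than reconfiguring every application. Which database node is actually safe to write to is a separate question, one the talk attributes to Tungsten rather than to the proxy.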

In conclusion, Blas emphasizes the importance of thorough documentation, rigorous testing, and an automated infrastructure to facilitate future migrations and operational resilience. He encourages relentless testing, which is essential for maintaining high availability and performance in a cloud environment.

This session serves as a comprehensive guide for professionals facing similar challenges in data migration, addressing not only technical execution but also strategic planning and operational continuity.

Drew Blas • January 28, 2020 • Earth

Title: Migrating a live site across the country without downtime
Presented by: Drew Blas

Chargify.com's customers rely on us to process payments for them 24 hours a day. We do not have any planned maintenance windows: we're simply expected to be up all the time. We recently migrated from a private datacenter to EC2, moving all our operations and data across the country with zero downtime, all thanks to a combination of highly automated configuration with Chef and specialized DB tools like Tungsten.
You'll learn about our pain points in planning the switchover, like:

- Synchronizing data
- Cross DC communication & VPNs
- Redirecting traffic
- Redundancy
- Migrating Redis/Resque
- Automation

And most importantly, how we addressed every one! I'll demonstrate how we rebuilt our entire infrastructure platform from the ground up: New system images with all new cookbooks that were deployed into our existing operation without any interruption. Finally, I'll discuss testing our stack and how we replicate it among various environments and plan for future expansion.

MountainWest RubyConf 2013