Description
RubyConf AU 2013: http://www.rubyconf.org.au

Braintree is a payment gateway, so downtime directly costs both us and our merchants money. Therefore, high availability is extremely important at Braintree. This talk will cover how we do HA at Braintree on our Ruby on Rails application. Specific topics will include:

Working around planned downtime and deploys:
- How we pause traffic for short periods of time without failing requests
- How we fit our maintenance into these short pauses
- How we do rolling deploys and schema changes without downtime

Working around unplanned failures:
- How we load balance across redundant services
- How the app is structured to retry requests
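The last two topics, redundant services and retried requests, can be illustrated with a minimal Ruby sketch. This is not Braintree's code: `with_failover`, `AllBackendsFailed`, the `attempts_per_backend` parameter, and the backend names are all hypothetical, chosen only to show the general shape of trying each redundant backend in turn.

```ruby
# Hypothetical helper (not from the talk): try each redundant backend in
# order, retrying a fixed number of times before moving on to the next one.
class AllBackendsFailed < StandardError; end

def with_failover(backends, attempts_per_backend: 2)
  last_error = nil
  backends.each do |backend|
    attempts_per_backend.times do
      begin
        # Return the first successful result to the caller.
        return yield(backend)
      rescue StandardError => e
        # Remember the failure and fall through to the next attempt.
        last_error = e
      end
    end
  end
  raise AllBackendsFailed, "all backends failed (last error: #{last_error&.message})"
end

# Usage: the first (made-up) backend always fails, the second succeeds.
calls = []
result = with_failover(%w[service-a service-b]) do |backend|
  calls << backend
  raise IOError, "connection refused" if backend == "service-a"
  "ok from #{backend}"
end
```

A helper like this is only safe to wrap around operations that can be repeated without side effects; retrying a non-idempotent payment call blindly could apply it twice.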
Summary
The video titled "Uptime == Money: High Availability at Braintree" features a talk by Paul Gross, a developer at Braintree, presented at RubyConf AU 2013. Braintree is a payment gateway that processes online payments, so high availability (HA) is critical: downtime costs both the company and its merchants substantial revenue. Gross explains Braintree's strategies for maintaining uptime, covering both planned and unplanned downtime.

Key points discussed throughout the talk include:

- **Importance of High Availability**: With Braintree processing approximately $5 billion in annual transactions, uptime is vital; even a few minutes of downtime can lead to significant financial losses for both Braintree and its merchants.
- **Planned Downtime Management**:
  - The transition from MySQL to PostgreSQL has enabled quicker database migrations, drastically reducing planned downtime.
  - Rolling updates keep disruption during deployments minimal: servers are updated individually without taking down the entire site.
  - Transactional DDL in Rails migrations ensures that a failed migration rolls back cleanly instead of causing a significant outage.
  - A mechanism for managing Rails caches allows old columns to be removed without impacting ongoing operations.
- **Handling Unplanned Downtime**:
  - Braintree load balances across redundant services to preserve uptime during server failures.
  - The company builds its load balancing system from tools like Linux Virtual Server (IPVS) rather than relying on third-party black-box solutions, which improves its understanding of and control over its systems.
  - Automatic failover mechanisms route traffic to operational instances when a server fails, maintaining service continuity.
  - Tools like BGP manage inbound traffic and provide redundancy by rerouting through alternate paths during network issues.
- **Robustness of Architecture**: The architecture incorporates components like a Redis queue (Broxy) for request handling, so requests can still be accepted while maintenance is performed, which mitigates the impact on end users.

In conclusion, Braintree's approach to high availability combines meticulous planning, modern technologies, and a resilient architecture to meet its uptime goals. By striving for five nines (99.999%) availability, Braintree continuously adapts its strategies to minimize service interruption, safeguarding both its own and its merchants' revenues.
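The idea of accepting requests while maintenance is in progress can be sketched in a few lines of Ruby. This is a simplified illustration under stated assumptions, not Braintree's implementation: `PausableQueue` and its methods are hypothetical names, and a plain in-memory array stands in for the Redis queue.

```ruby
# Hypothetical sketch: an in-memory stand-in for a request queue that keeps
# accepting work while the backend is paused for maintenance.
class PausableQueue
  def initialize(&handler)
    @handler = handler   # called once per request when it is dispatched
    @pending = []        # stands in for the Redis queue
    @paused  = false
  end

  def pause!
    @paused = true
  end

  # Resuming immediately dispatches everything that was held.
  def resume!
    @paused = false
    drain
  end

  # Always accepts the request; dispatches immediately unless paused.
  def submit(request)
    @pending << request
    drain unless @paused
    nil
  end

  def pending_count
    @pending.size
  end

  private

  # Dispatch held requests in arrival order.
  def drain
    @handler.call(@pending.shift) until @pending.empty?
  end
end

processed = []
queue = PausableQueue.new { |req| processed << req }

queue.submit("charge-1")   # dispatched right away
queue.pause!               # e.g. during a short maintenance window
queue.submit("charge-2")   # accepted, but held
queue.submit("charge-3")
held = queue.pending_count
queue.resume!              # held requests are dispatched in arrival order
```

From the client's point of view a paused request is simply slower rather than failed, which is why this approach only suits pauses short enough to fit within client timeouts.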