Description
By Prem Sichanugrist & Ryan Twomey

When you're building a payment platform, you want to make sure that your system is always available to accept orders. However, the complexity of the platform introduces the potential for it to go down when any one of the moving parts fails. In this talk, I will show you the approaches that we've taken and the risks that we have to take to ensure that our platform will always be available for our customers. Even if you're not building a payment platform, these approaches can be applied to ensure high availability for your platform or service as well.

Help us caption & translate this video! http://amara.org/v/FGaR/
Summary
In the presentation titled **"Zero-downtime payment platforms"**, Prem Sichanugrist and Ryan Twomey address the critical importance of maintaining high availability in payment processing platforms. They emphasize that even minor downtimes can adversely affect customer experience and revenue. Various strategies are discussed to mitigate risks associated with both internal and external downtimes.

### Key Points:

- **Definition of Downtime**: The speakers define two types of downtime:
  - **Internal Downtime**: Caused by issues within their own application, such as application errors or infrastructure failures.
  - **External Downtime**: Resulting from dependencies on third-party services, such as payment gateways or email providers.
- **Handling External Downtime**: To counteract scenarios where payment gateways might go down, the team implemented a risk assessment system that allows the acceptance of low-risk orders even when the payment gateway is unavailable. This is managed through:
  - A **manual shutdown** system initially, which evolved into **automated processes** for efficiency.
  - A timeout mechanism that evaluates the risk before proceeding with order processing.
- **Internal Downtime Solutions**: The presenters describe a fallback system, including:
  - **Chocolate**, a separate Sinatra application that acts as a request replayer when their main Rails application fails. This ensures that customer requests can still be stored and processed later without immediate disruptions.
  - **Akamai Dynamic Router**: This is utilized to reroute requests, minimizing the impact of application errors by handing off to the backup (Chocolate).
  - The use of a unique **request ID** to avoid duplicate charges and to manage orderly processing even when switching between applications.
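
The interplay between the replayer and the request ID can be illustrated with a small sketch. This is not the speakers' implementation (their fallback is a Sinatra application in front of a Rails application); it is a minimal Python model under assumed names. `StoredRequest`, `OrderProcessor`, and `Replayer` are hypothetical, and the in-memory dictionary stands in for whatever persistent store a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class StoredRequest:
    request_id: str  # unique ID attached to the customer's request (hypothetical field)
    payload: dict


@dataclass
class OrderProcessor:
    """Stands in for the main application; remembers which request IDs it handled."""
    processed: dict = field(default_factory=dict)

    def process(self, request: StoredRequest) -> bool:
        # A request ID seen before is skipped, so a replay never charges twice.
        if request.request_id in self.processed:
            return False
        self.processed[request.request_id] = request.payload
        return True


@dataclass
class Replayer:
    """Stands in for the fallback application: stores requests during an outage."""
    backlog: list = field(default_factory=list)

    def store(self, request: StoredRequest) -> None:
        self.backlog.append(request)

    def replay(self, processor: OrderProcessor) -> int:
        # Replay in arrival order and count only the requests actually applied.
        applied = sum(processor.process(r) for r in self.backlog)
        self.backlog.clear()
        return applied
```

If a request was already handled before the outage and the customer retried it while the fallback was active, replaying the backlog applies only the requests whose IDs are new.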

### Significant Examples:

- The first part of the talk turns to practical applications: during a period of high transaction volume, automated processes were already in place to handle possible downtime, and these proactive measures were crucial in maintaining functionality and customer satisfaction.

### Conclusions and Key Takeaways:

- The presenters stress that all systems are prone to failure; hence, preparations should include a robust failover strategy.
- Implementing a replayer mechanism can significantly enhance user experience by ensuring operations continue smoothly during disruptions.
- It's crucial to constantly evaluate and refine risk assessment models to appropriately manage order acceptance during downtimes.

Ultimately, the speakers convey that while it is impossible to entirely eliminate downtime, thorough planning and intelligent design can substantially improve system reliability and user satisfaction.
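
The idea of accepting low-risk orders while the payment gateway is unavailable can be sketched as follows. This is only an illustration under stated assumptions: the summary does not describe the actual risk model, so `risk_score` is a toy heuristic, and `GatewayUnavailable`, `Decision`, `place_order`, and the `max_risk` threshold are hypothetical names introduced here.

```python
from enum import Enum


class Decision(Enum):
    CHARGED = "charged"
    ACCEPTED_DEFERRED = "accepted, charge later"
    REJECTED = "rejected"


class GatewayUnavailable(Exception):
    """Raised by the (hypothetical) gateway client on a timeout or outage."""


def risk_score(order: dict) -> float:
    # Toy heuristic, purely illustrative: larger orders and new customers
    # are treated as riskier. A real model would use far more signals.
    score = min(order["amount"] / 1000.0, 1.0)
    if not order.get("returning_customer", False):
        score += 0.5
    return score


def place_order(order: dict, charge, max_risk: float = 0.5) -> Decision:
    """Try to charge; if the gateway is down, accept only low-risk orders."""
    try:
        charge(order)
        return Decision.CHARGED
    except GatewayUnavailable:
        if risk_score(order) <= max_risk:
            return Decision.ACCEPTED_DEFERRED  # charge once the gateway recovers
        return Decision.REJECTED
```

The threshold encodes how much risk the business is willing to absorb during an outage. Tuning it is the kind of ongoing evaluation the takeaways refer to.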