5 Years of Rails Scaling to 80k RPS

In his 2017 RailsConf talk, Simon Eskildsen discusses the evolution of Shopify's Rails infrastructure over five years, culminating in the ability to scale the platform to handle 80,000 requests per second (RPS) during peak times. He reflects on key milestones from 2012 to 2016 that enabled this growth, focusing on strategic decisions and optimizations that shaped the architecture of Shopify.

Key Points Discussed:

Shopify's Scaling Journey: Initially, the team realized its infrastructure needed to support massive traffic surges, particularly during 'flash sales,' which posed challenges due to high demand.
Background Jobs Optimization (2012): Transitioning from synchronous to asynchronous processing for checkout processes and inventory management was pivotal, alleviating long-running requests and improving response times.
Load Testing Introduction: The creation of a load-testing tool allowed the team to simulate user checkout scenarios and assess performance improvements in real-time, establishing a culture of continuous performance validation.
Identity Cache Implementation: To reduce database load, they implemented an identity cache system, balancing data freshness and cache efficiency amidst heavy request traffic.
Sharding for Flexibility (2013): Sharding was introduced to isolate data for individual shops, allowing better management of read/write operations and preventing interference between different stores navigating data-intensive environments.
Resiliency Strategies (2014): As the infrastructure expanded, the team focused on identifying and mitigating single points of failure to ensure system reliability and reduce the cascading effects of failures across components.
Multi-Data Center Strategy (2015): To enhance reliability, Shopify transitioned to a multi-data center architecture for failover capability, enabling seamless traffic routing without service disruption.
Current Metrics (2016): Ultimately, the platform achieved the capability to handle 80,000 RPS across its multiple data centers, processing substantial sales traffic while running efficiently.

Conclusions and Takeaways:
- The evolution involved recognizing the need for not just performance but also resilience in infrastructure.
- Lessons learned reflect ongoing integrations of new technologies, adaptations to existing processes, and an emphasis on outcomes rather than just technical specifications. Shopify's commitment to scalability and technology foresight highlights its position as a robust platform for e-commerce under pressure.
- Continued collaboration and knowledge transfer among engineering teams are essential to manage and innovate the platform effectively, ensuring readiness for future demands.

5 Years of Rails Scaling to 80k RPS
Simon Eskildsen • May 17, 2017 • Phoenix, AZ

RailsConf 2017: 5 Years of Rails Scaling to 80k RPS by Simon Eskildsen

Shopify has taken Rails through some of the world's largest sales: Superbowl, Celebrity Launches, and Black Friday. In this talk, we will go through the evolution of the Shopify infrastructure: from re-architecting and caching in 2012, sharding in 2013, and reducing the blast radius of every point of failure in 2014. To 2016, where we accomplished running our 325,000+ stores out of multiple datacenters. It'll be whirlwind tour of the lessons learned scaling one of the world's largest Rails deployments for half a decade.

RailsConf 2017