Testing Rails at Scale

In the RailsConf 2016 talk "Testing Rails at Scale", Emil Stolarsky, a production engineer at Shopify, describes the problems Shopify faced with its Continuous Integration (CI) system and how the team solved them. The old system suffered from lengthy build times and unreliable results, which eroded developer trust, and the talk recounts the work done to fix both.

Key points discussed include:

Understanding CI Components: Emil divides a CI system into two main components: the scheduler, which decides when a build should run (for example, in response to a webhook from GitHub), and the compute resources, where code and tests are executed.
Initial CI Challenges: With its previous CI provider, Shopify had 20-minute build times and flaky builds. This made deployments unreliable, and underutilized resources made the setup expensive.
Transition to Buildkite: Shopify adopted Buildkite, an Australian CI provider that offers scheduling while enabling users to manage their compute resources. This hybrid model allowed Shopify to leverage its own infrastructure while simplifying the scheduling process.
Infrastructure Utilization: The compute cluster, mainly hosted on AWS, used 90 c4.8xlarge instances, providing 5.4 terabytes of memory and over 3,200 cores for builds. Autoscaling and resource optimization strategies were developed to control costs and keep utilization high.
Enhancements through Docker: Running tests in isolated Docker containers significantly improved performance, reducing build times and increasing reliability. Because each container starts with its environment already prepared, tests can begin executing as soon as the container starts.
Iterative Improvement: Emil describes the ongoing evolution of their CI system, focusing on enhancing stability and reducing the scope of failures originating from test flakiness. This included developing a more scalable version of their internal tool, Locutus, which uses a coordinator instance for work distribution.
Key Takeaways: The talk concludes with advice on CI systems: organizations with build times over fifteen minutes should consider building a custom solution, should commit fully to the migration once they start, and should manage infrastructure efficiently, for example by treating infrastructure as ‘cattle’ rather than ‘pets’ to ease the management burden.
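
The coordinator-based work distribution mentioned under Iterative Improvement can be illustrated with a small sketch. This is not Locutus's actual implementation, which the summary does not describe in detail; it is a minimal example of one common approach, assuming a shared queue of test names that workers pull from one at a time. The function name `run_distributed` and the `run_test` callback are hypothetical, and threads stand in for separate worker machines or containers.

```python
import queue
import threading


def run_distributed(test_names, worker_count, run_test):
    """Hypothetical coordinator: workers pull tests from a shared queue.

    Pulling one test at a time means a slow test delays only the worker
    running it, while the other workers keep draining the queue.
    """
    pending = queue.Queue()
    for name in test_names:
        pending.put(name)

    results = {}
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            outcome = run_test(name)
            with results_lock:
                results[name] = (worker_id, outcome)

    # Threads stand in for separate worker machines or containers.
    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    tests = [f"test_{i}" for i in range(10)]
    outcomes = run_distributed(tests, worker_count=3, run_test=lambda name: "pass")
    print(len(outcomes), "tests run")
```

Compared with splitting the test list into fixed chunks up front, a pull-based queue balances load automatically, at the cost of needing a coordinator that every worker can reach.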

Overall, the talk distills the hard lessons learned from building a responsive CI system capable of supporting Shopify’s e-commerce platform at scale. It covers the technical challenges and also stresses how much reliability and efficiency matter in modern CI practice.

Emil Stolarsky • May 26, 2016 • Kansas City, MO

It's impossible to iterate quickly on a product without a reliable, responsive CI system. At a certain point, traditional CI providers don't cut it. Last summer, Shopify outgrew its CI solution and was plagued by 20-minute build times, flakiness, and waning trust from developers in CI statuses.

Now our new CI builds Shopify in under 5 minutes, 700 times a day, spinning up 30,000 Docker containers in the process. This talk will cover the architectural decisions we made and the hard lessons we learned so you can design a similar build system to solve your own needs.

RailsConf 2016