How We Deploy Shopify

by Kat Drobnjakovic

In this presentation, Kat Drobnjakovic, a developer at Shopify, describes how Shopify is deployed. Shopify is one of the largest Rails applications in the world and can handle 25,000 requests per second. The talk focuses on the deployment mechanisms that keep the platform reliable and scalable as it serves more than 275,000 businesses. Key points covered include:

  • Overview of Shopify: Introduced as a platform that empowers online commerce, Shopify facilitates more than $17 billion in sales.
  • Deployment Tool: Shopify uses the open-source tool Shipit. When developers merge their pull requests into the master branch, the changes appear in Shipit, ready to be deployed.
  • Deployment Process: The deployment is initiated by pressing a button that triggers a Capistrano script, which manages the transition of new code to the server. Visual representations on the deployment screen help monitor progress.
  • High Frequency of Deployment: Shopify deploys about 30 times daily, with a record of 41 in one day, enabling rapid updates and fixes.
  • Exit Status Management: Exit statuses distinguish a successful deployment from a flapping or failed one. A failed deployment is resolved by restarting the application.
  • Handling Traffic Spikes: The robust process is crucial in accommodating high traffic from large clients during critical events such as product launches and sales.
  • Load Balancers and Request Handling: Load balancers distribute incoming requests across the servers. If a container is switching revisions and cannot accept a request, the request is retried on a different container.
  • Data Center Management: Shopify operates two data centers, one active and one passive. The passive data center is kept on the latest revision so that traffic can fail over to it if the active one has problems.

In conclusion, the deployment process at Shopify is characterized by its efficiency, reliability, and ability to handle a significant load while maintaining system integrity. Kat's insights highlight the importance of a well-structured deployment strategy in supporting a massive and rapidly changing e-commerce platform.

00:00:09.710 Hi everyone, my name is Kat, and I'm excited to talk to you today about how we deploy Shopify.
00:00:13.740 Just a little bit about me: I'm a developer at Shopify, and actually, today marks my one-year anniversary here, which is quite cool! I started as an intern one year ago, and now I have the opportunity to speak to all of you at RailsConf.
00:00:23.189 It's a bit surreal, but I'm really excited to dive into the topic. To start, let me give you a brief overview of Shopify. Our mission is to make commerce better for everyone. This includes empowering online stores, providing a point-of-sale system, and enabling 'buy' buttons on various platforms around the internet.
00:00:37.739 To put things into perspective, over 275,000 businesses use Shopify, generating more than $17 billion in sales. That’s a hefty amount running through our platform! To support this, Shopify can handle 25,000 requests per second, which is crucial for accommodating the traffic during events like flash sales or holiday shopping.
00:01:06.600 We have some high-profile clients who drive significant traffic to our servers, such as Kanye West and Kylie Jenner. Shopify is one of the largest Rails applications in the world, having been around for over a decade.
00:01:15.030 With the amount of code we deploy daily, it's important to have a reliable deployment process. We use a tool called Shipit, which is open-source and available on GitHub. Developers make pull requests, and when they merge their changes to master, those changes show up in Shipit, ready to be deployed.
00:01:35.700 Once all the necessary checks pass, developers can deploy their code. The deployment screen gives a visual representation of the process and details about the deployment. You'll see boxes labeled SB that correspond to our servers; 'SB' stands for Shopify Borg. There are about 200 of these servers, and each one contains five containers.
00:02:21.880 The green dot on the screen indicates that a container has completed the deployment, while a blue dot shows that it's currently switching revisions. The empty circles represent containers that have not yet been updated. This setup lets us monitor the entire deployment visually, and a success message is displayed once the deployment is complete.
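The three dot states described here can be modeled as a small status board. This is only an illustrative sketch, not Shipit's actual implementation: the names `ContainerState`, `render_server`, and `deploy_complete`, the text symbols standing in for the colored dots, and the server name `host-1` are all assumptions made for the example.

```python
from enum import Enum
from typing import Dict, List


class ContainerState(Enum):
    PENDING = "○"    # empty circle: not yet updated
    SWITCHING = "◐"  # stands in for the blue dot: switching revisions
    DONE = "●"       # stands in for the green dot: deployment finished


def render_server(name: str, states: List[ContainerState]) -> str:
    """Return one line of the status board for a single server."""
    return f"{name} " + "".join(state.value for state in states)


def deploy_complete(board: Dict[str, List[ContainerState]]) -> bool:
    """True once every container on every server is DONE."""
    return all(
        state is ContainerState.DONE
        for states in board.values()
        for state in states
    )
```

A board with five containers per server, as in the talk, would show a success message only once `deploy_complete` returns `True`.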
00:02:58.360 On average, we deploy Shopify 30 times a day, with our record being 41 deployments in one day! This frequency of deployment is made possible because anyone on the team who makes a change can submit a pull request and deploy, not just developers. Each deployment takes about four minutes, which is fast and exciting, but can also be a bit scary.
00:03:26.890 Today, I’ll cover what happens when you press the blue button to deploy, as well as what occurs when users make requests during that process. Deploying Shopify is primarily about pressing that button and monitoring outcomes. Pressing the deploy button starts a Capistrano script. The script's first step is to take the new revision and place it on the host server.
00:03:58.450 The supervisor daemons are then started, and their first responsibility is to take the new code written in the revision file and start all the containers. The containers are brought up sequentially: the first container is started, and once it completes its process, the next container begins. During deployment, adjustments are made based on exit statuses, which indicate whether the deployment was successful.
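The one-at-a-time container start-up can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the real supervisor daemon: `roll_containers` and the `start` callback are hypothetical names, and stopping the pass at the first failure is an assumption the talk does not spell out.

```python
from typing import Callable, Dict


def roll_containers(
    current: Dict[str, str],
    new_revision: str,
    start: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Move containers to new_revision one at a time, in dict order.

    `start(container, revision)` is a hypothetical callback that blocks
    until the container is up and returns True on success.
    """
    result = dict(current)
    for name in current:
        if not start(name, new_revision):
            break  # assumption: stop this pass at the first failure
        result[name] = new_revision
    return result
```

Because each call to `start` returns before the next one begins, no two containers on the same server switch revisions at the same time.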
00:04:47.230 There are three different exit statuses: success, flapping, and failure. A successful exit code indicates that the deployment completed and that all containers have transitioned to the new revision. If not all containers have transitioned yet, the supervisor daemons will continue restarting the containers until each has switched to the new version.
00:05:19.920 However, if a deployment fails, there are two potential outcomes: either the revision is flapping, or the deployment has simply failed. A flapping revision means that some containers are running the new version while others are stuck on the old one. This situation is undesirable, as it can lead to inconsistency in application behavior.
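One way to picture the three outcomes is to classify a deploy by which revision each container ends up on. This is a sketch only: `DeployOutcome` and `classify` are hypothetical names, and treating "no container reached the new revision" as the plain failure case is a simplifying assumption, since the talk does not define failure that precisely.

```python
from enum import Enum
from typing import List


class DeployOutcome(Enum):
    SUCCESS = "success"
    FLAPPING = "flapping"
    FAILED = "failed"


def classify(container_revisions: List[str], new_revision: str) -> DeployOutcome:
    """Classify a deploy from the revision each container is running."""
    if not container_revisions:
        raise ValueError("need at least one container")
    on_new = sum(1 for rev in container_revisions if rev == new_revision)
    if on_new == len(container_revisions):
        return DeployOutcome.SUCCESS   # every container switched
    if on_new == 0:
        return DeployOutcome.FAILED    # assumption: nothing switched
    return DeployOutcome.FLAPPING      # mix of old and new revisions
```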
00:06:07.520 When a deployment fails, the application is restarted to resolve the issue. We have numerous large merchants, such as Kylie Jenner, who drive considerable traffic during sales events. Each time a popular product is released or promoted, the amount of traffic can spike significantly, necessitating a robust deployment process.
00:06:39.900 When users make requests, like wanting Kylie Jenner's lipsticks, they are sent through the load balancers to the respective servers. If a container is in the middle of switching revisions, it might not accept the incoming request, so the request is retried on a different container. A container that is already handling a request will complete it before switching its revision.
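The retry behavior can be illustrated with a toy router. This is a sketch under assumptions, not Shopify's load balancer: the `Container` class, the `route` function, and the convention that a switching container returns `None` are all invented for the example.

```python
from typing import List, Optional


class Container:
    def __init__(self, name: str, switching: bool = False) -> None:
        self.name = name
        self.switching = switching

    def handle(self, request: str) -> Optional[str]:
        """Serve the request, or return None while switching revisions."""
        if self.switching:
            return None
        return f"{self.name} served {request}"


def route(request: str, containers: List[Container]) -> str:
    """Try containers in order until one accepts the request."""
    for container in containers:
        response = container.handle(request)
        if response is not None:
            return response
    raise RuntimeError("no container accepted the request")
```

With one container switching and another available, `route` skips the first and the second serves the request, which mirrors the retry described in the talk.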
00:07:09.520 Although we can handle multiple requests, it’s critical to ensure that all requests are served correctly. With several containers involved, if one is switching and another experiences a failure, some containers can be left on an outdated revision, which affects the customer experience.
00:08:11.710 Consider a scenario where a deploy hasn’t finished and someone deploys another revision while it’s still in progress. This can leave multiple revisions in play on the same server. Such cases negatively affect performance and can lead to a flapping deployment.
00:09:40.060 A deploy and a restart differ mainly in complexity. A deploy rolls out a new revision, while a restart reuses the current one. Both processes restart the supervisor daemons; however, a deploy takes slightly longer because a new image has to be downloaded.
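The relationship between the two operations can be sketched as a deploy being a restart with one extra step in front. The function names and the step strings below are hypothetical and only record the order of operations described in the talk.

```python
from typing import List


def restart(current_revision: str, steps: List[str]) -> str:
    """Restart the supervisor daemons on the revision already in place."""
    steps.append("restart supervisor daemons")
    return current_revision


def deploy(new_revision: str, steps: List[str]) -> str:
    """Download the new image first, then do everything a restart does."""
    steps.append(f"download image for {new_revision}")
    return restart(new_revision, steps)
```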
00:10:54.290 We have two data centers—one is active and handles traffic, while the other is passive, which we keep updated with the latest revisions. This is important for failover, ensuring we can switch over to the passive data center in case of issues without losing the current state of the application.
00:12:02.620 If a deployment fails in the active data center, we take immediate action because that's where our traffic flows. We usually restart the application and monitor for any issues.
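The active and passive arrangement can be sketched as follows. This is an illustration under assumptions rather than Shopify's failover mechanism: `DataCenter`, `deploy_everywhere`, and `serving_data_center` are hypothetical names, and a single `healthy` flag stands in for whatever health checks are used in practice.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    revision: str
    healthy: bool = True


def deploy_everywhere(revision: str, active: DataCenter, passive: DataCenter) -> None:
    """Keep the passive data center on the same revision as the active one."""
    active.revision = revision
    passive.revision = revision


def serving_data_center(active: DataCenter, passive: DataCenter) -> DataCenter:
    """Serve from the active data center, failing over if it is unhealthy."""
    return active if active.healthy else passive
```

Because the passive data center already runs the latest revision, failing over to it does not roll the application back to older code.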
00:12:10.150 Thank you for your time today! That concludes my talk on how we deploy Shopify.