The Many Ways to Deploy Continuously

00:00:20.760 All right, folks. I'm going to give a talk about the many ways to deploy continuously.

00:00:24.080 As was pointed out, Allan was supposed to give this talk, but rather selfishly he went off to Italy on his honeymoon. Fortunately, I know the material pretty well because Allan and I are co-founders of a company called CircleCI. CircleCI provides hosted continuous integration and deployment for Ruby apps, Node.js, and similar types of applications.

00:00:34.399 We have a couple of thousand developers as our paying customers, and about 42% of them are doing continuous deployment. This gives us the opportunity to observe a lot of different approaches to continuous deployment. We also spend a significant amount of time looking at what the future of our product will be by examining what larger companies like Facebook and GitHub are doing.

00:00:44.840 This talk is an overview of what we've learned through our research and by talking with our customers. Most talks you attend tend to provide answers, but this is not one of those talks. I’m not here to tell you the right way to implement continuous deployment, as I've learned that there is no one-size-fits-all solution.

00:00:57.320 Everyone we talk to does continuous deployment slightly differently. Some companies only deploy continuously during business hours, from Monday to Friday, but avoid deploying on Friday afternoons. Others might only deploy at 2 a.m. because that’s the only time they can afford to clear their cache.

00:01:04.960 If you're planning to adopt continuous deployment at your company, there’s likely a specific way you think it should be done. However, as you observe other companies, you'll see that they each implement it in their own unique manner.

00:01:11.760 What I'm aiming to do in this talk is not provide you with answers but examine the various factors that come into play with continuous deployment. These factors include the speed at which you're developing new features, the complexity of your code, the design and architecture of your software, and whether you follow a service-oriented approach or use a monolithic app structure.

00:01:24.760 Business priorities, the number of engineers on your team, and your overall state of mind will also impact how continuous deployment is implemented.

00:01:34.919 Most of the talk will focus on deployment in general, as continuous deployment is only a subset of all deployments. Many of the challenges you face in deployment won’t feel like problems if you only deploy monthly; you can afford a minute of downtime and still maintain reliable service. However, as the frequency of deployments increases—say, deploying 10 to 500 times a day—these issues become much more relevant.

00:01:42.120 There are numerous complexities involved with deployment. In the early days, people often deployed PHP applications by uploading files over FTP to a shared server.

00:01:55.160 The fundamental aspect of deployment is that your code lives on one machine, and you want to transfer it to a server so the code can run. A significant challenge arises during the transition period when files are being overwritten, leading to requests that may rely on both new and old code simultaneously.

00:02:10.840 This race condition occurs during every deployment; requests can be made while the old code is still running, resulting in unexpected behaviors if some files are replaced before others.

00:02:19.240 In the PHP days, advanced users implemented symlink strategies. They would upload new code into a separate directory and only switch the symlink to the new directory once all files were uploaded, minimizing the risks of race conditions.
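
The symlink switch described here can be sketched in a few lines of Python. This is a minimal illustration, not the way any particular PHP shop did it: the function name `activate_release` and the `releases/` and `current` paths are assumptions made up for the example, and the atomicity guarantee relies on POSIX `rename` semantics.

```python
import os
import tempfile


def activate_release(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in one atomic step (POSIX)."""
    # Build the new symlink under a temporary name next to the live one.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming over the existing symlink is atomic on POSIX filesystems, so a
    # request resolves either the old release or the new one, never a mix.
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    # Hypothetical layout: <root>/releases/<version> plus a <root>/current link.
    root = tempfile.mkdtemp()
    for version in ("v1", "v2"):
        os.makedirs(os.path.join(root, "releases", version))
    link = os.path.join(root, "current")
    activate_release(os.path.join(root, "releases", "v1"), link)
    activate_release(os.path.join(root, "releases", "v2"), link)
    print(os.readlink(link))
```

The web server is assumed to serve from the `current` path, so the upload into a fresh release directory can take as long as it needs without affecting live requests.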

00:02:25.560 However, race conditions can arise in various aspects beyond just code changes; they can happen with database schemas, API versions, and other service dependencies.

00:02:36.320 Let's explore a more modern approach to deployment, particularly platforms like Heroku, which face issues similar to those experienced during the early PHP days.

00:02:46.079 On Heroku, when new code is pushed, user requests can hit either the old version or the newly deployed code, because the switch between the two does not happen in a single instant.

00:02:56.880 Moreover, when changes to database schemas occur, we must ensure that the new code can handle both the old and the new schema until we've completed the migration.

00:03:05.760 A practical solution to these challenges is to deploy an intermediate version of your app that understands both schemas and can work with either during the transition period.

00:03:17.040 Once that intermediate version is live, you can run your migration smoothly and avoid the pitfalls of changing the code and the schema in one big step.
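
As a rough sketch of what an intermediate version might look like, the function below reads a user row that may come from either schema. The column names (`name` in the old schema, `first_name` and `last_name` in the new one) and the function `display_name` are hypothetical, chosen only for illustration.

```python
from typing import Mapping, Optional


def display_name(row: Mapping[str, Optional[str]]) -> str:
    """Return a user's display name from an old-schema or new-schema row."""
    # New schema (hypothetical): separate first_name / last_name columns.
    # During the migration these columns may exist but still be NULL.
    first = row.get("first_name")
    last = row.get("last_name")
    if first is not None or last is not None:
        return " ".join(part for part in (first, last) if part)
    # Old schema (hypothetical): a single name column.
    return row.get("name") or ""
```

Code written this way keeps working before, during, and after the migration; the old-schema branch can be deleted in a later deploy once every row has been migrated.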

00:03:30.160 Moving on, two tricky topics in deployments are data migrations and table locking. When changing your schema, database tables can become locked, leading to performance issues and downtime that can be quite detrimental to business operations.

00:03:44.300 For instance, IMVU, a pioneer in continuous deployment, faced significant downtime whenever they changed their database schema. To address this, they implemented a versioning system for their user tables, creating new tables whenever schema changes were needed, thus avoiding table locks.

00:03:56.680 They never modified existing tables directly, ensuring they could continuously serve users while migrating data without locking issues.
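
The versioned-table idea can be sketched with SQLite from the Python standard library. The talk does not say how rows were moved between table versions, so the copy-on-first-read step below is an assumption, as are the table names `users_v1` and `users_v2`, the columns, and both function names.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an old and a new users table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users_v1 (id INTEGER PRIMARY KEY, name TEXT)")
    # The schema change is expressed as a brand-new table, not an ALTER TABLE.
    db.execute(
        "CREATE TABLE users_v2 "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    db.execute("INSERT INTO users_v1 (id, name) VALUES (1, 'Ada Lovelace')")
    db.commit()
    return db


def get_user(db: sqlite3.Connection, user_id: int):
    """Read from users_v2, copying the row from users_v1 on first access."""
    row = db.execute(
        "SELECT id, first_name, last_name FROM users_v2 WHERE id = ?",
        (user_id,),
    ).fetchone()
    if row is not None:
        return row
    old = db.execute(
        "SELECT id, name FROM users_v1 WHERE id = ?", (user_id,)
    ).fetchone()
    if old is None:
        return None
    # Assumed migration rule: split the old name at the first space.
    first, _, last = old[1].partition(" ")
    db.execute(
        "INSERT INTO users_v2 (id, first_name, last_name) VALUES (?, ?, ?)",
        (old[0], first, last),
    )
    db.commit()
    return (old[0], first, last)
```

The old table is only ever read, never altered, so no schema change touches a table that is serving traffic.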

00:04:09.500 Another option for schema changes is to utilize databases that do not require table locking, like MongoDB, which handles schema migrations in a more flexible manner.

00:04:22.040 Now, let's shift our focus to how Facebook handles its deployment processes. While they don't practice continuous deployment in the same way as some companies, they do deploy once a day and have sophisticated mechanisms in place to manage their large volumes of data.

00:04:34.080 Facebook has effectively solved similar problems to those faced by IMVU, allowing them to migrate data gradually without significant downtime.

00:04:48.200 Deploying updates can often lead to race conditions where both old and new data coexist, so the codebase has to handle both data structures.

00:05:02.160 With the advent of single-page JavaScript applications, we must also deal with the additional problem of asset races. Newly deployed front-end code may not match the API that is actually available if the two are not deployed in sync.

00:05:12.840 The problem compounds, because the assets may rely on features that the backend API has not deployed yet, which breaks the experience for users.

00:05:24.480 To combat this, the best practice is to roll out new code before the code that depends on it, for example the backend API before the assets that call it, so that the dependent features are functional at the time users encounter them.
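
One way to make "dependencies first" mechanical is to derive the rollout order from a dependency map. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the component names and the `rollout_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each one maps to the set of components it depends on.
DEPENDS_ON = {
    "frontend-assets": {"api"},
    "api": {"database-migration"},
    "database-migration": set(),
}


def rollout_order(depends_on):
    """Return components ordered so that dependencies are deployed first."""
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(rollout_order(DEPENDS_ON))
```

A cycle in the map makes `static_order()` raise `graphlib.CycleError`, which is a useful signal that two components cannot be rolled out independently.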

00:05:43.760 As we discuss continuous deployment, it's important to recognize that the obstacles encountered with regular deployments are magnified considerably under continuous deployment conditions.

00:05:57.640 Monitoring and testing become key components in managing these complex challenges. Testing can be difficult, as it's hard to anticipate every combination of versions that may be running concurrently once deployed.

00:06:07.760 One approach is to conduct tests privately on staging servers before going live, but GitHub has taken it a step further by requiring that all code merged into master has first been tested in the production environment.

00:06:20.000 This practice ensures that their master branch is always ready for deployment and tested against a small subset of real users, adding a layer of safety.

00:06:32.960 Similarly, Facebook deploys all code to customers but uses feature flags to disable new features until they’re ready for wider release, allowing for gradual rollout and testing.

00:06:47.840 Feature flags can be finely tuned to give specific user groups access, so the team can monitor responses closely before activating a feature for a broader audience.
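
A feature flag with group targeting and a gradual rollout might look roughly like the following. This is an illustrative sketch, not Facebook's system: the `FeatureFlag` class, its fields, and the hash-based bucketing are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    name: str
    allowed_groups: set = field(default_factory=set)  # groups that always see it
    rollout_percent: int = 0  # share of all other users, from 0 to 100

    def is_enabled(self, user_id: str, groups=frozenset()) -> bool:
        """Decide whether this user sees the feature."""
        # Explicitly targeted groups (for example, employees) get it first.
        if self.allowed_groups & set(groups):
            return True
        # Hash the flag name and user id into a stable bucket from 0 to 99, so
        # the same user gets the same answer on every request.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent
```

With this shape, the new code ships to every server switched off, and widening the release means raising `rollout_percent` instead of deploying again.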

00:07:00.480 In summary, the complementary strategies of testing and monitoring work together to address the challenges that arise from continuous deployment. This holistic approach can help you manage risks and improve reliability in production.

00:07:15.360 It’s vital to look beyond traditional monitoring techniques to embrace broader metrics that help gauge user engagement and business performance.

00:07:20.880 This can include factors like conversion rates and user interactions, allowing you to automatically roll back changes should metrics fall below expected thresholds.

00:07:28.000 For instance, IMVU monitored business metrics closely, using their insights to identify when confusion arose due to poor UX decisions, such as using a white buy button on a white background that users couldn’t see.

00:07:41.000 They quickly integrated business performance as a metric for their deployment success, allowing them to revert changes swiftly when needed.
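
A metric-driven rollback check could be as simple as comparing business metrics from before and after a deploy. The function name `should_roll_back`, the metric names, and the 20% threshold are assumptions for illustration; the talk does not describe how IMVU implemented theirs.

```python
from typing import Mapping


def should_roll_back(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
    max_drop: float = 0.2,
) -> bool:
    """Return True if any metric fell by more than `max_drop` (a fraction)."""
    for metric, before in baseline.items():
        # A metric missing after the deploy is treated as having dropped to 0.
        after = current.get(metric, 0.0)
        if before > 0 and (before - after) / before > max_drop:
            return True
    return False
```

In the invisible buy button example, technical monitoring would show nothing wrong, but a conversion-rate metric fed into a check like this would drop sharply and trigger the revert.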

00:07:53.600 To conclude, I hope this discussion has provided insights into various deployment strategies and the unique challenges they present. Continuous deployment doesn’t have to be daunting; rather, it opens new paths for efficiency and adaptation.

00:08:04.720 If this topic excites you, CircleCI is hiring! You can find us online at jobs.circleci.com.

00:08:07.920 I would love to hear from anyone with additional insights on continuous deployment strategies or any efficient transport mechanisms.

00:08:12.200 Thank you very much for listening. I appreciate your time and attention, and I hope you enjoyed exploring the many ways to deploy continuously.

00:08:21.300 Excellent.