Ruby on Ales 2015

Summarized using AI

Deployment Nirvana

Adrian Pike • March 06, 2015 • Earth

In his talk titled "Deployment Nirvana," Adrian Pike discusses how deployment practices have evolved, drawing on his time at Moz and the challenges of service-oriented architecture (SOA). He recalls the simplicity of early Heroku, when a single `git push` put an app online. As applications have grown more complex, with numerous inter-service dependencies, deployment has become correspondingly more intricate and challenging.

Key points of the talk include:

- Shift to SOA: The transition from monolithic applications to microservices is driven by the need to manage cognitive load effectively among development teams. Smaller, independent services allow developers to work with well-defined API contracts.

- Complexities of Deployment: While breaking applications into services reduces cognitive overload, it introduces deployment complexities, including managing interdependencies, ensuring reliability, and maintaining smooth operations.

- Development Challenges: Adrian highlights the difficulty in coordinating multiple service updates and versioning, underlining the importance of ensuring all related services remain synchronized during deployments.

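- Tooling: He discusses the tools the team used and built at Moz: Circus, a process manager they adopted that brought its own problems, and DMV (HTTP proxy), Seaport (port registry), and Viaduct (service registry and process manager). Together these enabled zero-downtime deployments, managed service registration, and streamlined the deployment process despite the complexities introduced by microservices.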

- Open-source Project: Adrian shares his excitement about an open-source tool designed to facilitate smoother deployments that could be integrated with platforms like Heroku, emphasizing the need for deployment practices to evolve alongside the technologies and architectures used today.

In conclusion, Pike reiterates that while deployment tools and frameworks have improved significantly, ongoing efforts are necessary to simplify deployment processes, reduce complexity, and ensure operational reliability. A long-term strategy is vital to shield engineering teams from the complexities of deployment infrastructure, ultimately enhancing productivity and streamlining workflows.

Adrian's insights reflect a pivotal shift in engineering practices and underscore the continuous journey towards deployment Nirvana, advocating for methods that empower developers while mitigating the overhead of technical intricacies.

By Adrian Pike
Remember when Heroku showed up and how much it changed our world? Suddenly a simple `git push` and my app was online. Gosh, those were the good days, weren't they? Things have gotten a little more complicated for us these days. We've got inter-service dependencies. We should be doing rolling deploys and user segmentation. Rollbacks should be instant and trivial. What about staging environments? Why can't we roll out code safely? As an engineer, I should just be able to `git push` and get back a running app instance. I should be able to route traffic to it whenever and however I want, whether it's production, staging, internal test, or just a running code spike on production servers to show a colleague. In this talk, I'll cover the tools and infrastructure I've built in the past to solve deployment woes for big, thorny, complicated apps and give engineering teams tons of power. I'll also show off an open-source implementation that I'll be building specifically for RoA.


00:00:29.760 We have a talk coming up. We're running a little bit behind, so I'm not going to introduce this man for too long. Except to say that he's the greatest Canadian who ever lived. He was actually born on a mountain in Canada, which is not something that happens very often. He was born with his horse and sword.
00:00:36.719 His name is Adrian. He's from Seattle, and he works for a company called WellTalk. Let's see if he talks well. Please give a round of applause.
00:01:00.960 Hi, I'm sorry about that introduction. I'm actually from Seattle, not Canada, but it's pretty close. You know, it's pretty good.
00:01:07.439 My talk is about Deployment Nirvana, and it's about some of the issues, challenges, and lessons we learned at a company called Moz, where I built a lot of deployment infrastructure for our SOA app.
00:01:26.720 Conveniently enough, everyone this afternoon has been talking about services and why they are both amazing and terrible at the same time. So this has been a great lead-in.
00:01:39.040 The web, as we build it, is definitely changing. We're moving from large, monolithic apps to splitting them up into services. This echoes how we factored large enterprise apps apart in the 90s, but now we're turning them into small apps that communicate over HTTP, NSQ, or various message brokers.
00:02:06.240 This change isn't primarily about scaling performance; it's about managing cognitive load. People are expensive, and you can't simply throw more people at a problem: as teams grow, productivity doesn't necessarily increase. With the move to SOA or microservices, complicated software that used to be easy to deploy is now split into individual apps that we must deploy and coordinate separately.
00:02:30.800 This shift creates a lot of cognitive load that we need to consider. At Moz, I focused on how we could deploy 40 services in a way that simplifies life for developers. As an engineer, I don't want to worry about what happens when someone bumps a service and causes big issues. We have versioning, tests, and various ways to manage these problems.
00:03:00.879 I worked at Stride for a while, which was a startup I co-founded, and then I went to Moz, where I learned most of what I’ll discuss today. Now, I’m at a company called WellTalk, where I'm writing code and leading teams—basically doing everything we get paid for and enjoying it.
00:03:41.199 Let's rewind to about a year and a half ago at Moz. We aimed to build a glorious greenfield app. We had smart team members, including a couple of ex-Google engineers and brilliant fresh graduates, while I was just the anchor. Together, we tackled the challenge and developed a tool called Moz Local.
00:05:05.360 Moz Local primarily focuses on transforming API data. For instance, if I'm a small business owner, I want to ensure that my business listing is accurate across platforms like Yelp, Superpages, Foursquare (now Swarm), and Facebook. We had to work with numerous APIs and manage many concurrent connections. Node.js was a convenient choice because the app needed tons of parallel I/O but little heavy business logic.
00:05:54.080 Deciding to go with a large set of independent services made sense for both the app's specific needs and for team scaling. By breaking the app into smaller services, we could minimize cognitive load; developers only needed to interact with well-defined API contracts without requiring an understanding of the entire app.
00:06:31.280 This modular approach also focused on composability. We had a caching layer, and we could treat API interactions as a stream, allowing us to combine different services seamlessly.
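The talk doesn't show Moz Local's code, and the team built it in Node.js. As a rough illustration of composing API interactions as a stream behind a cache, here is a minimal Python sketch that uses generators as stand-ins for streams and `functools.lru_cache` as a stand-in for the caching layer. The stage names (`from_api`, `normalize`, `only_with_phone`) and the record fields are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Iterable, Iterator

Record = dict  # hypothetical listing record, e.g. {"Name": ..., "Phone": ...}


def from_api(fetch: Callable[[str], Record], ids: Iterable[str]) -> Iterator[Record]:
    """Source stage: lazily pull one record per id from an upstream API."""
    for listing_id in ids:
        yield fetch(listing_id)


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Transform stage: lowercase keys and trim whitespace in string values."""
    for record in records:
        yield {k.lower(): v.strip() if isinstance(v, str) else v for k, v in record.items()}


def only_with_phone(records: Iterable[Record]) -> Iterator[Record]:
    """Filter stage: drop listings that have no phone number."""
    return (r for r in records if r.get("phone"))


@lru_cache(maxsize=None)  # stand-in for the shared caching layer
def fetch_listing(listing_id: str) -> Record:
    print(f"calling upstream API for {listing_id}")
    return {"Name": f" Shop {listing_id} ", "Phone": "555-0100"}


# Stages compose like pipes; the repeated id "1" is served from the cache.
for listing in only_with_phone(normalize(from_api(fetch_listing, ["1", "2", "1"]))):
    print(listing)
```

Because every stage only consumes and produces an iterator, any stage can be swapped or reordered without the others knowing, which is the composability the talk describes.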
00:06:53.760 But we also considered the complexities of service-oriented architecture (SOA). We ran into issues with interdependencies among services. For example, one service managed our data persistence, tied to various databases like Cassandra, Postgres, MySQL, and Redis. While this was great because it simplified my job, it complicated deployments due to reliability challenges.
00:07:30.320 Debugging became near impossible; when an exception occurred, I'd have to untangle the request's journey through numerous services. Tools like Foreman helped, but coordination remained difficult. Integration testing was painfully complicated, and deployment processes caused unexpected downtimes.
00:08:04.640 Despite these challenges, we decided to stick with service-oriented architecture. We threw ourselves into it.
00:08:28.960 We used a tool called Circus, a process manager for running various services, though it presented us with its own set of issues. Collateral flapping and deployment downtimes plagued us. We learned the importance of vetting tools prior to deployment to avoid core infrastructure failures.
00:09:01.560 The collateral flapping happened because we weren't capturing exceptions effectively, so multiple services crashed. We built a service registry and HTTP proxy that let us run multiple backend instances: if one crashed, the proxy redirected requests to another instance without the end user noticing.
00:10:07.920 To fix the deployment downtimes, we used the new service registry and HTTP proxy to start new backends, register them, and then cleanly decommission the old ones.
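DMV and Seaport aren't shown in the talk, so their real interfaces are unknown here. The failover idea, though, can be sketched in a few lines of Python: a registry holds every live instance of a service, and the proxy tries them in turn, so a crashed instance is skipped instead of surfacing as an error. `ServiceRegistry`, `proxy_request`, and the addresses below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> live backend addresses."""
    backends: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, service: str, address: str) -> None:
        self.backends.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        if address in self.backends.get(service, []):
            self.backends[service].remove(address)

    def lookup(self, service: str) -> List[str]:
        return list(self.backends.get(service, []))


def proxy_request(registry: ServiceRegistry, service: str, send: Callable[[str], str]) -> str:
    """Try each registered backend in order. A backend that refuses the
    connection is skipped, so the caller only sees an error if every
    instance of the service is down."""
    failures = []
    for address in registry.lookup(service):
        try:
            return send(address)
        except ConnectionError as exc:
            failures.append(f"{address}: {exc}")
    raise RuntimeError(f"no healthy backend for {service!r}: {failures}")


registry = ServiceRegistry()
registry.register("listings", "10.0.0.1:8001")
registry.register("listings", "10.0.0.2:8001")


def fake_send(address: str) -> str:
    # Simulate the first instance having crashed.
    if address.startswith("10.0.0.1"):
        raise ConnectionError("connection refused")
    return f"200 OK from {address}"


print(proxy_request(registry, "listings", fake_send))  # 200 OK from 10.0.0.2:8001
```

A production proxy would also health-check backends and spread load instead of always preferring the first entry; this sketch only shows the failover path.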
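The deploy sequence described here (start new backends, register them, then retire the old ones) is a start-before-stop swap. Below is a minimal Python sketch, assuming a hypothetical registry that maps each service to a set of live addresses and caller-supplied `is_healthy` and `stop` callbacks; connection draining is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class Registry:
    """Hypothetical registry: service name -> set of live backend addresses."""
    live: Dict[str, Set[str]] = field(default_factory=dict)


def swap_backends(
    registry: Registry,
    service: str,
    new_addrs: Iterable[str],
    is_healthy: Callable[[str], bool],
    stop: Callable[[str], None],
) -> bool:
    """Register already-started new backends, then retire the old ones.

    Old backends are removed only after every new backend passes its
    health check, so the proxy always has an instance to route to.
    Returns False (and changes nothing) if there are no new backends or
    any of them is unhealthy.
    """
    new_addrs = set(new_addrs)
    old_addrs = set(registry.live.get(service, set())) - new_addrs
    if not new_addrs or not all(is_healthy(addr) for addr in new_addrs):
        return False  # abort: old backends keep serving traffic
    registry.live.setdefault(service, set()).update(new_addrs)
    for addr in sorted(old_addrs):
        registry.live[service].discard(addr)  # stop routing new requests here
        stop(addr)  # a real system would drain in-flight requests first
    return True


registry = Registry(live={"listings": {"10.0.0.1:8001"}})
stopped = []
ok = swap_backends(registry, "listings", ["10.0.0.2:8001"], lambda a: True, stopped.append)
print(ok, registry.live["listings"], stopped)  # True {'10.0.0.2:8001'} ['10.0.0.1:8001']
```

Deregistering before stopping is the key ordering choice: once the registry no longer lists an old backend, the proxy stops sending it new requests, so shutting it down doesn't cause visible downtime.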
00:10:26.079 Thus, we had developed a set of loosely coupled tools we called DMV for our proxy, Seaport for our port registry, and Viaduct for our new service registry and process manager.
00:10:46.239 These solutions protected us from the issues we encountered while coding bad software. We managed to achieve zero-downtime deployments and employed various controls to facilitate versioning and traffic management.
00:11:31.680 However, we faced challenges with client tooling. Our deployments relied on an intern who had to type out multiple commands that occasionally changed, leaving many unclear about the process. Development itself was cumbersome because we needed to coordinate several backend services, alongside PostgreSQL and Redis instances.
00:11:53.600 The core issue was inter-service versioning: when services like A and B changed, they often needed synchronized updates. We had to track these versions and make sure each service's requests and responses still matched what the other expected.
00:12:14.880 We also struggled to coordinate service updates. If one service changed incorrectly, or without coordinating versioning between departments, we risked causing larger issues.
00:12:48.081 Our solution was a release version that captured the dependencies and interactions between services. We established these mappings at deploy time, so every deployment stated which versions of the other services it required.
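The talk doesn't spell out the release format, so the following is only a Python sketch of the idea: a release pins one Git SHA per service, each service build declares the SHAs of the services it expects to talk to, and a check flags any mismatch before deploy. `ServiceBuild`, `check_release`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceBuild:
    name: str
    sha: str  # Git SHA of this service's build
    # Hypothetical: SHAs of other services this build expects to talk to.
    requires: Dict[str, str] = field(default_factory=dict)


def check_release(release: List[ServiceBuild]) -> List[str]:
    """Compare each build's declared dependencies against what the release
    pins. An empty result means the release is consistent and can be
    deployed as one unit."""
    pinned = {build.name: build.sha for build in release}
    problems = []
    for build in release:
        for dep, wanted in sorted(build.requires.items()):
            actual = pinned.get(dep)
            if actual != wanted:
                problems.append(f"{build.name} needs {dep}@{wanted}, release pins {actual}")
    return problems


release = [
    ServiceBuild("persistence", "a1b2c3d"),
    ServiceBuild("listings", "9f8e7d6", requires={"persistence": "a1b2c3d"}),
    ServiceBuild("sync", "0c0ffee", requires={"persistence": "1234567"}),
]
print(check_release(release))
# ['sync needs persistence@1234567, release pins a1b2c3d']
```

Running a check like this before deploy turns "did someone bump a service without telling us?" into a concrete, reviewable error message.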
00:13:31.600 This approach enabled us to link service versioning to actual Git SHA values. When we made commits in our version control, it provided a tangible reference tied to every deployment, facilitating better organization and accountability.
00:14:20.640 What's more, our custom HTTP proxy could route requests to specific versions. This put power in engineers' hands: when deploying, they knew exactly which code was talking to which.
00:14:50.520 We realized our deployments had to meet certain requirements; for example, each service has to know its own SHA. This added a layer of complexity during deployment since various tools managed deployments similarly.
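Routing by SHA can be illustrated with a small lookup function. This is a hypothetical Python sketch, not Moz's proxy: a request may pin a version with an assumed `X-Service-Version` header, and everything else goes to the SHA currently marked as production for that service.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical routing tables kept by the proxy.
Routes = Dict[Tuple[str, str], List[str]]  # (service, sha) -> backend addresses
Production = Dict[str, str]                # service -> SHA serving default traffic

VERSION_HEADER = "X-Service-Version"  # assumed header name, not from the talk


def route(routes: Routes, production: Production, service: str, headers: Mapping[str, str]) -> str:
    """Pick a backend for a request. A pinned SHA wins; otherwise use the
    production SHA. Raises LookupError if nothing is running that version."""
    sha = headers.get(VERSION_HEADER, production[service])
    backends = routes.get((service, sha))
    if not backends:
        raise LookupError(f"no backend running {service}@{sha}")
    return backends[0]


routes = {
    ("listings", "9f8e7d6"): ["10.0.0.2:8001"],  # production build
    ("listings", "c0d3b4d"): ["10.0.0.9:8001"],  # a code spike shown to a colleague
}
production = {"listings": "9f8e7d6"}

print(route(routes, production, "listings", {}))  # 10.0.0.2:8001
print(route(routes, production, "listings", {VERSION_HEADER: "c0d3b4d"}))  # 10.0.0.9:8001
```

Keying routes on the Git SHA means "staging", "internal test", and "production" are just different entries in the routing table rather than separate environments.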
00:15:16.520 Moreover, we had to ensure that we operated with multiple instances of a service. This configuration ran successfully for our lightweight Node.js services, but memory-heavy applications presented challenges.
00:15:39.760 As we worked through these challenges, we came to see inter-service versioning as a problem the industry hasn't decisively solved. Teams often don't communicate, which leads to unintended disruptions.
00:16:02.320 We learned to build a solid process to maintain synchronization across upgrades, to keep services functioning harmoniously.
00:16:20.360 We also instrumented our routing layer to show how requests flowed through our microservices, which made debugging and monitoring more efficient over time.
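One common way for a routing layer to expose how requests flow through services is to tag each request with an ID, pass it along on every downstream call, and log each hop. The Python sketch below assumes that approach; the talk doesn't say how Moz did it, and `TracingRouter` and the `X-Request-Id` header are illustrative choices.

```python
import time
import uuid
from typing import Callable, Dict, List, Tuple

Hop = Tuple[str, str, float, str]  # (request_id, service, elapsed_seconds, status)


class TracingRouter:
    """Hypothetical routing-layer hook that records every hop of a request."""

    HEADER = "X-Request-Id"  # assumed header name

    def __init__(self) -> None:
        self.hops: List[Hop] = []

    def forward(self, service: str, headers: Dict[str, str],
                handler: Callable[[Dict[str, str]], str]) -> str:
        # Reuse the caller's ID if present so nested calls share one trace.
        request_id = headers.get(self.HEADER) or uuid.uuid4().hex
        outgoing = {**headers, self.HEADER: request_id}
        start = time.monotonic()
        status = "error"
        try:
            result = handler(outgoing)
            status = "ok"
            return result
        finally:
            # Recorded when the hop finishes, so inner hops are logged first.
            self.hops.append((request_id, service, time.monotonic() - start, status))

    def trace(self, request_id: str) -> List[Hop]:
        return [hop for hop in self.hops if hop[0] == request_id]


router = TracingRouter()


def persistence(headers: Dict[str, str]) -> str:
    return "row data"


def listings(headers: Dict[str, str]) -> str:
    # A downstream call made through the router carries the same request ID.
    return router.forward("persistence", headers, persistence)


router.forward("listings", {}, listings)
request_id = router.hops[0][0]
print([service for _, service, _, _ in router.trace(request_id)])  # ['persistence', 'listings']
```

With every hop logged under one ID, an exception deep in the call chain can be traced back through each service it passed through, which is exactly the debugging pain described earlier in the talk.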
00:17:04.000 Ultimately, we focused on minimizing the cognitive load on engineers. We knew we needed to build tools that didn’t overcomplicate deployments, and instead offered simple solutions regardless of infrastructure changes.
00:17:52.240 We recognized that the deployment experience would benefit from less complexity, so we aimed for streamlined updates. Our goal was to let engineers push code without added complications.
00:18:51.680 Our experience made clear that we needed to minimize moving parts in our systems. The more elements involved in a deployment, the more opportunities for failure, so cutting unnecessary pieces improves operational reliability.
00:20:28.880 We also recognized the importance of ensuring deployments remain modular, lending flexibility to the development cycle. Innovations frequently occur in the field, making it crucial for projects to adapt and evolve as needed.
00:20:59.840 Through this journey, I built an open-source project to encapsulate these advancements, designed to enhance deployment simplicity. It’s currently functioning with several of my ongoing side projects.
00:21:19.000 My ambition is to make this tool available on platforms like Heroku, allowing for hassle-free deployment from anywhere where code can be run.
00:21:53.680 With advanced tools at our disposal today, engineers should easily deploy their services without excessive hurdles. It’s 2015; we have the frameworks, the APIs, and the innovation to ensure deployments run smoothly.
00:22:31.760 In conclusion, I emphasize that deployments must continue to evolve. As we encounter new challenges, it’s essential to adopt a long-term strategy for handling the complexities of service-oriented architecture.
00:23:11.120 We need to ensure our engineering teams are shielded from unnecessary complexity in deployment infrastructure. Minimizing the friction in both development and operations makes for an efficient workflow.
00:23:53.200 When we make deploying easier, we embrace the opportunities that come with streamlined, effective practices. The tools have improved so much, but we still have work ahead!
00:24:33.440 Thank you.
00:30:23.360 Enjoy!