00:00:29.760
We have a talk coming up. We're running a little bit behind, so I'm not going to introduce this man for too long. Except to say that he's the greatest Canadian who ever lived. He was actually born on a mountain in Canada, which is not something that happens very often. He was born with his horse and sword.
00:00:36.719
His name is Adrian. He's from Seattle, and he works for a company called WellTalk. Let's see if he talks well. Please give a round of applause.
00:01:00.960
Hi, I'm sorry about that introduction. I'm actually from Seattle, not Canada, but it's pretty close. You know, it's pretty good.
00:01:07.439
My talk is about Deployment Nirvana, and it's about some of the issues, challenges, and lessons we learned at a company called Moz, where I built a lot of deployment infrastructure for our SOA app.
00:01:26.720
Conveniently enough, everyone this afternoon has been talking about services and why they are both amazing and terrible at the same time. So this has been a great lead-in.
00:01:39.040
The web, as we build it, is definitely changing. We're moving more from large, monolithic apps to splitting them up into services. This reflects back on the ways we factored out large enterprise apps in the 90s, but we’re now transforming them into small apps that communicate over HTTP, NSQ, or various message brokers.
00:02:06.240
This change isn't primarily about scaling performance; it's about managing cognitive load. People are expensive, and you can't simply throw more people at a problem. As teams grow, productivity doesn't necessarily increase. With our move to SOA or microservices, complicated software that used to be easy to deploy as a single unit is now split into individual apps that we must deploy and coordinate separately.
00:02:30.800
This shift creates a lot of cognitive load that we need to consider. At Moz, I focused on how we could deploy 40 services in a way that simplifies life for developers. As an engineer, I don't want to worry about what happens when someone bumps a service and causes big issues. We have versioning, tests, and various ways to manage these problems.
00:03:00.879
I worked at Stride for a while, which was a startup I co-founded, and then I went to Moz, where I learned most of what I’ll discuss today. Now, I’m at a company called WellTalk, where I'm writing code and leading teams—basically doing everything we get paid for and enjoying it.
00:03:41.199
Let's rewind to about a year and a half ago at Moz. We aimed to build a glorious greenfield app. We had smart team members, including a couple of ex-Google engineers and brilliant fresh graduates, while I was just the anchor. Together, we tackled the challenge and developed a tool called Moz Local.
00:05:05.360
Moz Local primarily focuses on transforming API data. For instance, if I’m a small business owner, I want to ensure that my business listing is accurate across platforms like Yelp, Superpages, Foursquare (now Swarm), and Facebook. We had to work with numerous APIs and manage many concurrent connections, which made Node.js a convenient choice for building this because it allowed for tons of parallel I/O without major business logic.
00:05:54.080
Deciding to go with a large set of independent services made sense for both the app's specific needs and for team scaling. By breaking the app into smaller services, we could minimize cognitive load; developers only needed to interact with well-defined API contracts without requiring an understanding of the entire app.
00:06:31.280
This modular approach also focused on composability. We had a caching layer, and we could treat API interactions as a stream, allowing us to combine different services seamlessly.
00:06:53.760
But we also considered the complexities of service-oriented architecture (SOA). We ran into issues with interdependencies among services. For example, one service managed our data persistence, tied to various databases like Cassandra, Postgres, MySQL, and Redis. While this was great because it simplified my job, it complicated deployments due to reliability challenges.
00:07:30.320
Debugging became near impossible; when an exception occurred, I'd have to untangle the request's journey through numerous services. Tools like Foreman helped, but coordination remained difficult. Integration testing was painfully complicated, and deployment processes caused unexpected downtimes.
00:08:04.640
Despite these challenges, we decided to stick with service-oriented architecture. We threw ourselves into it.
00:08:28.960
We used a tool called Circus, a process manager for running various services, though it presented us with its own set of issues. Collateral flapping and deployment downtimes plagued us. We learned the importance of vetting tools prior to deployment to avoid core infrastructure failures.
00:09:01.560
With the collateral flapping issue, we weren't effectively capturing exceptions, resulting in multiple services crashing. We built a service registry and HTTP proxy that enabled us to run multiple backend instances. If one crashed, the system would seamlessly redirect requests to another instance without the end user noticing.
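A minimal sketch of that failover idea, not the actual Moz code: an in-memory registry that hands the proxy the next healthy backend for a service. The class names, the `healthy` flag, and the round-robin policy are all assumptions made for illustration.

```python
class Backend:
    """One running instance of a service (hypothetical shape)."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.healthy = True  # assumed to be flipped by a health checker


class ServiceRegistry:
    """Maps a service name to its registered backends."""

    def __init__(self):
        self.backends = {}  # service name -> list of Backend
        self.cursor = {}    # service name -> next index to try

    def register(self, service, backend):
        self.backends.setdefault(service, []).append(backend)

    def pick(self, service):
        """Round-robin over healthy backends, skipping crashed ones."""
        pool = self.backends.get(service, [])
        start = self.cursor.get(service, 0)
        for offset in range(len(pool)):
            index = (start + offset) % len(pool)
            if pool[index].healthy:
                self.cursor[service] = index + 1
                return pool[index]
        raise LookupError("no healthy backend for %r" % service)
```

With two instances registered, marking one as unhealthy makes `pick` return the other on every call, which is the behavior that hides a crash from the end user.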
00:10:07.920
We also tackled the deployment downtimes: with our new service registry and HTTP proxy, we could start new backends, register them, and cleanly decommission the old ones.
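The ordering is what removes the downtime, and it can be sketched roughly as follows. This is an illustration under assumptions, not the real tooling: backends are plain strings, `is_ready` stands in for whatever readiness check the new processes expose, and the function name is hypothetical.

```python
def zero_downtime_deploy(active, new, is_ready):
    """Swap the `active` backends for `new` ones and return a log of steps.

    `active` is mutated in place. New backends are registered before any
    old one is removed, so the pool never becomes empty during the swap.
    """
    log = []
    old = list(active)

    not_ready = [backend for backend in new if not is_ready(backend)]
    if not_ready:
        # Abort before touching live traffic; the old backends keep serving.
        log.append("abort: not ready: " + ", ".join(not_ready))
        return log

    for backend in new:
        active.append(backend)
        log.append("register " + backend)

    for backend in old:
        active.remove(backend)
        log.append("decommission " + backend)

    return log
```

A real decommission step would also wait for in-flight requests to drain before stopping the old process; that part is left out here.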
00:10:26.079
Thus, we had developed a set of loosely coupled tools we called DMV for our proxy, Seaport for our port registry, and Viaduct for our new service registry and process manager.
00:10:46.239
These solutions protected us from the issues we encountered while coding bad software. We managed to achieve zero-downtime deployments and employed various controls to facilitate versioning and traffic management.
00:11:31.680
However, we faced challenges with client tooling. Our deployments relied on an intern who had to type out multiple commands that occasionally changed, leaving many unclear about the process. Development itself was cumbersome because we needed to coordinate several backend services, alongside PostgreSQL and Redis instances.
00:11:53.600
The core issue was inter-service versioning, especially as services like A and B would often change together, necessitating synchronized updates. We needed to track these versions and ensure they aligned in terms of requests and responses.
00:12:14.880
We also grappled with the challenge of keeping service updates in step. If one service changed improperly, or without coordinating the versioning between teams, we risked causing larger issues.
00:12:48.081
Our solution lay in building a release version that encapsulated the dependencies and interactions between services. Thus, we established mappings during deployment, ensuring robust communication on required versions.
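One way to picture such a release is as a mapping from each service to a pinned version, checked against what every service says it can talk to. The sketch below is an assumption about the shape of that data, not the format Moz used; the service names and version strings are made up.

```python
def find_mismatches(release, requirements):
    """Return (service, dependency, pinned_version) for every broken pairing.

    release:      {service: pinned version}
    requirements: {service: {dependency: set of versions it accepts}}
    """
    problems = []
    for service, deps in sorted(requirements.items()):
        for dependency, accepted in sorted(deps.items()):
            pinned = release.get(dependency)  # None if not in the release
            if pinned not in accepted:
                problems.append((service, dependency, pinned))
    return problems


# Hypothetical release: service A was built against an older version of B.
release = {"service-a": "a1f3c9e", "service-b": "77d0b42"}
requirements = {"service-a": {"service-b": {"5c2e118"}}}
mismatches = find_mismatches(release, requirements)
```

Here `mismatches` contains one entry for `service-a`, because the release pins `service-b` at a version that `service-a` does not accept. Running a check like this at deploy time is what turns a silent incompatibility into a failed release.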
00:13:31.600
This approach enabled us to link service versioning to actual Git SHA values. When we made commits in our version control, it provided a tangible reference tied to every deployment, facilitating better organization and accountability.
00:14:20.640
What's more, we could route requests to the specific versions via our custom HTTP proxy. This placed power in the engineers' hands, allowing us to deploy with clarity about which versions of the code were talking to each other.
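A rough sketch of version-aware routing, assuming the proxy knows which release a request belongs to. How the real proxy identified the release is not stated in the talk, so the `release_id` argument, the class, and its method names are hypothetical.

```python
class VersionRouter:
    """Resolves (release, service) to a backend running the pinned SHA."""

    def __init__(self):
        self.releases = {}  # release id -> {service: git sha}
        self.backends = {}  # (service, git sha) -> list of "host:port"

    def add_release(self, release_id, pins):
        self.releases[release_id] = dict(pins)

    def add_backend(self, service, sha, address):
        self.backends.setdefault((service, sha), []).append(address)

    def resolve(self, release_id, service):
        """Return (sha, address) for the version this release pins."""
        sha = self.releases[release_id][service]
        candidates = self.backends.get((service, sha), [])
        if not candidates:
            raise LookupError("no backend for %s@%s" % (service, sha))
        return sha, candidates[0]
```

Because two SHAs of the same service can be registered side by side, an old release and a new release can both be served during a rollout, each reaching only the versions it was tested against.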
00:14:50.520
We realized that our deployments needed to reflect certain standards – a service has to recognize its specific SHA. This added a layer of complexity during deployment since various tools managed deployments similarly.
00:15:16.520
Moreover, we had to ensure that we operated with multiple instances of a service. This configuration ran successfully for our lightweight Node.js services, but memory-heavy applications presented challenges.
00:15:39.760
As we navigated these challenges, the concept of inter-service versioning emerged as an area not decisively solved across the industry. Teams often lacked communication, leading to unintended disruptions.
00:16:02.320
We learned to build a solid process to maintain synchronization across upgrades, to keep services functioning harmoniously.
00:16:20.360
Thus, we implemented methods to receive insights from our routing layer about how requests flowed through our microservices, allowing for more efficient debugging and monitoring over time.
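As an illustration only: if every hop passes through the routing layer, the proxy can tag each request with an ID and record where it went. The recorder below is a hypothetical sketch of that bookkeeping, with made-up service names, and it assumes a simple linear chain of calls.

```python
import uuid
from collections import defaultdict


class HopRecorder:
    """Collects the hops a request makes, keyed by request ID."""

    def __init__(self):
        self.hops = defaultdict(list)  # request id -> [(source, target, sha)]

    def start(self):
        """Create a fresh request ID at the edge of the system."""
        return uuid.uuid4().hex

    def record(self, request_id, source, target, sha):
        """Called by the proxy each time it forwards a request."""
        self.hops[request_id].append((source, target, sha))

    def path(self, request_id):
        """Render the journey as 'a -> b -> c' for debugging."""
        hops = self.hops.get(request_id, [])
        if not hops:
            return ""
        return " -> ".join([hops[0][0]] + [target for _, target, _ in hops])


recorder = HopRecorder()
rid = recorder.start()
recorder.record(rid, "proxy", "listings", "aaa1111")
recorder.record(rid, "listings", "persistence", "bbb2222")
journey = recorder.path(rid)
```

When an exception surfaces, looking up its request ID gives the chain of services and SHAs involved instead of requiring someone to reconstruct it by hand.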
00:17:04.000
Ultimately, we focused on minimizing the cognitive load on engineers. We knew we needed to build tools that didn’t overcomplicate deployments, and instead offered simple solutions regardless of infrastructure changes.
00:17:52.240
Recognizing that the deployment experience would benefit from reducing complexity, we aimed to implement streamlined updates. Letting engineers push code without the added complications was our goal.
00:18:51.680
From our experiences, we clarified the need to minimize moving parts in our systems. The more elements involved in a deployment, the more opportunities there are for failure, so removing unnecessary pieces improves operational reliability.
00:20:28.880
We also recognized the importance of ensuring deployments remain modular, lending flexibility to the development cycle. Innovations frequently occur in the field, making it crucial for projects to adapt and evolve as needed.
00:20:59.840
Through this journey, I built an open-source project to encapsulate these advancements, designed to enhance deployment simplicity. It’s currently functioning with several of my ongoing side projects.
00:21:19.000
My ambition is to make this tool available on platforms like Heroku, allowing for hassle-free deployment anywhere code can run.
00:21:53.680
With advanced tools at our disposal today, engineers should easily deploy their services without excessive hurdles. It’s 2015; we have the frameworks, the APIs, and the innovation to ensure deployments run smoothly.
00:22:31.760
In conclusion, I emphasize that deployments must continue to evolve. As we encounter new challenges, it’s essential to adopt a long-term strategy for handling the complexities of service-oriented architecture.
00:23:11.120
We need to ensure our engineering teams are shielded from unnecessary complexity in deployment infrastructure. Minimizing the friction in both development and operations makes for an efficient workflow.
00:23:53.200
When we make deploying easier, we embrace the opportunities that come with streamlined, effective practices. The tools have improved so much, but we still have work ahead!
00:24:33.440
Thank you.
00:30:23.360
Enjoy!