Carson: On the Path from Big-Ball-of-Mud to SOA

00:00:14.960 Aloha! My name is James, and I work at Zendesk. I am a Senior User Happiness Engineer.

00:00:20.369 You can find me on Twitter and GitHub as @JamesARosen; in fact, you can find me pretty much everywhere under @JamesARosen. The 'A' is important because James Rosen is a Fox News contributor, and I am not.

00:00:31.410 I'm here to talk to you about Carson. You can also find these slides on GitHub along with most of the content written out for review later.

00:00:41.730 About five years ago, Zendesk was born as a Rails 1.2 application. It was a pretty standard application. Over the years, we've upgraded it many times, some upgrades going more smoothly than others. We've also added a lot of non-Rails services. We started out with Sphinx for search but switched to Solr after a while. We added a gem called Aged Birdie for chat, then switched that to Node. We also added Resque for background jobs. Over the years, the codebase has grown significantly.

00:01:06.750 In this chart, the red line represents test coverage and the blue line represents lines of code. You can see that our engineers are pretty adept at keeping test coverage up, which is great. However, as a single application keeps growing, developer happiness starts high but tends to decline.

00:01:19.350 There are two primary reasons for this developer unhappiness: test time and interdependencies, that is, dependency management. The first thing we tried to address was test time. Corey Haines gave a fantastic talk about a year and a half ago at GoRuCo on achieving a fast test suite. He demonstrated what I like to call the holy grail of rapid testing: running an entire Rails app's test suite in just three-tenths of a second. It was incredible!

00:01:40.049 What he did was extract his code from its dependencies on slow elements, such as the database and Rails itself. This let him run his tests in a pure, isolated Ruby environment. The change not only sped up testing but also encouraged better coding practices over time: if you adopt a fast, test-driven methodology, you end up with well-structured code that is easy to test and, consequently, easy to maintain and easy for its clients to use.
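A minimal sketch of that kind of extraction, assuming a hypothetical OverdueTicketFinder (not Zendesk code): the logic lives in a plain Ruby object whose data source is injected, so a test can hand it a simple stand-in instead of an ActiveRecord model and never touch the database or boot Rails.

```ruby
# Hypothetical example: business logic extracted from a Rails model into
# a plain Ruby object. Nothing here loads Rails or touches a database.
class OverdueTicketFinder
  # ticket_source: anything that responds to #open_tickets
  # (in production, an ActiveRecord-backed object; in tests, a fake).
  def initialize(ticket_source)
    @ticket_source = ticket_source
  end

  # Returns the open tickets whose due date is before `now`.
  def overdue(now)
    @ticket_source.open_tickets.select { |ticket| ticket.due_at < now }
  end
end

# Test doubles built from Struct: no fixtures, no schema, no boot time.
FakeTicket = Struct.new(:id, :due_at)
FakeTicketSource = Struct.new(:open_tickets)

source = FakeTicketSource.new([
  FakeTicket.new(1, Time.at(100)),
  FakeTicket.new(2, Time.at(300))
])
finder = OverdueTicketFinder.new(source)
p finder.overdue(Time.at(200)).map(&:id) # prints [1]
```

Because the finder only needs something that responds to open_tickets, a test like this runs in milliseconds, which is the effect the fast-tests approach is after.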

00:02:10.090 We adopted this approach for a while, and while it worked well, it wasn't enough. We had too many lines of code to migrate all of our test suites to fast tests. We were still spending up to an hour running a single test suite, plus a minute or two just to boot the Rails process, so developer happiness was still suffering.

00:02:33.220 We learned about Conway's law the hard way: organizations that design systems are constrained to produce designs that mirror their own communication structures. What we discovered was that to get faster tests and independent components, we needed teams that reflected that structure. If you want to break up your Rails app, you should break up your teams.

00:02:54.380 Our first project to reflect this was called Sea Monster, a CMS built as a classic Rails 2 engine. It consisted mostly of HTML pages, with very little API activity, JavaScript, or custom styling. We simply bootstrapped it on our existing infrastructure, such as our admin interface. Building it as an engine allowed the team to operate fairly independently, and it has been running successfully in production ever since.

00:03:11.220 What we learned, however, was that Rails 2 is not very adept at asset management for engines. Rails 3 fixed this with the asset pipeline (introduced in Rails 3.1), which handles engines brilliantly: when you drop in stylesheets, JavaScript, or images, they're easy to reference from your Rails app. In Rails 2, you had to run a Rake task to manage these assets, which turned out to be quite a hassle. So when we began developing our next project, the Zendesk App Marketplace, we decided it was time to move to Rails 3.

00:03:41.730 This application was heavily focused on JavaScript; it was essentially an API with JavaScript clients and very few HTML pages to render. Rails was still a good fit, since what we wanted was its API-centric side. So we built a proof of concept of Zendesk Apps, and once that was done, I approached our head of operations, whom I'll call Zaphod, though that's not his real name.

00:04:06.470 I said, 'Hey Zaphod, we have this nifty new Rails 3 app ready to launch. How can we get it running in production? What can I do to help?' Zaphod looked at me as if I had asked the wrong question entirely. He said, 'We're not ready to provision, monitor, and deploy an entire new Rails 3 stack. Our current Rails 2 stack is working for us.' This is where I may have misjudged Zaphod's stress level.

00:04:32.630 I argued back, 'But Zaphod, Rails 3 is the future! Multiple apps and service-oriented architecture are the way forward; everything's going to be great once we implement it!' He was not convinced. That led to a series of meetings brainstorming ways forward, which ultimately resulted in the birth of Carson.

00:05:03.110 The key takeaway here is to communicate and build rapport with your operations team. So what exactly is Carson? To put it in simplistic terms, Carson represents baby steps towards a service-oriented architecture (SOA). The infrastructure team and I agreed we could have one Rails 2 app and one Rails 3 app, but we would keep it limited to that, avoiding a proliferation of Rails apps.

00:05:26.629 Carson, then, is a Rails 3 app that doesn't function as a traditional app. It consists mainly of configuration, a Gemfile, and some integration tests. To build a vertical feature within Zendesk, you create a Carson engine. For instance, the provisioning team develops a provisioning engine, and the email management team creates an email management engine. Vertical features get engines, while cross-cutting concerns like logging or database internationalization are packaged as classic gems.
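As a rough illustration, a Carson-style host's Gemfile might look something like the following. Every gem name and version constraint here is hypothetical (including zendesk_i18n and zendesk_logging), and the Rails version is an assumption; the point is only the shape: one engine gem per vertical feature, plain gems for cross-cutting concerns.

```ruby
# Gemfile for a Carson-style host app (a sketch; names and versions are
# hypothetical). The host holds almost no application code: it just
# assembles engines and shared gems into one Rails 3 process.
source "https://rubygems.org"

gem "rails", "~> 3.2.0"

# Vertical features: one engine gem per team, each semantically versioned.
gem "provisioning", "~> 1.4.0"
gem "email_management", "~> 2.1.0"

# Cross-cutting concerns: ordinary gems that every engine can depend on.
gem "zendesk_i18n"
gem "zendesk_logging"
```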

00:05:50.429 The only code that lives in Carson itself is initializers, which set up the environment. Did this approach work? In some respects, yes! We now have smaller test suites across multiple projects, they run very quickly, and the teams feel good about the setup.

00:06:10.270 One additional benefit is that each team receives CI build-failure notifications only for its own project, which significantly reduces noise across teams. However, the engines' tests don't run in the real environment; they run in a somewhat isolated, fake one, so some integration testing is still necessary.

00:06:30.000 Engines unintentionally interfering with each other made that need for integration tests very clear; I'll cover how to avoid that later. But a few integration tests go a long way. The next benefit we achieved is semantic versioning paired with decoupled deployment schedules.

00:06:54.150 Perhaps you've been in this situation: you've developed a feature, merged it into master, and are excited for it to go out with the next release on Thursday. Then, inevitably, someone else merges in a bug, and QA holds up the deployment. Now your code can't be released because it's tied to the same release as someone else's broken changes.

00:07:11.909 Carson solves this issue by building engines as separate gems, each of which is semantically versioned. You can merge your code into master, bump the version, and redeploy Carson with your updated version at will, as other gems are locked to their respective current versions. This approach helps eliminate coupling in deployment schedules; everyone can work at their own pace.

00:07:28.539 Semantic versioning also simplifies patch updates: all you need to do is run 'bundle update my_engine'. If you use the tilde operator with a full three-part version (for example, ~> 1.4.0), you'll receive only patch updates for that engine, which can be deployed at will because, by convention, patch releases contain no functional changes.
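The number of segments matters: the pessimistic ("tilde") operator restricts you to patch releases only when you give it three. This sketch uses RubyGems' own Gem::Requirement and Gem::Version classes, which Bundler relies on for version constraints, to show the difference; the version numbers are made up.

```ruby
require "rubygems"

# "~> 1.4.0" means >= 1.4.0 and < 1.5: patch releases only.
patch_only = Gem::Requirement.new("~> 1.4.0")
# "~> 1.4" means >= 1.4 and < 2.0: minor releases are allowed too.
minor_ok = Gem::Requirement.new("~> 1.4")

%w[1.4.0 1.4.7 1.5.0 2.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: ~> 1.4.0 #{patch_only.satisfied_by?(version)}, ~> 1.4 #{minor_ok.satisfied_by?(version)}"
end
# 1.4.0: ~> 1.4.0 true, ~> 1.4 true
# 1.4.7: ~> 1.4.0 true, ~> 1.4 true
# 1.5.0: ~> 1.4.0 false, ~> 1.4 true
# 2.0.0: ~> 1.4.0 false, ~> 1.4 false
```

So with gem "my_engine", "~> 1.4.0" in the Gemfile, 'bundle update my_engine' can pick up 1.4.7 but will never move to 1.5.0.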

00:07:48.599 However, managing this effectively can be tricky without an internal gem server. Building one is crucial if you're going to take these baby steps well. Without it, you'll find yourself juggling numerous Git URLs and commit hashes in your Gemfile, spending hours each day just managing these dependencies.
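A sketch of the difference in a Gemfile, with a placeholder server URL and hypothetical engine names. The scoped source block shown here requires a reasonably recent version of Bundler.

```ruby
# Without an internal gem server, each engine is pinned to a Git
# repository and commit, and every bump means hunting for a new SHA:
#
#   gem "provisioning", git: "git@github.example.com:team/provisioning.git",
#                       ref: "<commit sha>"
#
# With one, engines resolve by version like any published gem
# (the internal URL below is a placeholder).
source "https://rubygems.org"

source "https://gems.internal.example.com" do
  gem "provisioning", "~> 1.4.0"
  gem "email_management", "~> 2.1.0"
end
```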

00:08:03.630 The holy grail would be an environment where our continuous integration and deployment server runs all the patch updates and redeploys every fifteen minutes. That way, every engine would get its patch releases deployed automatically, while version bumps that add new functionality would still be handled manually.
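The decision such a CI job would make can be sketched in a few lines. patch_update? is a hypothetical helper, not part of Bundler or RubyGems: it treats an upgrade as safe to deploy automatically only when the major and minor segments are unchanged and the version actually went up.

```ruby
require "rubygems"

# Hypothetical CI helper: true when moving from old_version to new_version
# is a pure patch-level upgrade (same MAJOR.MINOR, higher version).
def patch_update?(old_version, new_version)
  old_v = Gem::Version.new(old_version)
  new_v = Gem::Version.new(new_version)
  return false unless new_v > old_v

  # segments returns e.g. [1, 4, 2]; compare only MAJOR and MINOR.
  old_v.segments.first(2) == new_v.segments.first(2)
end

[%w[1.4.2 1.4.3], %w[1.4.3 1.5.0], %w[1.4.3 2.0.0]].each do |from, to|
  decision = patch_update?(from, to) ? "auto-deploy" : "needs a human"
  puts "#{from} -> #{to}: #{decision}"
end
# 1.4.2 -> 1.4.3: auto-deploy
# 1.4.3 -> 1.5.0: needs a human
# 1.4.3 -> 2.0.0: needs a human
```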

00:08:31.620 Dependency management is fun! Ironically, this talk was almost titled 'Carson: Because Dependency Management is Fun.' When working with isolated teams that need to coordinate, especially when operating within a single runtime, you'll find more meetings surrounding dependencies.

00:08:49.330 In a classic monolithic app, you would have these same meetings; you just might not notice them. You would open your Gemfile, update the gem, and hope nothing breaks. Ideally, you'd run your tests, which would give you some assurance that nothing critical had broken.

00:09:06.810 The advantage here is that gem specifications tell you who relies on which versions and, to a limited extent, why. That makes it a bit harder to break other people's code simply by updating a dependency. For instance, if we update the json gem and the provisioning engine's gemspec doesn't allow the new version, Bundler can't resolve the dependencies, so the change can't be deployed.
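A small sketch of that check, using RubyGems' Gem::Requirement. The engine names and json constraints are hypothetical, and blocking_engines is an illustrative helper, not a Bundler API: before upgrading a shared gem, ask which engines' declared requirements a candidate version would violate.

```ruby
require "rubygems"

# Hypothetical json constraints, as each engine's gemspec might declare them
# (e.g. spec.add_dependency "json", "~> 1.5.0").
json_requirements = {
  "provisioning"     => Gem::Requirement.new("~> 1.5.0"),
  "email_management" => Gem::Requirement.new(">= 1.5", "< 2.0"),
  "zendesk_apps"     => Gem::Requirement.new(">= 1.7")
}

# Returns the engines whose requirement the candidate version violates.
def blocking_engines(requirements, candidate)
  version = Gem::Version.new(candidate)
  requirements.reject { |_engine, req| req.satisfied_by?(version) }.keys
end

p blocking_engines(json_requirements, "1.7.5") # prints ["provisioning"]
p blocking_engines(json_requirements, "1.5.3") # prints ["zendesk_apps"]
```

No single json version satisfies all three constraints here, which is exactly the case where bundling fails and the upgrade can't ship until the teams talk.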

00:09:23.690 You may need more meetings, and your Gemfile and gem specifications will need more maintenance, but setting up an internal gem server can alleviate some of that burden. Another benefit is streamlined deployments.

00:09:40.050 For example, you might deploy a classic monolithic Rails app using Capistrano. With Carson, every new project gets deployment for free; the team doesn't need to build its own deployment mechanism from scratch. It merely adds its engine to the Gemfile and gets deployed, just like everyone else.

00:10:03.110 Another advantage is that you can add as many features as the runtime can hold. The flip side is that you can't scale those features independently. For example, if one service receives a hundred requests a day while another processes a million, they're running in the same VM, so you have to scale the whole thing for the busiest one.

00:10:26.430 One downside is that you're limited to a single, central database. Essentially, you're dealing with a single application, and Rails assumes one database per application. There are some ActiveRecord add-ons for managing multiple databases, but they can be quite tricky. This centralization presents challenges when working toward a true service-oriented architecture.

00:10:47.199 A compelling essay from Thunderbolt Labs discusses the drawbacks of having a single central database, focusing mainly on issues concerning schema ownership. When multiple services interact with the same tables or data, it becomes difficult to ascertain who has authority over what changes at any given time.

00:11:13.350 Now I want to cover some patterns and anti-patterns related to this work. The first is namespacing. Give each engine a name and use it everywhere: the provisioning engine, for example, gets a database table prefix, a Ruby module namespace, an internationalization key prefix, and its own asset and API endpoint routes. This makes it clear who owns which part of the functionality.
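A sketch of deriving all of those names from one engine name. EngineNamespace is a hypothetical helper, and the asset and API URL layouts are assumptions; in a real Rails 3.1+ engine, isolate_namespace in the engine class covers part of this (table prefix, route and helper isolation).

```ruby
# Hypothetical helper: every externally visible name for an engine is
# derived from one engine name, so ownership is obvious wherever it appears.
class EngineNamespace
  attr_reader :name

  def initialize(name)
    @name = name.to_s
  end

  # Database table prefix, e.g. "provisioning_plans".
  def table_name(base)
    "#{name}_#{base}"
  end

  # Ruby module namespace, e.g. "Provisioning" or "EmailManagement".
  def module_name
    name.split("_").map(&:capitalize).join
  end

  # Internationalization key prefix, e.g. "provisioning.plans.title".
  def i18n_key(key)
    "#{name}.#{key}"
  end

  # Asset and API route prefixes (URL layout is an assumption).
  def asset_path(file)
    "/assets/#{name}/#{file}"
  end

  def api_path(resource)
    "/api/#{name}/#{resource}"
  end
end

provisioning = EngineNamespace.new(:provisioning)
puts provisioning.table_name("plans") # prints provisioning_plans
puts provisioning.module_name         # prints Provisioning
puts provisioning.api_path("plans")   # prints /api/provisioning/plans
```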

00:11:36.650 An anti-pattern to watch for is unexpected global state. We ran into problems early on because our Rails app had plenty of global state, even before we added any engines. It's easy to introduce global state in Ruby because classes are themselves global: any state stored at the class level, whether in class variables or behind class methods, is effectively shared by everyone.

00:11:58.450 Even before you examine your own code, you can see global state throughout Rails: the I18n locale, the backend's translation data, the database schema, the Rack middleware stack, MIME type mappings, and so forth. It's not difficult to stumble upon global state in your Rails app.

00:12:27.420 Here's an issue we faced in our codebase: the provisioning engine had an initializer that set up a new internationalization backend for itself, adding its translation files to the existing set. However, this clashed with our Zendesk Apps engine, and once the two were deployed together, one engine's translations would overwrite the other's.

00:12:46.100 The solution is to agree on a convention and encode that convention in a gem. Simply agreeing isn't enough, because someone is likely to forget; it's paramount to write code that expresses your agreement. Each engine that relies on the convention should depend on that gem, and any future change to the convention means updating all the gems involved.

00:13:05.150 The convention we established was that each engine could register its own components, but couldn't overwrite the global components. They could only request inclusion in the list. We've encapsulated this as Zendesk's I18n gem, and each engine initializes its own internationalization processes through it.
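A minimal sketch of that "register, don't overwrite" rule. SharedI18n is a hypothetical stand-in, not the actual Zendesk gem, and the locale paths are made up. In plain Rails, the closest append-only idiom is adding files to I18n.load_path from an engine initializer rather than assigning a new I18n.backend.

```ruby
# Hypothetical stand-in for a shared I18n convention gem: engines may
# register their translation files, but nothing here lets them replace
# the global backend or another engine's entries.
module SharedI18n
  @registrations = {}

  class << self
    # Append-only: each engine adds its own paths, under its own name, once.
    def register(engine_name, paths)
      key = engine_name.to_s
      raise ArgumentError, "#{key} is already registered" if @registrations.key?(key)

      @registrations[key] = paths.map(&:dup).map(&:freeze).freeze
    end

    # Returns a new frozen list, so callers cannot mutate shared state.
    def load_paths
      @registrations.values.flatten.freeze
    end
  end
end

SharedI18n.register(:provisioning, ["provisioning/config/locales/en.yml"])
SharedI18n.register(:zendesk_apps, ["zendesk_apps/config/locales/en.yml"])
p SharedI18n.load_paths
# prints ["provisioning/config/locales/en.yml", "zendesk_apps/config/locales/en.yml"]
```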

00:13:28.450 A bonus takeaway is to leverage ActiveSupport::Notifications. If you're using Rails 3, this framework is incredibly helpful. It lets parts of your application notify one another about events in your own domain model, in process rather than over HTTP.

00:13:44.570 Every time an account is updated, we can notify listeners interested in those changes, like caching or other services that care about accounts. Even better is the ability to announce very domain-specific events. Instead of just notifying that an account was modified, announce when an SSL certificate is updated.
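To keep the example runnable without Rails, the sketch below uses a minimal hand-written publish/subscribe class, DomainEvents (hypothetical, and not how ActiveSupport::Notifications is implemented), to show why a specific event name helps: the SSL listener never sees generic account changes. In Rails 3, the corresponding calls are ActiveSupport::Notifications.subscribe and ActiveSupport::Notifications.instrument; the event names here are made up.

```ruby
# Minimal in-process publish/subscribe, standing in for
# ActiveSupport::Notifications (this is not its implementation).
class DomainEvents
  def initialize
    @subscribers = Hash.new { |hash, name| hash[name] = [] }
  end

  # Register a handler for exactly one event name.
  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  # Deliver a frozen copy of the payload to every handler for this name.
  def publish(event_name, payload = {})
    frozen_payload = payload.dup.freeze
    @subscribers.fetch(event_name, []).each { |handler| handler.call(frozen_payload) }
  end
end

events = DomainEvents.new
renewed_accounts = []

# The SSL service subscribes only to the specific event it cares about...
events.subscribe("ssl_certificate.updated") do |payload|
  renewed_accounts << payload[:account_id]
end

# ...so a generic account change never reaches it and needs no filtering.
events.publish("account.updated", account_id: 42, changes: [:name])
events.publish("ssl_certificate.updated", account_id: 42)

p renewed_accounts # prints [42]
```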

00:14:00.350 An event that specific lets services that care about SSL certificates handle those changes without parsing an entire notification message and filtering out irrelevant information. If there is one paramount takeaway, it is to avoid shared mutable objects.

00:14:12.130 For those who attended the earlier event demonstration, one common theme was that shared mutable objects can lead to entirely preventable issues for your fellow developers. It's all too easy to step on one another's toes.
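One cheap defense in Ruby is to freeze anything that is shared between engines, so that an accidental mutation fails loudly instead of silently changing behavior for everyone else. A sketch with a hypothetical SUPPORTED_LOCALES constant (FrozenError exists in Ruby 2.5 and later); note that freeze is shallow, which is why the strings are frozen individually too.

```ruby
# Shared configuration handed to every engine: frozen, along with its
# elements (freeze is shallow), so no engine can mutate it for the others.
SUPPORTED_LOCALES = %w[en-US da de ja].map(&:freeze).freeze

begin
  SUPPORTED_LOCALES << "fr" # an engine "helpfully" adding its own locale
rescue FrozenError => e
  puts "refused: #{e.class}" # prints refused: FrozenError
end

# An engine that needs a different list builds its own copy instead.
my_locales = SUPPORTED_LOCALES + ["fr"]
p my_locales.size        # prints 5
p SUPPORTED_LOCALES.size # prints 4
```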

00:14:27.500 About a year ago, Reg Braithwaite published a thought-provoking article called 'Williams, Master of the Come From,' about a former coworker who used an incredibly clever approach to dependency injection and decoupling.

00:14:50.010 In a typical Rails app, you have relationships, such as a person having many comments and a comment belonging to a person. In Williams's Rails app, however, he would analyze which parts depended on which. The person module could exist on its own, but the commenting module could not exist without the person module.

00:15:08.400 The tempting next step is to let the commenting engine, since it already knows about people, reach into the person class and add whatever it needs. That's a slippery slope, though: once engines start adding monkey patches, validations, or callbacks to another team's models, it's easy to mount an accidental denial-of-service attack against that team's developers if those changes aren't handled carefully.

00:15:29.960 The core principle is: If it's not your data and not your code, then you don't get to change it. One effective solution we've identified is creating copies of model classes. These classes are based on the same database table but remain read-only.

00:15:51.700 In this setup, your commenting engine might refer to its own version of the person class, called Author. This copy uses the same database table, but it cannot write data back to the owner's table. That means you can read people and build relationships to them without risk of inadvertently altering another engine's data or code.
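In ActiveRecord, such a copy is typically a class that sets self.table_name to the owner's table and overrides readonly? to return true, which makes save raise ActiveRecord::ReadOnlyRecord. To stay runnable without Rails or a database, this sketch imitates that behavior in plain Ruby; the in-memory PEOPLE_TABLE, ReadOnlyRecordError, and Commenting::Author are all hypothetical.

```ruby
# In-memory stand-in for the "people" table, owned by the People engine.
PEOPLE_TABLE = { 1 => { name: "Ada", email: "ada@example.com" } }

class ReadOnlyRecordError < StandardError; end

module Commenting
  # The commenting engine's own read-only view of a person.
  class Author
    attr_reader :id, :name # readers only: no setters

    def self.find(id)
      row = PEOPLE_TABLE.fetch(id)
      new(id, row[:name].dup.freeze) # copy, so the shared row can't be mutated
    end

    def initialize(id, name)
      @id = id
      @name = name
    end

    def readonly?
      true
    end

    # Mirrors ActiveRecord: saving a read-only record is an error.
    def save
      raise ReadOnlyRecordError, "people belong to the People engine" if readonly?
    end
  end
end

author = Commenting::Author.find(1)
puts author.name # prints Ada
```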

00:16:10.060 Let's take this one step further and look at dependencies between engines. Within Carson, engines can rely on running against the same database. That works fine as long as they depend only on the stable primary keys of the parent tables.

00:16:28.050 If your engine starts to depend on additional columns from another engine's table, things become unstable: the owning engine could change its schema at any time.

00:16:43.080 One way to improve on this is for the owning engine to declare a public API for all other engines. This API might provide a 'lookup' method that returns instances of the required entity, so other engines depend on the API rather than querying the underlying data directly.

00:17:03.790 You can then change the underlying implementation of that lookup method without any repercussions for other engines. The implementation can evolve without forcing other developers to alter their code, as long as it stays within the API's contract.
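A plain-Ruby sketch of such a contract. The People module, PersonSummary, and lookup are hypothetical names: the owning engine keeps its storage behind private_constant and hands out small frozen value objects, so other engines depend only on the lookup method and the fields it returns.

```ruby
module People
  # Internal storage, private to the engine: other engines can't name it.
  ROWS = { 1 => { first_name: "Ada", last_name: "Lovelace", plan: "enterprise" } }.freeze
  private_constant :ROWS

  # The public contract: a small value object, not a raw row.
  PersonSummary = Struct.new(:id, :display_name)

  def self.lookup(id)
    row = ROWS.fetch(id)
    PersonSummary.new(id, "#{row[:first_name]} #{row[:last_name]}").freeze
  end
end

person = People.lookup(1)
puts person.display_name # prints Ada Lovelace

begin
  People::ROWS
rescue NameError => e
  puts "internals hidden: #{e.class}" # prints internals hidden: NameError
end
```

If the owning team later moves names into another table or adds caching, only lookup changes; callers that rely on PersonSummary#display_name don't notice.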

00:17:19.870 The end goal is to keep your engine from reaching directly into the internals of other engines. That gives you tighter control over code ownership and data integrity.

00:17:31.760 Semantic versioning helps here too, and it makes collaboration across teams much easier. If you change your API incompatibly, bump the major version and notify your clients, since they will have to adjust their code. That keeps your dependencies under much tighter control.

00:17:55.820 As for future direction: Carson may sometimes feel like just another big ball of mud, but it has distinct benefits as a stepping stone to SOA.

00:18:02.750 It won't address every concern, but Carson offers a systematic approach to decoupling your application and to the problems that come with tightly coupled architectures.

00:18:19.290 If your current architecture resembles a big ball of mud, use this talk as a conceptual tool to ponder your own journey. Although I’m not necessarily urging everyone to dive into this method, I do encourage reflection on moving past tightly coupled structures.

00:18:41.470 There are trade-offs around dependency management and coordination to consider, not just in Carson-like environments but also in fully realized SOA environments. Identifying growth paths along these lines can prove invaluable: prioritizing tools for reliable deployment and well-structured testing builds the right foundation for future productivity.

00:19:12.230 It's easier for a team to migrate to Carson than to extract complete, independent services. However, it's crucial not to stagnate within Carson if it starts preventing progress toward true SOA.

00:19:25.920 Be careful not to get stuck there. On the plus side, even as a temporary arrangement, Carson has given us faster development and easier deployments.

00:19:37.590 At Zendesk, we are currently hiring, and we've put together plenty of additional reading material for anyone interested. I would now like to open the floor to questions about Carson.

00:19:55.080 On database migrations: Rails 3 (from 3.1 on) handles this smoothly. Each engine can ship its own migrations, and in Carson you would run 'rake your_engine_name:install:migrations', which copies those migrations into the host application so they can be run.

00:20:14.930 At Zendesk, we've run into challenges here because we manage both a Rails 2 and a Rails 3 application, which makes it tricky to assign responsibility for migrations. For now, we've settled on keeping them all in the Rails 2 app. If your main application is already on Rails 3, migrations are a much smoother process.

00:20:43.700 For those still operating on Rails 2, manual copying of migrations is often necessary. I believe someone has created a gem that backports this functionality from Rails 3, but I would have to look that up if you’re interested.

00:21:01.190 So, regarding how to determine which components to decouple into separate services, I recommend allowing your tests to guide you. Identify areas that are problematic to test; this will offer insights into what clients might expect from those services.

00:21:22.880 A good API that is tested effectively carries those expectations over to its other clients. Focus on offloading complex tasks or ones that can be deferred, such as background processes or actions that don't need immediate interaction with the client.

00:21:42.220 At Zendesk, we recognize three significant user categories: end users, agents, and admins, each with different usage patterns and performance requirements. That understanding helps us spot opportunities to split out services and scope those projects effectively.

00:22:03.610 Currently, we have a handful of engines in the Rails 3 app, covering three or four features, and somewhere around 30 to 50 in the Rails 2 app, depending on how you break down features. Our ongoing strategy is to upgrade the classic app to Rails 3 and merge their functionality.

00:22:25.190 Engines are flexible enough that any of them can become a standalone Rails service when necessary. In Rails 3, an engine is very nearly an app, and turning one into a full Rails app mostly involves adding a config folder, which is fairly seamless.

00:22:48.280 And if engines that have been split apart still need ongoing communication, a service layer for notifications can bridge the gap between separate VMs.

00:23:04.790 If there are no more questions, I believe I've covered the main points. Thank you for your attention!