I've Made a Huge Mistake: We Did Services All Wrong

GORUCO 2018: I've Made a Huge Mistake: We Did Services All Wrong by Kelly Sutton

00:00:15.080 Great! Hey, thanks, Luke, and also thank you to all of the organizers of GoRuCo. This is actually my first time speaking at GoRuCo, and it's my first time giving a programming-related talk, so thanks for having me! Maybe this is why the conference is ending—people like me showing up. They usually say it's hard to follow a great talk like Andy's. It's even harder to follow a talk when you're desperately dabbing the tears out of your eyes.

00:00:31.349 My talk is titled "I've Made a Huge Mistake: How We Got Services Wrong and What We Learned Along the Way." This topic has been presented many times, in various flavors, so this will be our take on it. I will tell the story of how a growing startup tried to break apart its Rails monolith and share some of our learnings along the way.

00:01:35.000 The talk will be broken down into four sections: a little bit about me, some high-level discussion around how to break down the Rails monolith, when the right time to do that is, and I will end with five concrete tips. Then, we'll wrap things up.

00:01:54.000 My name is Kelly, and I live in San Francisco, where I've been for three years. Prior to that, I was living here in New York. I'm glad to discuss New York versus San Francisco at the after party! I live with my fiancée and our dog. We're getting married next summer, and we're having a Twin Peaks-themed wedding.

00:02:03.799 I've been writing Rails since about 2005, and I work at a company called Gusto, which provides payroll, benefits, and HR software for small businesses in the U.S. We'll get into that a little more, but right now, you're probably wondering... let's talk about Greta, our dog! I could spend ten minutes talking about her, but I'll just take a moment to share.

00:02:31.130 Greta is a very fashionable dog. She likes to keep up with the latest trends. She recently decided to redo her hairstyle with an ombre coloring! She's also a very worldly dog; she enjoys traveling and having new experiences. She recently brought back a kimono from Japan. However, that's not why we're here—we are here to talk about Rails projects that push the limits of the framework.

00:03:00.139 I think the notion that 'Rails doesn't scale' is one of the great lies circulating among commenters on Hacker News. There are times when you may need to reach for different tools or approaches, and that's what this talk is about. I want to start with a caveat: you shouldn't take everything I present here and apply it immediately to your day job. It’s important to think about it and discuss it with your team.

00:03:22.370 So, let's talk about Gusto. Currently, Gusto handles payroll for 1% of small businesses in the United States, moving over 1 billion dollars per month. Our Rails codebase is one of the largest I've worked with, consisting of about a million and a half lines of code between the front-end and back-end, including our tests. We have over 80 engineers, and we mostly work out of the same monolith.

00:03:40.880 We do all of this with the Fisher-Price framework that is Rails. One and a half million lines of code is not a small project, especially when you have a language as expressive as Ruby. The domains of payroll, benefits, and HR come with a lot of essential complexity: complexity that is part of the problem itself and not our fault.

00:04:01.459 Payroll is a careful balance of time, geography, money, and people. Tax jurisdictions can be as small as a city block or as large as an entire country. The IRS pays close attention, and even being off by a single cent can lead to serious issues. Obviously, everyone wants to be paid on time, and nobody should be paid twice, so idempotent jobs are a must. Our software must serve real people, and making mistakes is human. I'm probably making a big mistake right now! This talk is about mistakes.
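
The idempotency requirement mentioned here can be sketched in plain Ruby. This is an illustration only, not code from the talk: `PaymentLedger`, `PayrollPaymentJob`, and the key format are hypothetical names, and an in-memory set stands in for what would more likely be a database table with a unique index.

```ruby
require "set"

# Hypothetical in-memory ledger; a real system would more likely use a
# database table with a unique index on the idempotency key.
class PaymentLedger
  attr_reader :payments

  def initialize
    @paid_keys = Set.new
    @payments = []
  end

  # Records the payment only if this key has not been seen before.
  # Returns true when a payment was recorded, false when it was skipped.
  def pay_once(key, employee_id:, amount_cents:)
    return false unless @paid_keys.add?(key)

    @payments << { employee_id: employee_id, amount_cents: amount_cents }
    true
  end
end

# Hypothetical job: safe to retry because the key is derived from the
# pay period and the employee, not from the attempt.
class PayrollPaymentJob
  def initialize(ledger)
    @ledger = ledger
  end

  def perform(employee_id:, pay_period:, amount_cents:)
    key = "payroll:#{pay_period}:#{employee_id}"
    @ledger.pay_once(key, employee_id: employee_id, amount_cents: amount_cents)
  end
end

ledger = PaymentLedger.new
job = PayrollPaymentJob.new(ledger)

first_attempt = job.perform(employee_id: 42, pay_period: "2018-06", amount_cents: 250_000)
retried_attempt = job.perform(employee_id: 42, pay_period: "2018-06", amount_cents: 250_000)
```

Running the job a second time with the same arguments is a no-op, so a retry after a crash or timeout cannot pay the same person twice. Amounts are kept as integer cents to avoid rounding drift.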

00:04:35.330 When given a crucial trade-off in application development, we always prioritize correctness over performance. It's vital for everything to work correctly—the IRS needs it, and so does the country. This is one of the trade-offs we make, which is an important consideration when you look into domain-driven design.

00:04:54.950 Let’s discuss how things can go wrong and when projects can become too large. If you're at work and find that 'our Rails monolith is huge,' congratulations! That’s a great problem to have. You have a company that is valuable enough to keep around.

00:05:02.880 Many things that work when you're just five engineers in someone’s apartment start to break down when you scale to 50, 80, or even hundreds of engineers. I want to discuss the concept of the 'swamp,' which refers to a monolith that becomes so large that it is very slow to develop. Startup times could take tens of seconds, and your test suites might take tens of minutes or even hours.

00:05:31.540 You may find yourself playing whack-a-mole with different parts of the application. Ship one feature, and another breaks. Fix that feature, and then something else fails. You start to outgrow some of the paradigms that Rails offers out of the box, like the app/models folder or the app/services folder.

00:05:56.669 Over time, it becomes difficult to make any changes at all. You might notice that your team is growing, but it feels like you’re moving slower. It can feel like having your head stuck in a Kleenex box.

00:06:12.130 Let’s talk about what Gusto's swamp looks like. We're still in the process of untangling our Rails monolith. Many people will get up here and say, 'We switched to a service-oriented architecture; everything was great, and we're done!' But as we've learned, this is a process—a dance.

00:06:37.700 Our swamp has four rough domains: payroll, benefits, HR, and infrastructure. What they specifically do isn't too important, but this is the swamp we operate in. There are 666 models (don't read too much into that number!). While we were working, someone might return from a conference and say, 'Let's extract a service from this massive swamp we've built.' You might say, 'Cool, let's see how that might work.'

00:07:44.500 In our case, our team decided to extract the HR domain into its own service, its own application. We chose to do this because HR is conceptually different from everything else but still related to our business operations. For example, in the context of Gusto, you might keep someone’s name, Social Security number, and pay in an HR service, while the payroll service is responsible for processing their payment. So we thought, let's create a new Rails app—HR V2 with Rails 5, getting really crazy here!

00:08:18.780 As we went about doing this, we discovered there was a lot to tackle in HR that we hadn’t realized. It felt like the goalposts kept moving away from us, and we were not isolated; we had to collaborate with our team: product managers and designers, while the business asked, 'When is that thing going to be done?' Over time, the team started to get burnt out.

00:09:02.790 Despite these challenges, we moved forward, thinking, 'We’ve done this well; HR V2 has 90% of the features transferred, so let’s call the project done.' However, that decision presented a different problem: rather than having one access point for HR information, we now had two sources to retrieve that information from.

00:09:28.310 There's HR V2, which was moving quickly but beginning to slow down due to the volume of information it had to manage. At the same time, we had the appendage of old legacy HR, which was still very important to our operations. This is where tribal knowledge comes into play: when someone asks where a name or Social Security number is stored, you now have to ask someone rather than being able to find that structure in the code easily.

00:09:51.550 There has to be a better way. We feel we are on the right track, but we by no means have all the answers. I want to distinguish between applications and services, a topic that Scott covered well. Our terminology may differ slightly, but the spirit remains the same.

00:10:18.880 You might wonder if applications and services are essentially the same thing. Here are the distinctions we make: an application is something with its own process, its own app, like creating a new Rails project. A service, by contrast, may just be a module of code, potentially running in-process or out of process. You might send parameters to it via method calls or use something modern like gRPC.

00:10:59.670 Typically, applications have their own database, whereas services often share a database with your monolith. Applications can scale independently, while services will scale with the host app. Applications might be built in another language, while services will generally share the same language.
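
To make the 'service' half of this distinction concrete, here is a minimal sketch of a service as a plain Ruby module running in-process inside the host app. The module names, `employee_summary`, and the in-memory `RECORDS` hash are all assumptions made for illustration; none of them come from the talk.

```ruby
# Hypothetical HR service living inside the monolith as a module.
module HR
  # Value returned across the boundary; callers never see HR's storage.
  EmployeeSummary = Struct.new(:id, :full_name, keyword_init: true)

  # Stand-in for the HR tables (an assumption for this sketch).
  RECORDS = {
    1 => { first_name: "Ada", last_name: "Lovelace" }
  }.freeze
  private_constant :RECORDS

  # The only entry point other domains are meant to call.
  def self.employee_summary(id)
    record = RECORDS.fetch(id)
    EmployeeSummary.new(
      id: id,
      full_name: "#{record[:first_name]} #{record[:last_name]}"
    )
  end
end

# Payroll depends on the HR interface, not on HR's data layout.
module Payroll
  def self.pay_stub_header(employee_id, hr: HR)
    summary = hr.employee_summary(employee_id)
    "Pay stub for #{summary.full_name}"
  end
end

header = Payroll.pay_stub_header(1)
```

Because `Payroll` receives the HR interface through the `hr:` parameter, an out-of-process client that responds to `employee_summary` could be passed in later without touching the payroll code.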

00:11:10.550 Our rule is to create a service first before building an application. This means that 'rails new' is one of the last commands we run in the process. Let’s return to our swamp.

00:11:42.550 The first step is to sit down with the teams responsible for these domains, such as payroll and HR. We need to clarify conceptually what payroll and HR represent, what they should be doing, and compare our expectations against the actual application structure. We want everyone onboard with the vision that there is a benefit to breaking these components apart.

00:12:17.259 We found that it's much easier to maintain applications when they are split. Even basic tasks, like routing pages, become significantly easier. Once we identify the contours, or bounded contexts, we explicitly draw edges between these domains. Moving away from the database as the default method of communication, as in a traditional Rails application, forces us to define the interactions clearly.

00:12:54.000 Every time payroll needs to communicate with HR, or vice versa, we want it to be an explicit operation, ensuring that we're not simply reading from the same database table. When traversing these edges, we use value objects instead of passing around an ActiveRecord instance. As we work, these interfaces solidify, clarifying the interaction between the two domains.

00:13:15.860 As time passes, decisions about whether these services should run on their own servers or as part of a fleet of containers become simpler. Moving a service out of process might seem like a small change, but it introduces new failure modes. Still, if changing the transport layer is an easy operation, we feel confident that we've defined the boundaries of our domains correctly.

00:13:51.199 At this year's RailsConf, DHH talked about 'conceptual compression,' a hallmark of Rails' design. Rails allows you to focus on building your application without getting bogged down in detail. Even now, it remains one of the most expressive ways to create a web application. While Ruby optimizes for developer happiness, the framework also aims to minimize developer time and provides a lot of functionality out of the box, so you don't have to write complex SQL statements or handle data validation yourself.

00:14:42.670 Nonetheless, as your application grows and you dive deeper into your domains, it's imperative to identify which Rails components need to be broken apart. This leads to the question of how to categorize the four or five responsibilities handled by ActiveRecord. How do you begin to dismantle those concepts and select which of Rails' capabilities you want to retain?

00:15:00.680 The next five concrete examples illustrate strategies we've found effective. Bear in mind that these methods may not work for everyone, so it's essential to discuss with your team if they make sense. The first recommendation is to avoid circular dependencies. Even though Ruby lacks explicit import and export statements, we still need to treat every line of code as part of a larger dependency graph.

00:15:44.750 As we develop our Rails applications, we actively critique bi-directional relationships. For instance, when modeling an employee, we question whether we need a reverse link back to a company. Often, it’s sufficient to know that employees exist within a given company, which helps to eliminate circular dependencies—these are the elements that contribute to a 'ball of mud' that complicates code base management.
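
One way to picture the dependency graph described here is to write it down and ask Ruby's standard-library `TSort` module for cycles. This is a sketch under stated assumptions: the `DependencyGraph` class and the hand-written edge lists are hypothetical, and a real codebase would need some tool to extract the edges.

```ruby
require "tsort"

# Hypothetical dependency graph: each key depends on the names in its array.
class DependencyGraph
  include TSort

  def initialize(edges)
    @edges = edges
  end

  # TSort hook: enumerate every node in the graph.
  def tsort_each_node(&block)
    @edges.each_key(&block)
  end

  # TSort hook: enumerate the nodes that `node` depends on.
  def tsort_each_child(node, &block)
    @edges.fetch(node, []).each(&block)
  end

  # Groups of two or more nodes that depend on each other.
  def cycles
    strongly_connected_components.select { |component| component.size > 1 }
  end
end

# Employee links back to Company: a bi-directional relationship.
bidirectional = DependencyGraph.new(
  "Company"  => ["Employee"],
  "Employee" => ["Company"]
)

# Company knows its employees; Employee has no link back.
one_way = DependencyGraph.new(
  "Company"  => ["Employee"],
  "Employee" => []
)

cyclic_groups = bidirectional.cycles
load_order = one_way.tsort
```

With the reverse link removed, the graph has no cycles and can be sorted into an order where every class comes after the things it depends on.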

00:16:08.830 Second, when communicating between services, we recommend passing value objects across the boundary. Let's examine a simple service class that handles post-sign-up processes for companies.

00:16:35.960 This service responds to a new company signing up by sending a welcome email and tracking some statistics. The code looks quite normal for a small application where speed takes precedence over maintainability. However, examining it more closely reveals that we’ve tightly coupled the functionality of the company mailer and the stats tracker to the structure of the Company ActiveRecord model. This coupling introduces a challenge: any change to the structure of the Company model would require adjustments in several places, resulting in what is often referred to as 'shotgun surgery.'

00:17:03.950 To mitigate this, a useful exercise is to question when it's appropriate to abandon ActiveRecord in favor of pure values or value objects. Sometimes this could simply involve using something as basic as an integer or string, while at other times, it necessitates creating Plain Old Ruby Objects (POROs) to encapsulate the data we need to pass around. This way, any changes to the structure of our Company ActiveRecord only need to be made in a single class.
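
The slides for this part are not in the transcript, so the following is only a sketch of the idea, not the code shown in the talk. A `Struct` stands in for the Company ActiveRecord model so the example runs without Rails, and `NewCompanySignup`, `CompanyMailer`, `StatsTracker`, and `PostSignupService` are hypothetical names.

```ruby
# Stand-in for the Company ActiveRecord model (an assumption; a Struct is
# used so the sketch runs without Rails).
Company = Struct.new(:id, :name, :admin_email, keyword_init: true)

# Hypothetical value object: the only class that knows Company's structure.
class NewCompanySignup
  attr_reader :company_id, :company_name, :recipient_email

  def self.from_company(company)
    new(
      company_id: company.id,
      company_name: company.name,
      recipient_email: company.admin_email
    )
  end

  def initialize(company_id:, company_name:, recipient_email:)
    @company_id = company_id
    @company_name = company_name
    @recipient_email = recipient_email
    freeze
  end
end

# Hypothetical collaborators that accept plain values, not the model.
class CompanyMailer
  def self.welcome(recipient_email, company_name)
    "To: #{recipient_email} | Welcome to the product, #{company_name}!"
  end
end

class StatsTracker
  def self.events
    @events ||= []
  end

  def self.track_signup(company_id)
    events << [:company_signed_up, company_id]
  end
end

class PostSignupService
  def self.call(company)
    signup = NewCompanySignup.from_company(company)
    StatsTracker.track_signup(signup.company_id)
    CompanyMailer.welcome(signup.recipient_email, signup.company_name)
  end
end

email = PostSignupService.call(
  Company.new(id: 7, name: "Acme Bakery", admin_email: "owner@acme.example")
)
```

If `admin_email` were renamed on the model, only `NewCompanySignup.from_company` would need to change; the mailer and the tracker never see the model at all.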

00:17:42.090 The third recommendation is to avoid callbacks where possible. Callbacks are incredibly powerful in Rails, and it can feel nearly impossible to write a model that does anything interesting without relying on them. However, their expressiveness comes at a cost.

00:18:02.500 Consider a scenario where we want to send a welcome email right after a company signs up. Doing this in a callback introduces an additional dependency: the model logic is now coupled to the mailer, which can create a cyclical dependency.

00:18:36.830 Instead, we advocate for the creation of service objects. Since we handle one and a half million lines of code, we need a clear structure. Using composable service objects allows us to decouple the responsibilities from the model object. With this approach, rather than binding the company model directly to the mailer, we create a separate service object that manages the creation of the company and takes care of sending the email afterward.

00:18:56.700 This restructuring introduces a new node in our dependency graph rather than creating a cycle, so the trade-off we make is that adding nodes to clarify dependencies is better than allowing circular references. These cycles complicate future changes.
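
A minimal sketch of that extra node, assuming hypothetical names throughout (`CompanyRecord`, `WelcomeMailer`, `CreateCompany`). A plain class stands in for the ActiveRecord model so the example runs without Rails; the point is that the model has no callback and no knowledge of the mailer.

```ruby
# Stand-in for an ActiveRecord model with no after_create callback
# (an assumption so the sketch runs without Rails).
class CompanyRecord
  attr_reader :name, :email

  def self.saved
    @saved ||= []
  end

  def self.create!(name:, email:)
    record = new(name, email)
    saved << record
    record
  end

  def initialize(name, email)
    @name = name
    @email = email
  end
end

# Hypothetical mailer that only receives plain values.
class WelcomeMailer
  def self.deliveries
    @deliveries ||= []
  end

  def self.deliver(email, company_name)
    deliveries << "Welcome, #{company_name} <#{email}>"
  end
end

# The service object is the new node in the graph: it depends on the model
# and the mailer, and neither of them depends on the other.
class CreateCompany
  def initialize(mailer: WelcomeMailer)
    @mailer = mailer
  end

  def call(name:, email:)
    company = CompanyRecord.create!(name: name, email: email)
    @mailer.deliver(company.email, company.name)
    company
  end
end

company = CreateCompany.new.call(name: "Acme Bakery", email: "owner@acme.example")
```

Code that calls `CompanyRecord.create!` directly, such as a test or a data backfill, no longer sends an email as a hidden side effect.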

00:19:40.200 The fourth recommendation is to think services first and then applications. It's essential to delineate boundaries around sections of your app and harden them before you execute that 'rails new' command. When you do run that command, ensure everything from the previous service is thoroughly cleared out.

00:20:00.250 Finally, the fifth recommendation is to move slowly. Transforming and extracting components from the monolith takes time and effort. For one application piece we worked on, it took six months and over 500 pull requests.

00:20:23.490 We set a vision for where we wanted to end up without having a definitive roadmap for how to get there or how long it would take. However, we gained the buy-in of our team and the company, which facilitated a smooth transition without disruption. Throughout this transformation, we continued to conduct every payroll transaction without issue.

00:20:47.860 In summary, always proceed incrementally. Regardless of how bad the existing code may be, running 'rails new' for a microservice is like undertaking an extensive rewrite without realizing it. You have to work with your current setup rather than ambitiously building something new from scratch.

00:21:04.900 In the words of Kent Beck, first make the change easy (this may be hard), then make the easy change. Breaking apart a Rails monolith can be challenging, and you may need to unlearn practices you’ve followed for years.

00:21:13.250 You cannot accomplish this without a solid team. It’s not something that you can simply break down into a series of stories or points; it will naturally take time. Trust your team, and good luck! Thank you.