Services and Rails: The Shit They Don't Tell You

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

Building services and integrating them into Rails is hard. We want smaller Rails apps and nicely encapsulated services, but services introduce complexity. If you go overboard in the beginning, you're doing extra work and getting some of it wrong. If you wait too long, you've got a mess.

At Yammer, we constantly clean up the mess that worked well in the early days, but has become troublesome to maintain and scale. We pull things out of the core Rails app, stand them up on their own, and make sure they work well and are fast. With 20+ services, we've learned some lessons along the way. Services that seem clean in the beginning can turn into development environment nightmares. Temporary double-dispatching solutions turn into developer confusion. Monitoring one app turns into monitoring a suite of apps and handling failure between them.

This talk looks at our mistakes and solutions, the tradeoffs, and how we're able to keep moving quickly. Having services and a smaller Rails codebase makes for scalable development teams, happier engineers, and predictable production environments. Getting there is full of hard decisions -- sometimes we're right, sometimes we fuck it up, but we usually have a story to tell.

wroc_love.rb 2013

00:00:18.480 Today, I'm going to be talking about how Yammer builds services and how we integrate them alongside Rails. I want to discuss some of the things that we don't talk about very often.
00:00:30.960 My name is Brian, and I work on the Rails team at Yammer. One of the things we do is take components out of our monolithic Rails app and put them into services. The Rails team assists the core services team with this integration. On a personal note, I love Zelda, video games, and music. I'm really excited to be here; this is my first time in Europe. I gave my first conference talk just a couple of days ago, so I am a bit scared, but this is awesome!
00:00:56.760 I have a pet peeve about talk titles that don't connect to the content, so I want to keep this talk relevant to its title. I also needed a template for my presentation. Since Yammer is owned by Microsoft, there are tons of templates available. One of them mentioned that full-bleed photos can evoke emotions and make presentations more memorable, so I decided to use that idea.
00:01:10.240 Before continuing, I want to highlight that this talk might not apply to everyone here yet. If you're building a startup to determine viability, you should ignore much of this advice and focus on getting it done. Building services adds a layer of complexity, and you probably don't have enough information yet to start this process.
00:01:20.360 Once you determine that you need to scale, however, you will have to make some decisions that might make you uncomfortable. I will discuss those hard decisions in this talk. At Yammer, we have a large Rails app with over 300 models and 200 controllers. It is backed by 20+ JVM services that we have built, and some of these services handle over a billion requests a day.
00:01:40.000 Even with our powerful services, the Rails app remains quite large, which makes things challenging as we move forward. We have managed to address some issues by extracting components into services and tackling the problem incrementally. Still, things get painful when dealing with tasks like sharding or upgrading Rails and Ruby. These transitions are all-or-nothing situations.
00:02:10.000 In this talk, I will touch on service-oriented architectures, the reasons we build services, and the benefits they provide. Our primary goal is to create components that can scale individually. By focusing on small services, we gain versatility and ease of use.
00:02:32.960 To illustrate this, at Yammer, we developed our Rails app alongside specific services, including indexing services and a search interface. Based on previous search experiences, we recognized the need to handle these components in three chunks to ensure individual scalability and reusability.
00:02:55.600 We created a denormalized data store that mirrors data from Rails, so various service layers can use it efficiently. This architecture makes it easier to add features: autocomplete for our search interface, for example, was built on top of existing components. Our data export service, named Slurpee, lets us pull entire chunks of data from this denormalized store and export them conveniently.
00:03:16.560 Having multiple independent components provides significant advantages, particularly in scalability. It’s much easier to scale a service since we can predict its usage and performance patterns. Moreover, we avoid wasting resources on scaling the entire system if those resources are not required.
00:03:45.679 Another pivotal benefit of our services is loose coupling. Smaller, focused services allow us to encapsulate concerns and push out updates independently without necessitating massive deployments of the entire Rails app. While we still have large deployments, the ability to move smaller pieces along independently proves beneficial.
00:04:06.280 Currently, we are transitioning our files backend. This has been a smooth process as we shift away from our Node.js service towards a JVM-based solution without creating chaos in our operations. Being able to swap technologies like this while keeping things stable is exactly the flexibility we want.
00:04:30.000 Organizational scalability is another goal of service-oriented architecture. When working on a monolithic application, developers need to understand the entire codebase, which often leads to confusion and mixed responsibilities. By breaking down the application into services, we allow teams to gain the knowledge they need on a need-to-know basis, fostering distributed execution.
00:04:55.360 This loose coupling enables our teams to collaborate effectively, allowing them to agree on API standards. We can utilize dummy endpoints while building services in parallel, allowing different teams to work simultaneously towards a cohesive product.
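A minimal sketch of the dummy-endpoint idea, assuming the teams have agreed on a small search contract. Every name here (`SearchClient`, `DummySearchClient`, `render_results`) is hypothetical and not taken from Yammer's code; the point is only that caller code depends on the agreed contract, so a canned implementation can stand in until the real service exists.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SearchResult:
    message_id: int
    snippet: str


class SearchClient(Protocol):
    """The agreed contract (hypothetical); stub and real client both satisfy it."""

    def search(self, query: str, limit: int) -> list[SearchResult]: ...


class DummySearchClient:
    """Returns canned results so client teams can build before the service exists."""

    def search(self, query: str, limit: int) -> list[SearchResult]:
        canned = [
            SearchResult(message_id=i, snippet=f"result {i} for {query!r}")
            for i in range(1, 4)
        ]
        return canned[:limit]


def render_results(client: SearchClient, query: str) -> list[str]:
    """Caller code depends only on the contract, not on which client is wired in."""
    return [f"#{r.message_id}: {r.snippet}" for r in client.search(query, limit=2)]
```

Once the real service is up, a client that makes network calls can replace `DummySearchClient` without changing `render_results`.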
00:05:15.360 However, moving to a service-oriented architecture does not happen overnight. Starting your business or application with a monolithic approach can offer an easy way to build and modify things quickly. With a single codebase, there is little overhead, and you can share code easily without facing the complexities that arise with services.
00:05:33.760 While these lessons are learned iteratively, forming a service-oriented architecture requires a mindset shift. At Yammer, we often discuss Conway's Law, which posits that organizations tend to develop systems that reflect their communication structure. This means that if we want to create services, we need to align our development teams accordingly.
00:05:56.080 Many organizations divide their departments vertically or horizontally, leading to silos that inhibit collaboration. At Yammer, we initially structured our teams with separation and specialization, but we found that this siloed knowledge limited our efficiency. As we grew, we realized it was essential to reorganize.
00:06:16.320 We shifted our approach to create cross-functional teams. For each new service or feature developed, we assemble a diverse team with representatives from every aspect of the project, such as Rails engineers, core services engineers, and mobile client engineers. This decentralized process fosters an environment where teams are autonomous and well-informed.
00:06:35.000 These teams have complete discretion in their design and development processes, producing well-structured and reusable systems. Importantly, these teams are ephemeral—they come together, solve a problem, and then disband, with a different lead taking charge on subsequent projects.
00:06:57.120 Cross-functional teams allow us to leverage shared knowledge across various domains, ensuring that we can work on countless projects simultaneously. They coordinate effectively between client and service APIs, leading to the emergence of naturally fitting services.
00:07:16.560 That said, these teams also face challenges. Without siloed experts, there are trade-offs, including the need for team members to continuously adapt to new domains. When a team forms, they need to learn and familiarize themselves with the project area, which can be time-intensive.
00:07:41.919 Additionally, while building features, we must be cautious not to couple the API implementation too tightly with client needs. It's easy to focus on immediate feature demands instead of looking at the broader picture. After project completion, the support burden is shared amongst other teams, which may not be as familiar with the domain.
00:08:00.480 There are various ways we set up our structures and processes. Although we have changed our organizational setup recently, solving these complex problems remains an ongoing effort. Often, the simplest solution involves placing all services behind Rails, preventing clients from interacting with them directly to mitigate complications.
00:08:21.360 This allows our services to maintain separate data stores without requiring clients to navigate multiple layers. However, there are trade-offs involved in maintaining Rails resources to access these services.
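As a rough illustration of the "everything behind Rails" arrangement, the sketch below uses a single front door that authenticates the request once and then delegates to an internal handler. It is written in Python for brevity and is not Yammer's implementation; `FrontDoor`, the token table, and the handler signature are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical handler shape: (user_id, params) -> response body
Handler = Callable[[int, dict], dict]


class FrontDoor:
    """Single public entry point; internal services are never exposed to clients."""

    def __init__(self, sessions: dict[str, int]) -> None:
        self._sessions = sessions  # token -> user_id (assumed auth scheme)
        self._routes: dict[str, Handler] = {}

    def mount(self, resource: str, handler: Handler) -> None:
        """Register the internal service call that backs a public resource."""
        self._routes[resource] = handler

    def handle(self, token: str, resource: str, params: dict) -> tuple[int, dict]:
        # Authentication happens once, here, so services can trust the user_id.
        user_id = self._sessions.get(token)
        if user_id is None:
            return 401, {"error": "unauthenticated"}
        handler = self._routes.get(resource)
        if handler is None:
            return 404, {"error": "unknown resource"}
        return 200, handler(user_id, params)
```

The cost mentioned in the talk shows up as the `mount` calls: every service needs a resource maintained in the front application.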
00:08:43.360 In some scenarios, we allow direct client interactions with services, which can simplify things. For example, our Mugshot service dynamically resizes images for us, streamlining caching and image management. Nevertheless, as we architect these systems, we must also factor in authentication and security concerns.
00:09:02.280 As our application grows, challenges emerge with managing data across services. Reading from the database can be manageable, but writing can introduce complexity. The difficulty arises because our caching mechanisms and data management structures need to align seamlessly between Rails and the services.
00:09:22.520 Active Record is our primary means of managing data, which allows us to build our app quickly. However, as we untangle our data structures, we encounter challenges in maintaining data integrity, connecting models, and ensuring consistent validations across our services.
00:09:42.400 To mitigate these issues, we often treat our services as indexes, storing IDs in our services and hydrating relationships through Rails. Another option is to shift ownership of the data to the service layer, which can improve management but also leads to data duplication as we go.
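The first option, treating a service as an index, might look roughly like the sketch below: the service returns only ordered IDs, and the application loads the full records from its own database. This is a Python illustration under assumed names (`FeedIndexService`, `MessageStore`, `hydrate_feed`), with in-memory stand-ins for both the service and the database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: int
    body: str


class FeedIndexService:
    """Stand-in for a service that stores only ordered message IDs per feed."""

    def __init__(self) -> None:
        self._feeds: dict[str, list[int]] = {}

    def add(self, feed: str, message_id: int) -> None:
        self._feeds.setdefault(feed, []).insert(0, message_id)  # newest first

    def ids_for(self, feed: str, limit: int) -> list[int]:
        return self._feeds.get(feed, [])[:limit]


class MessageStore:
    """Stand-in for the application database, which stays the source of truth."""

    def __init__(self, rows: list[Message]) -> None:
        self._by_id = {m.id: m for m in rows}

    def find_all(self, ids: list[int]) -> dict[int, Message]:
        return {i: self._by_id[i] for i in ids if i in self._by_id}


def hydrate_feed(
    index: FeedIndexService, store: MessageStore, feed: str, limit: int = 20
) -> list[Message]:
    ids = index.ids_for(feed, limit)
    found = store.find_all(ids)  # one batched lookup instead of one per ID
    # Keep the service's ordering; silently skip IDs the database no longer has.
    return [found[i] for i in ids if i in found]
```

Skipping missing IDs is one possible policy for a stale index; another would be to report them back to the service for cleanup.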
00:10:00.680 Once we've determined we want to move away from Active Record, it’s necessary to plan this transition carefully. Downtime is not an option for us, so we have to implement incremental changes without disrupting service availability.
00:10:21.960 One effective approach is double dispatching: we backfill existing data into the service and write every change to both the database and the service. During this phase, we can monitor performance and shift traffic over gradually, which lets us check that the service can handle the load before we transition fully.
00:10:42.000 However, the dual data storage leads to yet another challenge: who manages the cleanup? Developers can become confused seeing data replicated across systems, which creates uncertainty about which copy is the source of truth. Developers regularly have to resolve data conflicts and clean up unnecessary duplication.
00:11:02.280 Keeping a backup plan is a comfort zone, and that can be a problem. It lets developers fall back to the previous structure if a service fails, but it can also hinder progress. We need to be comfortable stepping out of that comfort zone and tackling new challenges.
00:11:19.360 A backup plan also has its costs: it requires additional resources and creates overhead that might not be worth it in the long run. Developers must focus on developing and delivering value, and make sure they don't revert to monolithic solutions as a knee-jerk response to complexity.
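A minimal sketch of double dispatching with a gradual read ramp, written in Python with plain dictionaries standing in for the database and the new service. The class name, the `read_percent` knob, and the CRC-based bucketing are all assumptions for illustration; the talk does not describe how Yammer actually routes traffic.

```python
from __future__ import annotations

import zlib


def bucket(key: str) -> int:
    """Map a key to a stable bucket in 0..99 so a given key always routes the same way."""
    return zlib.crc32(key.encode("utf-8")) % 100


class DoubleDispatchWriter:
    def __init__(self, database: dict, service: dict, read_percent: int = 0) -> None:
        self.database = database          # existing store
        self.service = service            # new store being rolled out
        self.read_percent = read_percent  # 0..100: share of keys read from the service

    def backfill(self) -> int:
        """Copy rows the service does not have yet; returns how many were copied."""
        copied = 0
        for key, value in self.database.items():
            if key not in self.service:
                self.service[key] = value
                copied += 1
        return copied

    def write(self, key: str, value: object) -> None:
        # Every write goes to both stores for the duration of the migration.
        self.database[key] = value
        self.service[key] = value

    def read(self, key: str) -> object:
        # Raising read_percent step by step moves read traffic to the service.
        source = self.service if bucket(key) < self.read_percent else self.database
        return source[key]
```

Setting `read_percent` back to 0 is the rollback path; removing the database write once the ramp reaches 100 is the cleanup step that is easy to forget.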
00:11:36.440 At Yammer, we discuss the importance of learning from our mistakes. We have rewritten our search stack three times, but each iteration has taught us invaluable lessons, leading to a robust and efficient stack today. Similar reflections apply to our code: we often look back at past decisions and reconsider if those were the best choices.
00:11:56.020 This re-evaluation process is essential; assumptions about prior decisions may no longer apply in the current development landscape. While we maintain a large Rails app, our focus is to continually move closer to a more scalable and manageable structure.
00:12:16.920 In closing, any questions?
00:12:32.919 So you mentioned that you work in teams; how long do those teams typically last?
00:12:38.360 Our teams generally consist of 2 to 10 people, and we operate on a 2-to-10-week cycle. Beyond ten weeks, projects tend to drag on unproductively.
00:12:50.320 What happens if the team encounters support issues after a project ends?
00:12:58.320 We have another cross-functional team dedicated to support engineering. Their role is to handle support issues, though it can be challenging as they might not be familiar with specific domain knowledge. However, they know where to go for answers.
00:13:12.440 If your application is well-factored and modularized, what is still the problem with it?
00:13:30.679 Even a well-structured application can present challenges, particularly in complexity when services are introduced. The nature of your application greatly influences this aspect.
00:13:46.799 How do you manage communication and documentation when various teams work on the project?
00:14:01.280 When a cross-functional team is formed, we prepare a text document to outline the changes. This documentation is continuously updated throughout the project. At its conclusion, we have a repository of insights detailing challenges and decisions made.
00:14:21.600 Do you centralize documentation for all services?
00:14:38.200 We rely on written documentation, much of which lives in Google Docs. While it isn't formalized, it serves us well for maintaining notes on projects.
00:14:52.100 When deciding on code extraction for service development, do you follow a specific protocol?
00:15:05.680 Historically, we have not implemented strict protocols, but that doesn’t preclude us from adopting improved procedures in the future.
00:15:26.160 To conclude, do you use something like OpenAPI for API documentation?
00:15:43.280 Our documentation is fairly loose, primarily consisting of notes taken throughout the project. While it might not be formal, it allows us to maintain agility and adapt as we proceed.
00:15:59.999 Thank you all for participating!