00:00:20.480
My talk, as you said, is called "Services and Rails: The Shit They Don't Tell You." Today, we're going to discuss how Yammer builds services alongside Rails and explore some things that are often overlooked.
00:00:25.680
This is me—my name is Brian, and I work on the Rails team at Yammer. One of the things that the Rails team does is help extract chunks of functionality from our Rails codebase and integrate those into services.
00:00:32.640
I also love Zelda, music, Ruby, and, of course, Yammer. Now, maybe I can talk closer to the microphone; does that help?
00:00:57.280
Check... good? Awesome! So, first, I have a little pet peeve about the titles of talks, and I want to address that. I don't like when talks have a title that doesn't relate to the content, so I'm keen to ensure my talk includes the 'they don't tell you' gimmick.
00:01:09.760
As mentioned earlier, this is my first conference talk. Just a week ago, I was nervous but excited about presenting. While preparing, I thought back to the last time I had spoken in front of a class and recalled that one important aspect back then was choosing the right template for my talk.
00:01:32.720
Now, Yammer is part of Microsoft, which provides us with many templates. I found a slide in one of them that said, 'Full bleed photos can set a mood or evoke emotion, making for a more memorable presentation.' I thought this was valuable, so I included it in my presentation. This slide allowed me to connect my pet peeve about talk titles to our discussion.
00:01:57.360
It illustrates that every time there's a point we don't often discuss. This character will frequently be on screen to help highlight those moments.
00:02:30.800
Now, regarding this topic: This might not apply to you yet. However, if you are building a startup to determine viability, you might want to push some of this aside and focus on getting things done. Introducing a lot of complexity can be counterproductive at this stage.
00:02:41.280
Nonetheless, it doesn't mean you shouldn't write clean, well-designed code. It's just that, at this point, focusing on services might be a distraction. If you do have clean code, extracting it into services will be much easier.
00:02:52.959
As you begin to build services and scale, you must do some things that may feel uncomfortable.
00:03:04.720
At Yammer, we have a massive Rails application with over 300 models, some of which are substantial. Additionally, we have more than 200 controllers and significant content in our lib directory. Our system is backed up by over 20 JVM services we have built, with some of these services handling over a billion requests daily. Thus, we maintain a pretty substantial ecosystem.
00:03:21.599
However, we still deal with this massive Rails app, and the challenges intensify as we proceed. For a long time, this structure has worked for us, and we've managed to chip away at the problem by progressively building services, although it often feels painful.
00:03:39.520
However, tasks like sharding or updating Rails or Ruby turn into all-or-nothing projects, which can be quite difficult.
00:03:52.720
In this talk, we will discuss Service-Oriented Architectures: why we build services and what benefits they provide us.
00:04:03.439
One of our fundamental goals is to develop components that can scale independently. When you have small, focused services, they are more versatile and enable easier reusability.
00:04:16.799
Let’s talk more about reusability. We have our Rails app and several other services in our infrastructure. This example is based on some of our prior experience with search stacks. We were able to separate different components of our search stack.
00:04:28.560
In the middle is Flattery, which serves as a denormalized data store in Rails. We have hooks in Rails that publish data into Flattery, allowing us to store a denormalized representation. Dexi, depicted to the right, is our service that builds Lucid indexes from transaction streams sent from Flattery.
00:04:51.681
Ultimately, we built our search interface in this manner. When we wanted to incorporate a new search service into the stack, it was relatively easy for us. We developed an autocomplete service called Completey, which integrated directly into the existing pipeline.
00:05:14.560
We already had the necessary components in place and clean interfaces defined. The same principle applies to exporting data, where we have a service called Slurpee. Part of this pipeline already existed, and we didn't have to pull data from Rails again.
00:05:28.960
That's fantastic because now we have many independent components that can scale individually. This setup allows us to better understand each service's specific needs and performance patterns.
00:05:41.280
We can assess these metrics and avoid unnecessary resource allocation across the entire stack.
00:05:59.440
This flexible architecture is enabled by the loose coupling of components. We have encapsulated many of our concerns within smaller, more focused services.
00:06:10.160
We can independently push updates to each service without necessitating large-scale deployments. The Rails app still demands mass deployments; however, we can release smaller pieces of infrastructure in various ways.
00:06:27.360
Most importantly, we can swap out infrastructure components without needing to notify anyone, as long as we maintain consistent interface standards.
00:06:39.360
Currently, we are in the process of changing our files backend. It's been somewhat painful, requiring more adjustments than anticipated, likely due to some initial missteps.
00:06:52.080
Largely, we have the ability to replace this entire service with a new one.
00:07:10.560
The second goal of adopting service-oriented architectures is to create maintainable codebases across the organization. If you've worked with a monolithic application, you have probably encountered challenges like stepping on each other's toes or needing to retain vast amounts of application knowledge.
00:07:27.200
With services, you navigate through numerous 'black boxes' dedicated to specific tasks. You can learn about these services as required on a case-by-case basis.
00:07:45.440
This advancement gives us a concept of distributed execution enabled by the loose coupling of services.
00:08:00.320
This division isn't just in a computational sense but also applies to development. We split our codebases, which makes it easier to appoint a team to a specific service. That team can coordinate and establish how these components will interact with each other.
00:08:15.680
We often create dummy components at the initial stages, integrating some data on the Rails side while developing a service that accepts this data without yet performing any operations.
00:08:34.080
When the project progresses and approaches completion, we establish a full end-to-end test. This allows team members to unlock each other and incrementally develop the service.
00:08:51.440
Our discussion may seem complex; the journey of transitioning from a single codebase to distributed services is filled with challenges.
00:09:03.280
When you're an early-stage startup, managing this level of complexity can hinder your ability to ship products quickly. With a unified codebase, you have the freedom to implement changes swiftly.
00:09:17.760
You can interact with your data layers directly, share code easily, and evade some overhead associated with managing services. Although some challenges may resurface later, swift progress is vital.
00:09:32.560
We discovered at Yammer that moving to this distributed structure often requires organizational change.
00:09:46.560
At Yammer, we frequently discuss Conway's Law, which states that organizations designed to avoid bottlenecks will produce systems that reflect the communication structures of those organizations.
00:10:07.360
It's important to consider how our development teams are structured when contemplating services. Many organizations divide their departments vertically or horizontally, leading to silos.
00:10:19.920
This siloing can result in team members becoming overly attached to their responsibilities, which ultimately inhibits communication and decision-making.
00:10:34.080
In the early days at Yammer, our messaging team managed the service that handled all message feeds. The team was responsible for both the service and the Rails side of the implementation.
00:10:51.440
They decided on the interface and how to implement it, leading to siloed knowledge about the entire system—something that wasn’t necessarily the best use of their time.
00:11:07.200
As we considered scaling to accommodate future needs, we had to determine how to strategically manage feature teams.
00:11:24.320
To improve our organizational structure, we adopted a new model. This is what we have now: a Rails team and a Core Services team.
00:11:39.680
The Rails team is predominantly responsible for developments in the Rails app while the Core Services team handles service-oriented tasks. All codebases at Yammer are open, so being on the Rails team doesn't limit you to writing Ruby code.
00:11:56.320
While the Rails team has to understand the monolithic Rails application, moving elements into services helps reduce dependencies on that knowledge.
00:12:07.680
We also introduced the idea of cross-functional teams. When we begin developing a new service or feature, we assemble a cross-functional team with representatives from all relevant departments.
00:12:23.760
These groups usually comprise two to ten members from various functional teams, who work on a project together for two to ten weeks. This approach ensures that team members constantly work with new people on diverse projects.
00:12:40.800
This method resonates with Sarah's point about diversity in project teams. Although we don’t do consulting work, our domain remains dynamic, and we face numerous problems simultaneously.
00:12:56.960
From infrastructure projects to developer tools and core product features, we also handle backend tasks related to tech debt and service extraction.
00:13:10.880
Our analytics team builds tools for the data pipeline, and these tools and features sometimes circle back to product engineering.
00:13:26.240
While we have functional teams, we essentially have a pool of engineers who can work on a wide range of challenges.
00:13:39.120
An example of a cross-functional team would include two Rails engineers, a core services engineer, a mobile client engineer, and various other contributors, depending on the project needs.
00:13:52.320
This setup promotes a decentralized design process with autonomy, leading to well-crafted, isolated, and reusable systems.
00:14:05.120
These teams are ephemeral; they disband after completing their project and transition to new endeavors.
00:14:19.680
Another interesting feature is that any one of the team members can serve as the tech lead for a specific project.
00:14:31.200
Rather than a perpetual manager overseeing these teams, leadership roles shift with every project, allowing individuals to contribute code and gain diverse experience.
00:14:49.760
Once teams are assembled, they delve into their designated domains, leveraging distributed execution to accomplish their tasks. They coordinate on the API agreements between services and clients.
00:15:03.040
As a result, we regularly form service-oriented collaborations when these cross-functional teams convene.
00:15:20.160
It's important to note that while this model offers advantages, it also presents trade-offs. We have encountered some challenges but have generally managed them well.
00:15:39.920
One potential drawback of not having specialized experts across the application might seem negative, although some might argue it provides flexibility.
00:15:55.520
However, there are costs associated with having teams constantly learning new domains.
00:16:12.320
Another emerging issue we’re discussing more frequently is the risk of tightly coupling client API implementations, as our mobile clients demand increased customization of data.
00:16:27.760
It's crucial we remain vigilant regarding feature-oriented project designs to ensure we keep our systems decoupled.
00:16:42.080
Following project completion, we still need to support those features. Once teams disband, shipped products may encounter bugs, which we need to address.
00:16:59.520
We counter this through the formation of support engineering cross-functional teams, which regroup for two to ten weeks to tackle these as-needed issues.
00:17:14.720
This process can be a bit more challenging due to the unfamiliarity with the domain and codes written by the original teams. But it's a trade-off we accept.
00:17:29.280
There isn’t just one method to pursue this strategy; it may look different based on specific needs and circumstances.
00:17:44.320
It may still resemble the structure of a Rails project for the foreseeable future. We modified our organizational structure but continue to grapple with challenging problems.
00:18:01.680
A common temporary solution is routing traffic through Rails instead of allowing direct communication with services. This yields some practical benefits, but it also forces us to rely on Rails resources to access service functionality.
00:18:18.080
Eventually, we aim to permit browsers to communicate directly with services, but when pursuing this route, we must also consider issues like authentication.
00:18:34.320
Problems can arise when we try to implement this model, and navigating these challenges appropriately becomes critical.
00:18:51.680
Users often look for efficient authentication solutions when they wish to bypass Rails to access data stored behind it.
00:19:06.960
Ensuring seamless connection with databases is essential, although writing to those databases can create complications.
00:19:23.120
Active Record is a powerful tool with many conveniences, but we face complexities when we try to extract data.
00:19:30.080
Detangling data from services comes with its own unique obstacles, especially when we deal with callbacks, validations, and state machines.
00:19:46.720
We often find ourselves making decisions that involve consulting Active Record to access the original data.
00:20:01.120
One potential route is to minimize reliance on Active Record altogether. Some companies take this approach; however, at Yammer, we find it incredibly useful.
00:20:18.560
Often, we use our services as indexes, stored with IDs or relationships.
00:20:37.680
In practice, that means our services build the index structure, although we still need to fetch original data from Rails. Rails management is necessary to successfully retrieve and handle that data.
00:20:52.640
Recently, we've considered shifting more ownership of data to services, referring to these as "bodega services," where you rely on a focal point for certain types of data.
00:21:10.960
We aim to streamline operations by trying to ensure that these services perform as quickly as accessing data from memcache. This goal is inherently challenging.
00:21:30.240
When we talk about moving data, we often find ourselves duplicating information, which can present logistical challenges.
00:21:46.640
Chances are, if you're like us, you can't afford downtime when moving data. If a service doesn't perform as expected, we face difficulty in rolling back changes.
00:22:06.080
That's where having a backup plan becomes critical. We pivot towards 'double dispatching' to backfill all the data to services.
00:22:24.720
We simultaneously write to the database while posting data concurrently to the service. During this process, we monitor and profile the service's performance.
00:22:41.440
This allows us to incrementally move traffic to the new service while ensuring relevant safety measures are in place. We have an exit strategy available if necessary.
00:22:59.040
We will be introducing a significant amount of new data input to the service, which often can happen at a much faster rate than normal service operations.
00:23:15.200
This forward-thinking approach enables us to build capacity and anticipate how services will respond.
00:23:30.080
Once we initiate this process, we encounter another challenge: duplicated data.
00:23:45.440
We need to address how to manage and clean up this duplication swiftly to avoid confusing our developers.
00:24:01.440
At times, we fall short in cleaning up data promptly, leading to confusion—not an ideal situation.
00:24:15.760
Fundamentally, we must accept that we need to be willing to fully commit, which can be hard, as comfort with the old way might tempt us.
00:24:29.920
Staying in your comfort zone is not always an option; there needs to be readiness to confront new problems.
00:24:43.680
Understandably, you'd want to have a solid backup plan. Still, at some point, you need to choose to make the transition. The cost of maintaining a temporary strategy can drain resources.
00:25:00.160
Leaving your comfort zone and embracing the complexity of new systems is imperative.
00:25:14.560
For developers to succeed, they need a solid story regarding their development environment. If your environment is cumbersome, developers are likely to revert to comfort zones.
00:25:36.960
So, my big recommendation is to use Vagrant. It’s been a great asset in structuring our development environment.
00:25:50.640
We strive to keep the environment as similar to production as possible, running Ubuntu on both ends. This replica allows us to operate all services locally.
00:26:04.400
However, as our number of services grows, we are encountering issues due to resource limits, especially since our laptops typically support only 16GB of RAM.
00:26:18.160
We also need to keep up with rapidly changing services. Developers have to stay updated on these matters.
00:26:36.960
We built a tool, running inside Vagrant, called Soup Kitchen. It’s designed to help developers update services efficiently.
00:26:53.760
It allows us to manage service updates seamlessly, so the main focus remains on development quality rather than constant maintenance.
00:27:05.360
Furthermore, we have to consider the deployment of these services. It's crucial that we have a process in place for adding new services effectively.
00:27:20.400
We need a system that manages efficient deployments and provides stable, pre-released packages supported by ongoing development.
00:27:36.000
As we manage numerous applications, it’s essential to ensure every engineer can deploy their services effortlessly.
00:27:52.240
We created a one-click deployment tool named Diplomacy, which unfortunately is not yet open-sourced but allows engineers to add new services without complicated processes.
00:28:10.080
The other aspect we need to be acutely aware of is monitoring and alerting systems for our increasing number of services.
00:28:24.720
Planning capacity and monitoring service performance is vital for ensuring smooth operations.
00:28:39.200
We utilize several monitoring tools, including New Relic, alongside in-house tools to track performance.
00:28:56.960
With that said, you also need standardized tools across all services, which streamlines response formats, data protocols, monitoring interfaces, deployment strategies, and dependency management.
00:29:15.680
This approach eliminates the notion of unique or special cases that complicate service interaction.
00:29:34.480
At Yammer, we have adopted a toolkit known as Dropwizard. This tool, maintained by Coda Hale, helps us package necessary Java libraries for service development.
00:29:52.680
Dropwizard offers an efficient setup for a production-ready service with built-in monitoring, alerting, and metric reporting functionalities.
00:30:13.920
While we heavily lean on Java, this doesn't rule out the possibility of utilizing Ruby for building services. There are scenarios in which extracting services into Ruby first and refactoring existing code is a more valid approach.
00:30:30.720
Even if we identify a need for more performance later, we can always shift to another language as the situation dictates.
00:30:48.320
Service-oriented architectures come with trade-offs. They offer abundant benefits, yet also present complex new considerations that must be navigated meticulously.
00:31:05.440
For example, complex systems are inherently prone to failure. A robust strategy for managing service unavailability should be developed as part of the response plan.
00:31:23.440
You may face problems detecting issues within multiple service levels, as alerts could originate from locations that don't reflect where the actual failure occurred.
00:31:39.600
As you adopt service-oriented architectures, be aware that transactions aren't free. These additional services introduce new complexities that must be managed.
00:31:56.320
Adjustments may be required to streamline API updates, and coordinated service deployments must be effectively supported across multiple client versions.
00:32:11.360
To recap, it's vital to continuously assess the costs and benefits of your decisions regarding service architecture. Are you still aligned with the trade-offs you initially accepted?
00:32:28.320
Being aware of your organizational structure is important—it’s beneficial to build a culture that supports service development and flow.
00:32:45.200
As your systems grow more complex, staying efficient in service deployment becomes essential, especially as you strive to meet customer needs.
00:33:01.680
It's important not to allow difficulties in building services to hinder your progress. When pressure mounts, you might be tempted to revert to earlier, monolithic structures.
00:33:15.680
Continuing to innovate in service design is crucial, and many lessons await on this journey. In the same way that you are allowed to acknowledge when you are wrong, embracing the learning process comes into play.
00:33:31.520
Each time we rewrite elements of our codebase, we discover new facets of effective service-building. Don't assume past decisions will remain valid, as adjustments in circumstances and information can change.
00:33:49.680
With that said, my name is Brian, I work at Yammer, and I appreciate your attention today.
00:34:05.360
However, just to clarify, that wasn’t all I had to share! After preparing my presentation, some questions arose.
00:34:15.920
I thought it best to address them at the end of my talk since they interrupted my presentation flow.
00:34:26.960
One question that often comes up is: "What should I extract into a service?" My general answer is that it truly depends on your application.
00:34:42.720
However, we have identified some successes with less state-dependent features. These services are easier to extract when they haven't become tightly integrated with your Rails application.
00:34:59.680
When developing new services, it's always easier to build components that don’t yet exist.
00:35:16.960
However, bear in mind that not every feature should be extracted as a service. At Yammer, we utilize A/B testing frequently.
00:35:34.240
After confronting some performance issues, we considered building a service around our experiment framework, but ultimately decided otherwise.
00:35:50.560
The data we require resides more closely within our Rails app, compelling us to recalibrate our focus on existing solutions.
00:36:08.800
It may be tempting to start anew, but often enhancements within the framework you currently have can lead to better outcomes.
00:36:24.640
Even after extracting something into a service, it’s critical to remember that you're not absolved from traditional development challenges.
00:36:37.680
Service extraction requires continuing diligence in terms of technical debt and evolving requirements.
00:36:53.760
Thus, adapt adeptly to changing conditions, ensure the reliability of services, and retain awareness that comfort with existing solutions might invoke reevaluation.
00:37:09.440
Now that I may have points worth discussing, let's return to the last slide for closure.
00:37:23.360
The idea is to be prepared to embrace mistakes. Acknowledgment and recovery are vital in ensuring you remain resilient as a service builder.
00:37:45.600
Thank you all for your time!