00:00:00.900
Welcome everyone.
00:00:12.900
Thank you for coming to my talk called "Laying the Cultural and Technical Foundation for Big Rails."
00:00:15.360
To start off, I want to take a moment to thank everyone who's contributed to and maintained Rails. They have created something bigger than themselves and inspired people and businesses of all sizes. Thank you to them.
00:00:21.119
I also want to thank everyone who helped put together this conference and brought us here today. A little about me: my name is Alex, and I'm really happy to be here today to speak with you all. I appreciate you taking the time to attend. I live in Vermont with my partner, and I enjoy many things, including gardening. This is a picture of our work-in-progress garden. I also love eating fresh fruits and vegetables, composting, and other crunchy things.
00:00:43.579
I have worked at Gusto for nearly six years now, primarily on the product infrastructure team, focusing mostly on back-end modularization. Before that, I worked on the benefits team. I truly love working at Gusto, and I’ll be available throughout the conference and beyond if you want to learn more.
00:01:38.820
To dive in, this is Gusto's system graph. Each of the black rectangles you see represents a subsystem within Gusto's large Rails monolith, and the red arrows indicate when one subsystem communicates with another. Over time, despite our team growing at a linear or even exponential rate, we noticed that our per-contributor velocity was actually decreasing.
00:01:50.620
It became harder to add new features, and we frequently found that implementing a new feature made adding subsequent features more challenging rather than easier. We realized that people were struggling with making large-scale changes in our codebase because large-scale changes require structure at scale, which we were lacking.
00:02:01.740
So, how do we even begin to solve this problem? We tried a couple of different approaches. We created gems and Rails engines, but in the best cases, we found that our business logic was so entangled that the most we could manage was to extract small, mostly functional components that lacked significant business value and wouldn’t make a dent in the overall complexity of our application.
00:02:29.340
In the worst cases, it led us on a bit of a wild goose chase. We recognized that we had many years of work ahead if we wanted to extract all of our code into gems and engines. We also attempted to implement microservices for new and existing services, which had varying degrees of success.
00:02:40.920
However, none of these techniques could adequately address the challenges posed by our ever-growing Rails monolith. This is the story of how Gusto's complexity grew and brought me here today to share that journey with you. In this talk, I will discuss some of the progress we've made and share tools and techniques that have been helpful.
00:03:43.740
As I share this story, keep in mind that I've taken some creative liberties in my storytelling to hopefully help you avoid making some of the mistakes we made along the way. But I’d be happy to share some of those mistakes if you're curious.
00:04:42.000
Gusto first began as a payroll provider for small businesses in California. Over the years, we expanded both geographically and in the problems we solve. Today, we assist companies in running payroll, ensuring tax compliance, setting up health insurance benefits, and taking care of their employees, among other tasks.
00:05:03.000
Gusto experiences significantly lower web traffic and data storage requirements compared to many other similar companies. If we do our job right, users spend very little time on our website, running payroll or figuring out how to set up their health insurance benefits. Personally, I believe the less time users spend on our site and the less information we hold about our customers, the better.
00:05:32.000
Thus, this story is not focused on performance, big data, or web traffic throughput, but rather on scaling domain complexity. If this narrative feels familiar to you, you may also be working in a large Rails application. We want to change something about this narrative—and we can.
00:06:03.000
Regarding 'big Rails,' many people have different definitions. One definition is that big Rails is a system of socio-technical tools, practices, and conventions that facilitate scaling Rails development in terms of lifespan, the number of contributors, and complexity. Among these, five key principles have guided a lot of our approach to 'big Rails': accountability and ownership, clear boundaries, thoughtful dependency management, gradual adoption, and intentionally curated sustainable feedback loops.
00:06:21.840
Accountability and ownership are crucial concepts. Imagine a piece of code in your codebase. When that code is called, who receives the error? How do you route the error to the right team? Who do you consult regarding a change to this code when 'git blame' shows dozens of contributors, many of whom may no longer be with the company? Accountability and ownership are about clarifying which team is responsible for what, covering both the behavioral and structural concerns of the codebase.
00:07:06.060
We aim to make it straightforward to identify ownership, both for human operators working within the codebase and through automated tooling. For instance, an engineer should be able to easily identify which team owns a part of the domain. Bugs, logs, monitoring, asynchronous processes, and more should all be attributable to a specific team.
00:07:38.160
To accomplish this, we created two gems, named 'code_ownership' and 'teams.' We open-sourced these in advance of this talk, along with several other tools that I'll be discussing. Look out for the RubyGems logo in the bottom right for references to these open-source gems. You can also find all the references I discuss on the Gusto Engineering blog.
00:08:05.040
On the left, you can see a basic team declaration. On the right, you can see how code ownership can link a piece of code back to the team responsible for it. With this, we could tie our codebase to specific teams and individuals. Our error monitoring system assigns bugs to individual teams, allowing them to take responsibility for their impact on system behavior. They can also define their own service level agreements (SLAs) and tolerance for errors based on their business objectives.
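As a rough illustration of the idea on the slide (not code from the talk, and not the API of the gems mentioned), a team declaration plus a path-based ownership lookup might be sketched like this. The `Team` struct, the `TEAMS` list, the `packs/...` globs, and `owner_for` are all hypothetical names chosen for the example.

```ruby
# Hypothetical stand-in for a team declaration and an ownership lookup.
# None of these names come from the gems mentioned in the talk.
Team = Struct.new(:name, :owned_globs)

TEAMS = [
  Team.new("Benefits", ["packs/benefits/**/*"]),
  Team.new("Payroll",  ["packs/payroll/**/*"])
].freeze

# Returns the first team whose glob matches the path, or nil if unowned.
def owner_for(path, teams = TEAMS)
  teams.find do |team|
    team.owned_globs.any? { |glob| File.fnmatch?(glob, path, File::FNM_PATHNAME) }
  end
end

team = owner_for("packs/benefits/app/models/plan.rb")
puts team ? team.name : "unowned"
```

An error reporter or job dashboard could call a lookup like this with the file that raised, then route the report to the returned team; unowned paths returning `nil` is what surfaces gaps in ownership.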
00:08:45.200
We configured our asynchronous work system, Sidekiq, to tag jobs with owner information, so monitoring dashboards could be filtered by team. We also use this setup to generate our GitHub CODEOWNERS file programmatically, so that teams receive notifications for code reviews.
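Generating a CODEOWNERS file from ownership data can be quite small. The sketch below is an assumption about how such a generator could look, not Gusto's implementation: the `OWNERSHIP` map, the `@example-org/...` handles, and `codeowners_file` are made up for illustration.

```ruby
# Hypothetical ownership map: directory => GitHub team handle.
OWNERSHIP = {
  "packs/payroll/"  => "@example-org/payroll",
  "packs/benefits/" => "@example-org/benefits"
}.freeze

# Renders one CODEOWNERS line per directory, sorted for stable diffs.
def codeowners_file(ownership)
  lines = ownership.sort.map { |directory, handle| "/#{directory} #{handle}" }
  (["# Generated file. Edit the ownership map instead."] + lines).join("\n") + "\n"
end

puts codeowners_file(OWNERSHIP)
```

Keeping the file generated rather than hand-edited means the ownership data stays the single source of truth for both review routing and runtime attribution.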
00:09:13.060
As a bonus, this sparked conversations about code ownership—or the lack thereof—throughout our codebase, both for new and existing code. Consequently, this pushed us towards the necessity of creating boundaries, as it's difficult to establish boundaries without accountability and ownership.
00:09:44.000
Clear boundaries involve working towards easily understandable conceptual and mechanical separations between domains. Each system should only ever communicate with other systems through intentionally maintained public APIs. Thoughtful dependency management seeks to minimize dependencies to reduce cognitive load when understanding a system. When we must add a dependency, we should do so explicitly.
00:10:36.240
We should avoid creating cycles in our dependency graph, as these reduce our ability to understand a subsystem in isolation. To move towards this goal, we started with one simple change to a standard Rails convention. In a standard Rails app, you have an app directory, then secondary directories for architectural concerns, like models, views, controllers, and services, followed by directories for business domains.
00:11:38.760
Earlier, I mentioned I was on the benefits team for an extended period. This meant that as a product engineer on that team, I had to navigate across folders to work within my team’s codebase. This was not only cumbersome, but it violated an important principle of coupling and cohesion, which states that things that change together should reside together.
00:12:23.340
What we really wanted was all the benefits-related files to be in the same folder. So that's precisely what we did. We felt it was far less meaningful to organize the app by architectural layer and much more meaningful to organize it by domain.
00:12:56.640
Note that we didn’t divide it by team, as our software shouldn’t reflect our organizational structure. This change didn’t require a significant amount of additional technology initially; we just had to set up some load paths, which we simplified by open-sourcing a gem called 'stimpack.' Once added to your Gemfile, it sets up Rails load paths and more for this new structure.
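To make "setting up some load paths" concrete, here is a minimal sketch that only computes the directories a domain-first layout would need on the autoload path. It assumes a `packs/<domain>/app/<layer>` layout, which is a convention chosen for this example; the wiring into Rails configuration is deliberately left out, and `pack_autoload_dirs` is a hypothetical helper.

```ruby
require "tmpdir"
require "fileutils"

# Collects every packs/<pack>/app/<layer> directory under root.
# The packs/ layout is an assumption made for this sketch.
def pack_autoload_dirs(root)
  Dir.glob(File.join(root, "packs", "*", "app", "*"))
     .select { |path| File.directory?(path) }
     .sort
end

# Build a throwaway directory tree so the example runs anywhere.
Dir.mktmpdir do |root|
  %w[
    packs/benefits/app/models
    packs/benefits/app/services
    packs/payroll/app/models
  ].each { |dir| FileUtils.mkdir_p(File.join(root, dir)) }

  puts pack_autoload_dirs(root).map { |path| path.delete_prefix("#{root}/") }
end
```

In a real application, each resulting directory would be registered with the framework's autoloader in place of the single top-level `app/*` set.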
00:13:35.160
Please keep in mind that this switch is well supported by Rails in case you want to configure it yourself. In fact, it's quite similar to Rails engines, but we chose not to utilize Rails engines for several important reasons I will address later.
00:14:14.460
This pattern also came with numerous additional benefits, such as having test files co-located alongside the code they test. The technologies used for a domain became more of an implementation detail hidden from consumers. I don't mean to imply that Rails' default way of organizing by architectural layer is incorrect.
00:15:00.000
In fact, most new Rails projects tend to focus on a single domain, so the left-side pattern makes a lot of sense for those applications.
00:15:54.600
Now that we have our app divided by domain, the next question becomes how to systematically manage the relationships between those domains. This is where Packwerk comes into play, and I want to thank Shopify for this wonderful tool, which has provided us with numerous opportunities.
00:16:30.520
With Packwerk, we maintained the structure we had before, organizing first by domain and then introducing a package.yml file that you can see there on the right. Note that Packwerk isn't concerned with how your file system is organized, but we found it incredibly beneficial when our system was organized by domain first.
00:17:37.340
We also added an owner field to each package.yml, which is a custom field that Gusto added and that is used by the code ownership gem. Next, we incorporated public folders into each package to house our public API while keeping everything else private.
00:18:03.860
Your basic building block for Packwerk is called a package. At Gusto, we often refer to this as a pack, mainly because it's quicker to say and write. A pack is simply any folder of code that has a package.yml at its root. Packs are the nodes in this graph.
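For readers who have not seen one, a pack's manifest might look roughly like the YAML embedded below. This is a sketch under assumptions: the `enforce_*` and `dependencies` keys follow Packwerk's configuration as I recall it from around the time of the talk, and the placement of the custom owner field under `metadata` is a guess, so check the current documentation of the tools before copying it.

```ruby
require "yaml"

# Example manifest for a hypothetical benefits pack.
# Key names and the owner placement are assumptions, not quoted from the talk.
PACKAGE_YML = <<~YAML
  enforce_dependencies: true
  enforce_privacy: true
  dependencies:
    - packs/payroll
  metadata:
    owner: Benefits
YAML

config = YAML.safe_load(PACKAGE_YML)

puts "depends on: #{config["dependencies"].join(", ")}"
puts "owned by:   #{config.dig("metadata", "owner")}"
```

Because the manifest is plain data, other tools (ownership lookup, CODEOWNERS generation, dashboards) can read the same file without loading the application.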
00:18:55.680
Each node in the graph has an inner concentric circle representing the private API, while the outer ring shows the public boundary that a pack exposes to the outside world. Packwerk provides several ways to declare something as public or private, but at Gusto, we prefer to have a public folder because we like the idea of everything being private by default.
00:19:57.240
I've added two other systems, so now we have three packs: benefits, HR, and payroll. Each pack's package.yml lists its dependencies on the other packs, represented by these large white arrows.
00:20:00.840
Packwerk requires that these explicitly stated dependencies never form a cycle, which is one of my favorite features of Packwerk. In this small system, the HR pack depends on both the benefits and payroll packs, while the benefits pack depends on payroll.
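The acyclic requirement can be checked with an ordinary depth-first search over the declared dependencies. The sketch below is an illustration of that check using the three packs from the example, not Packwerk's own implementation; `find_cycle` and `visit_pack` are hypothetical names.

```ruby
# Declared dependencies from the example: HR -> benefits and payroll,
# benefits -> payroll, payroll -> nothing.
DEPENDENCIES = {
  "hr"       => %w[benefits payroll],
  "benefits" => %w[payroll],
  "payroll"  => []
}.freeze

# Returns one cycle as an array of pack names, or nil if the graph is acyclic.
def find_cycle(graph)
  state = {} # pack name => :visiting or :done
  graph.each_key do |start|
    cycle = visit_pack(graph, start, state, [])
    return cycle if cycle
  end
  nil
end

# Depth-first search; meeting a pack that is still :visiting means a cycle.
def visit_pack(graph, node, state, path)
  return nil if state[node] == :done
  return path[path.index(node)..] + [node] if state[node] == :visiting

  state[node] = :visiting
  path.push(node)
  graph.fetch(node, []).each do |child|
    cycle = visit_pack(graph, child, state, path)
    return cycle if cycle
  end
  path.pop
  state[node] = :done
  nil
end

puts find_cycle(DEPENDENCIES).inspect
puts find_cycle(DEPENDENCIES.merge("payroll" => ["hr"])).inspect
```

The first call reports no cycle for the graph on the slide; the second adds a hypothetical payroll-to-HR dependency and reports the loop it creates.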
00:20:51.000
Next, Packwerk will parse every Ruby, ERB, and Rake file using the same parser that RuboCop employs. After parsing, it retains a list of every constant, class, or module referenced within the code of that pack. These references are denoted here as purple squares.
00:21:33.720
It's essential to note that Packwerk relies on static analysis of the codebase to operate. This means Packwerk isn't required in production, nor is it loaded at runtime. It also means that implicit references to constants, classes, or modules may be hidden from Packwerk's view.
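The blind spot is easy to demonstrate. The sketch below uses Ruby's bundled Ripper lexer as a stand-in for a real parser (an assumption for the sake of a self-contained example; it is not how Packwerk works internally) to list the constant tokens in a snippet. The constant named only inside a string never shows up. `Payroll::Api` and `Hr::Helper` are hypothetical names.

```ruby
require "ripper"

# The snippet is only lexed, never executed.
SOURCE = <<~'RUBY'
  Payroll::Api.run
  Object.const_get("Hr::Helper").call
RUBY

# Returns the constant tokens that a purely static scan can see.
def constant_tokens(source)
  Ripper.lex(source)
        .select { |token| token[1] == :on_const }
        .map { |token| token[2] }
end

p constant_tokens(SOURCE)
```

The explicit reference to `Payroll::Api` is visible, while `Hr::Helper` is just string content to the lexer, so a boundary check built on static analysis would miss it.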
00:22:16.260
At Gusto, we like using Ruby's static type checker, Sorbet, to give Packwerk more to work with. Once Packwerk has parsed the files, it draws edges between any reference to a class, constant, or module (the purple squares) and its definition (shown as an orange circle). Packwerk knows where something is defined because Zeitwerk, Rails' autoloader, establishes a convention that Packwerk relies upon.
00:23:18.520
Please note that these arrows may not respect the public API of the other package. Similarly, a pack can use a class from another package without explicitly stating a dependency. Packwerk represents these deviations from the intended API use and declared dependencies as privacy and dependency violations.
00:24:22.640
So let's discuss these dotted red arrows. The solid green arrow at the bottom right indicates that Packwerk doesn’t detect a problem because the HR pack is referencing the public API from the payroll pack and has declared a dependency on payroll.
00:25:28.180
On the other hand, the topmost violation indicates a reference from benefits to HR, which constitutes both a privacy and a dependency violation, hence the double line. This is a privacy violation because the benefits pack references a class, constant, or module from the HR pack that is private, meaning it doesn't exist in the HR pack's public folder.
00:26:28.900
Furthermore, this is a dependency violation because the benefits pack is using HR without having declared HR as a dependency. A dependency violation would occur even if benefits were using only HR's public API. Clear boundaries and thoughtful dependency management are crucial for managing a large Rails application, and Packwerk makes these principles much easier to follow.
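The two rules described above can be restated as a few lines of Ruby. This is a sketch of the rules only, under the assumption that each cross-pack reference is already known along with whether its target sits in a public folder; `Reference`, `violations_for`, and the constant names are hypothetical.

```ruby
# One cross-pack reference: who refers to what, and whether the target is public.
Reference = Struct.new(:from_pack, :to_pack, :constant, :public_constant)

# Declared dependencies from the three-pack example.
DECLARED_DEPENDENCIES = {
  "hr"       => %w[benefits payroll],
  "benefits" => %w[payroll],
  "payroll"  => []
}.freeze

# Returns the kinds of violation a reference triggers (possibly none).
def violations_for(reference, declared = DECLARED_DEPENDENCIES)
  return [] if reference.from_pack == reference.to_pack

  found = []
  found << :privacy unless reference.public_constant
  found << :dependency unless declared.fetch(reference.from_pack, []).include?(reference.to_pack)
  found
end

green_arrow = Reference.new("hr", "payroll", "Payroll::Api", true)
double_line = Reference.new("benefits", "hr", "Hr::Helper", false)
public_only = Reference.new("benefits", "hr", "Hr::Api", true)

p violations_for(green_arrow)
p violations_for(double_line)
p violations_for(public_only)
```

The third case shows why the two checks are independent: using only the public API of HR still counts as a dependency violation while benefits has not declared HR.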
00:27:22.260
The code snippet on the lower left shows example code relating to the Packwerk graph. In this case, the HR helper is part of the HR pack's private API, and benefits does not declare a dependency on HR.
00:27:50.960
Packwerk constructs this graph and outputs these red dotted arrows as essentially a to-do list of violations, as shown in the bottom right corner. I cannot overstate how impactful this to-do list is: it lets us incrementally improve our system by identifying areas where we need to reinforce boundaries, and it lets us track our progress.
00:28:32.780
But what about gems and engines? It’s important to note that this workflow is quite distinct from using inline gems and engines to modularize applications. When comparing the two, the key difference is that Packwerk supports gradual modularity.
00:29:09.160
Extracting large areas of existing systems as a gem or engine often forces you to confront modularization issues in areas that might offer low business value. In contrast, Packwerk allows for aspirational gradual modularity by enabling you to state an ideal system diagram and giving you a to-do list to guide you toward that goal.
00:30:11.460
In other words, Packwerk decouples statements regarding system structure and boundaries from the implementation of those boundaries. Additionally, it makes altering boundaries inexpensive and easy because creating packs and moving files between them does not have to affect your runtime at all. This makes it particularly advantageous for greenfield projects, where you can learn more about domain boundaries along the way.
00:31:10.620
When we compare Packwerk with inline gems and engines, distribution and versioning aren’t needed; in any case, Packwerk does not support those features as of now. Test speeds are relatively comparable when using Spring and Bootsnap. Gems are advantageous because they enforce strict boundaries, and a package with no violations can easily become a gem.
00:32:01.100
To that end, we've released another gem called package_protections to ensure that package boundaries remain as clean as those of a gem or engine. Lastly, Stimpack allows packages to incorporate some engine features with less boilerplate code.
00:34:02.420
Overall, we continue to strongly believe that gems and engines are—and will remain—a critical component in the modularization toolchain. We’ve also identified areas where, although certain parts of the system could be gems, at Gusto, we're perfectly fine if they choose to remain as packages indefinitely.
00:35:18.420
I could spend a lot of time comparing and contrasting gems and engines with Packwork packages, but I’ll move ahead for now. If you have further questions, please let me know after the talk.
00:35:45.860
A system never starts off as a big Rails application; it grows into one organically. Practices that have contributed to success at one scale may lead to confusion and challenges at another scale. Similarly, tools designed for large Rails applications may not suit small Rails apps.
00:36:32.560
Therefore, big Rails tools must be adoptable in gradual increments. Scaling a large Rails app includes technical components but cannot be achieved without addressing behavioral and cultural elements. We can't just turn on these technologies and anticipate enhanced growth.
00:37:12.120
This transformation requires substantial evangelizing and educating at every step. Throughout this transformation, I worked with teams to ensure we achieved the desired outcomes.
00:38:09.020
My first goal was to get every team to turn on privacy and dependency enforcement, meaning that the Continuous Integration (CI) system would fail if Packwerk detected a system boundary violation. To do this, I needed to identify whom to engage with, so I reached out to individuals with high context and encouraged them to add team owners to packages.
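One way to picture enforcement alongside a to-do list is a CI step that fails only on violations not already recorded. The sketch below illustrates that comparison under assumptions: violations are modeled as plain `[pack, constant, kind]` triples, and `new_violations` is a hypothetical helper, not a Packwerk command or file format.

```ruby
require "set"

# Returns the violations present now that are not in the recorded to-do list.
def new_violations(current, recorded)
  current.to_set - recorded.to_set
end

# Hypothetical data: one violation already on the to-do list, one newly introduced.
recorded = [["benefits", "Hr::Helper", :privacy]]
current  = recorded + [["benefits", "Hr::Helper", :dependency]]

fresh = new_violations(current, recorded)
puts fresh.empty? ? "boundaries OK" : "new violations: #{fresh.to_a.inspect}"
# A CI step would exit non-zero at this point whenever `fresh` is not empty.
```

This also shows why simply regenerating the to-do list makes the build pass: the new violation moves into `recorded`, and the difference becomes empty without any boundary being fixed.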
00:39:15.420
After about a year, nearly all our packages have a single team owner. In some instances, achieving this required developers to reorganize their code to decouple domain areas controlled by different teams.
00:40:00.460
I met with the teams that owned the packages and delivered essentially the same message I'm sharing with you now about why I believe this process is crucial. One by one, teams committed to establishing a boundary between their public and private APIs and explicitly managing their dependencies.
00:40:55.640
This seemed to turn out very well, and you can see the standard S-shaped technology adoption curve starting in October when we began collecting this data. As expected, teams adopted these tools at different rates and with varying levels of enthusiasm, but there is general consensus about the value of this approach.
00:41:25.840
However, I often found that teams were simply updating the violation to-do lists as if that were the only correct way to resolve a failing Packwerk build step. It was as if, every time there was a RuboCop error, a user just added it to the RuboCop to-do list.
00:42:25.420
While linting is relatively straightforward for people, Packwerk requires us to create public APIs and be intentional about our dependencies, which makes it much harder to use effectively than a linter. This highlights the need to rethink our feedback loops for a large Rails system, as for any software system.
00:43:50.420
Here are some events we care about where we want to enhance our feedback loops. Note that we could potentially add countless more events to this list, but these are just a few examples of events that consistently occur, and for which we already have a platform to build upon.
00:45:01.440
Regarding the challenge I faced, I noticed that users often merely ran the command to update the Packwerk to-do list. To understand why users were doing this, I needed to dig deeper. This involved doing something non-scalable to figure out how to make it scalable.
00:46:21.680
To achieve this, I set up a Slack integration to alert me whenever there was a new privacy or dependency violation. I would comment on the pull requests (PRs) asking users to share more info. I created a spreadsheet to track each PR where I left comments, and over the course of about a year, covering approximately a thousand pull requests, I learned a great deal.
00:48:21.680
I used the spreadsheet to generate histograms showing why users updated the to-do list, which helped improve our documentation and foster a better understanding of how to support these developers.
00:49:35.420
More significantly, I frequently met with developers over Zoom to share insights about what we were doing and why. Through this process, developers began to become more familiar with the system, leading to a cultural shift.
00:50:11.400
Developers recognized that there were reasons for our efforts and that real people cared about how they interacted with these messages and used these tools. Over time, I noticed that developers started to proactively add context about these violations, often addressing the system design concerns before I even had a chance to comment.
00:51:11.640
To make these feedback loops more effective, we developed some tooling. Early user feedback indicated that developers wanted a quicker feedback loop, prompting us to create a VS Code extension for Packwerk and to offer the CI check as a configurable Git commit hook.
00:52:11.640
We also established systems to automatically leave helpful inline comments on PRs when Packwerk detects a privacy or dependency violation, as well as when the developer updates the to-do list.
00:53:37.840
This graph shows the average number of Packwerk violations per file over the past year or so. Overall, we observe a clear downward trend. As for the system graph from earlier, substantial work remains ahead, but we continue to make progress.
00:54:28.000
What’s next? Just as Rails itself is a product of an engaged and passionate community, I hope we can approach 'big Rails' applications similarly. Many organizations are currently or will soon face the challenges of managing significant domain complexity.
00:55:50.020
I am immensely grateful for all the contributions from individuals and companies that have played a part in the solution. Some questions I have for the community are: In what ways can Ruby and Rails continue to provide excellent tools and cultural norms that help users create well-modularized systems?
00:56:52.380
What can be learned from the different conventions of Packwerk packages, gem specifications, and other packaging systems? All these tools, including those developed by the broader community, are imperfect.
00:57:30.540
If you have an interest in this problem space, I invite you to join us at Gusto or any of the other communities. Everyone is welcome.
00:58:01.420
You can also catch me right here after the talk to discuss this material or to simply reach out via email. Your feedback would be greatly appreciated. Whether you try out the tools and share your experiences, leave comments, open pull requests, or experiment with diverse approaches, everything is welcomed.
00:58:43.000
I hope I've been able to convey some of the value we've extracted from these tools and strategies, and I’m excited to collaborate on solving some of these challenges. Thank you.