Opening Keynote: The Myth of the Modular Monolith

Eileen M. Uchitelle

Eileen M. Uchitelle • September 26, 2024 • Toronto, Canada

The video features Eileen Uchitelle's keynote titled "The Myth of the Modular Monolith," delivered at Rails World 2024. In her address, she discusses how Rails applications evolve as organizations grow, and the dilemma those organizations face between maintaining a monolith and transitioning to microservices. Shopify, where Eileen works, opted to modularize its monolith about six years ago, and the company is now reflecting on whether that change truly resolved the problems it set out to fix.

Key points discussed include:
- Challenges of Growing Rails Applications: As applications expand, they face issues such as lack of organization, slow CI, and difficulty onboarding new developers. These challenges often lead organizations to consider microservices as a potential solution.
- What is a Modular Monolith?: Eileen explains that a modular monolith organizes a codebase into modules to alleviate the complexities of large applications while maintaining the deployment benefits of a monolith.
- Myth of Simplification: She emphasizes that moving to a modular monolith does not inherently fix architectural, operational, or organizational problems. These issues often stem from cultural and human challenges rather than from technical architecture.
- Common Problems and Their Source: She categorizes problems into architectural, operational, and organizational issues. Organizational challenges are often exacerbated by scale and growth, highlighting the need for better structure in large applications.
- Misaligned Incentives: Uchitelle discusses how companies tend to prioritize shipping features over maintaining code quality, leading to technical debt that eventually makes applications unwieldy.
- Recommendations for Improvement: She stresses the need for better education and indoctrination for developers to embrace Rails conventions and cultivate a positive engineering culture that values quality work.

Ultimately, the keynote concludes that modularization is not a silver bullet: truly improving application architecture and developer satisfaction requires diligent work on human and cultural problems.

As Rails applications grow over time, organizations ask themselves: 'What’s next? Should we stay the course with a monolith or migrate to microservices?' At @Shopify they chose to modularize their monolith, but after 6 years they are asking: 'Did we fix what we set out to fix? Is this better than before?' Join Rails Core member Eileen Uchitelle as she poses these questions during her #RailsWorld Day 2 Opening Keynote.

Thank you Shopify for sponsoring the editing and post-production of these videos. Check out insights from the Engineering team at: https://shopify.engineering/

Stay tuned: all 2024 Rails World videos will be subtitled in Japanese and Brazilian Portuguese soon thanks to our sponsor Happy Scribe, a transcription service built on Rails. https://www.happyscribe.com/

Rails World 2024

00:00:11.719 Thank you to the strangers and my friends who were in the elevator with me, who talked me off a ledge and then figured out how to get me up the stairs so I didn't have to take the elevator again. It was working fine this morning, though, so we're good. I'm Eileen Uchitelle, and I'm honored to be your Day 2 keynote.

00:00:33.320 It's really exciting to be here, and I'm so happy that you all could make it to Rails World. Thank you to the Rails Foundation, Amanda, and everyone who worked hard to put this conference on today. It's really lovely to see all of you here. I'm a senior staff software engineer at Shopify on the Ruby and Rails infrastructure team. Toronto is one of our hubs, and we love Rails!

00:00:59.719 Please come by our booth to chat or learn more about what roles we’re currently hiring for. I've been a member of the Rails core team since 2017. The Rails core team is the driving force behind the framework—we make decisions about the direction and evolution of Rails, as well as collaborate with contributors and the community.

00:01:15.840 Being on the core team has been the highlight of my career; it's enabled me to have a deep and lasting effect on the framework. I've been building Rails applications for about 14 years now, and throughout my career, I've seen a lot of different types of Rails applications. I've worked at companies with fewer than 10 employees and companies with over 11,000.

00:01:40.159 I’ve worked on Rails applications that were brand new, and others that have been around since the dawn of the framework. I've seen applications by developers just learning Rails, and I've seen ones written by DHH himself. Throughout my career, I've spent extensive time working on two of the earliest Rails applications in existence: Basecamp Classic and Shopify.

00:02:09.640 While Basecamp Classic is still running in production, it only gets security updates, and it's actually still running on Rails 3. This makes Shopify's core monolith the oldest continuously developed production Rails application on the planet. It was built on a version of Rails that wasn't yet released to the public, and it now runs off Rails main.

00:02:33.720 While the application has changed a lot in the last 20 years, it's still fun to look back at Tobi's first commit and see just how excited he was to be using Rails.

00:02:45.480 In between 37signals and Shopify, I worked at GitHub for five years and spent the first two and a half years there upgrading the main monolith from Rails 3.2 to running off Rails edge. I think I'm the only person who's worked at all three of these companies and seen these three early Rails applications.

00:03:10.319 One thing that stood out over the years is that eventually, Rails applications reach a state where the framework stops bringing developers joy and happiness. As organizations grow, applications tend to become a ball of mud. The code lacks organization and structure, onboarding is difficult and painful, CI is slow and flaky, and simple small changes seem to hit endless amounts of friction in development.

00:03:53.920 At this point, many companies ask themselves, 'What now?' They feel that Rails is no longer meeting their needs from an architectural perspective. Often organizations will start looking into microservices in order to get the experience of building a greenfield application and move away from the monolith.

00:04:07.879 There was a point at GitHub where all anyone could talk about was microservices saving us from ourselves. About six years ago, Shopify began exploring modularizing our core monolith. It seemed like the best of both worlds: the hope was that we could feel like we were working on a smaller application without the network latency and organizational politics of microservices. It could solve all of our engineering problems, and we'd get to keep writing Rails. What could be better than that?

00:04:38.479 Right before I left in 2020, GitHub started modularizing its monolith as well, realizing that turning the entire monolith into microservices wasn't a tenable goal. Gusto, Zendesk, Doximity, and other companies with large monoliths have also adopted this architecture. And while I hear a lot of voices pushing for this new pattern in Rails applications, it's proven not to be the silver bullet that we'd hoped for.

00:05:03.960 After all these years, if we look back at the problems that we were trying to solve, we'll see that they are still ever-present, and new challenges have arisen from our effort. The myth of the modular monolith is that it promises a greener pasture, better structure, and less coupling, but what we get instead is a new set of challenges and unmet goals.

00:05:34.840 On the surface, our problems appear to be technical, but if we look deeper, we’ll see that they are actually human and cultural challenges. You can't solve human and cultural problems by changing your architecture. Today, we'll examine the difficulties that companies face as their applications scale. We'll explore why modular monoliths seem like an ideal solution and uncover the underlying causes of the issues we’re trying to address.

00:06:00.000 We’ll discuss how to tackle human problems while avoiding the pitfalls of chasing false promises and silver bullets. First, let's talk about the very real pain points that led companies like Shopify, GitHub, Gusto, and others to modularize their Rails applications. Many companies that are expanding, hiring, and growing will face these same challenges over time.

00:06:54.959 I've categorized our common problems into three types: architectural, operational, and organizational. Architectural issues refer to challenges related to the overall design and structure of a software system, affecting the maintainability and scalability of an application. Architectural problems creep in over time as more and more features are added to an application and refactoring isn’t prioritized.

00:07:28.519 A common architectural problem that occurs in growing applications is lack of organization and structure. Rails is an MVC framework, so all the models go in app/models, controllers in app/controllers, and views in app/views, and so on. As the number of those models and controllers and views increases, Rails' default directory structure can become unwieldy if you're not careful.

00:07:51.440 The lack of organization and structure means that developers rarely consider where new code should go or when existing code should be refactored to live elsewhere. Everything just goes into the default folders, making it difficult to discern what concepts belong together. This is often what developers mean when they talk about the 'ball of mud', as if it's Rails' fault, and not our own doing, that the code didn't follow proper design.

00:08:06.280 In addition to lack of organization and structure, another issue that growing applications have is tight coupling and lack of boundaries. It’s easy to see how this happens in a Rails application. Ruby is a language where code is globally accessible, so without good judgment, code can become tightly coupled. Changes that seem simple unintentionally impact other parts of the application in ways that are hard to predict and control.

00:08:40.080 When this happens, you either have to fix all the callers or refactor the codebase to separate the functionality. These kinds of side effects caused by tight coupling and lack of boundaries harm developer productivity. In addition to architectural problems, applications face operational issues—these types of problems refer to the practical aspects of running and maintaining applications.

00:09:03.200 They affect the overall happiness of your engineering organization. An example of an operational problem is flaky tests and slow CI. While an application of any size can have flaky tests, the larger your monolith is, the more likely flakes are to occur and the harder it is to narrow down where they started. Flaky tests block deploys or make CI runs take longer due to rerunning failed tests.

00:09:51.160 A flaky build adds a lot of friction and frustration to development. Often, engineers working on large monoliths will complain that their CI suite takes too long. While the number of tests certainly can contribute to longer CI runs, it's often not the only cause. CI may be slower due to a test that creates too many records, queries that are ridiculously slow, or network calls that were unintentionally added.

00:10:05.960 There are a lot of things that contribute to a slow test suite, but they become more problematic when a monolith and its organization grow faster than those problems are addressed. In addition to slow CI, when a monolith gets to a certain size, it's common to hear complaints that it's difficult to scale deployments effectively.

00:10:24.040 The larger the application, the longer it will take to check out the code and restart the servers. As an application gets more popular, you need more infrastructure to handle customer traffic. Because a monolith is deployed as a single unit, it's not possible to deploy more servers for just one resource-intensive part of the codebase.

00:10:51.560 Lastly, organizational issues are at the company leadership and structure level. They are related to how work is managed, how quickly problems get solved, and how well an engineering organization is functioning. Organizational problems can happen regardless of the size of your monolith or company; however, they're exacerbated by growth and scaling needs.

00:11:04.680 An example of an organizational problem is difficulty assigning and finding code owners. At small companies, especially if you've got only five engineers, everybody's responsible for the entire application. However, that doesn't scale as your company gets bigger and you have thousands of engineers on an on-call rotation for code they didn't write or don't understand.

00:11:31.760 When an application gets to a certain size, you need to split up the responsibility so the bugs get fixed, and you know who to page during an incident. Another organizational issue is the length of time onboarding new hires takes. The argument is that it's hard for new hires to get started shipping because they can’t find their way around a monolith. If that monolith was modular, they could, in theory, just focus on code that belongs to their team.

00:12:02.080 Coming into a giant monolith, especially if you haven't written Rails before, can be quite daunting. It makes sense that a smaller codebase would feel easier for new hires to reason about. Lastly, one argument I often hear for why large monoliths are problematic is, 'I can't hold the whole application in my head.' When I hear this, the only thing I can think is, why are you even trying to do that? What I can hold in my head is different from what you can hold in yours and what DHH can hold in his.

00:12:53.920 Thank you to Matthew for the little DHH head. I don't agree this is a real problem, nor do I think it's a useful metric for whether an application is a ball of mud. Realistically, as our companies and our applications become more successful, our monoliths are going to grow over time.

00:13:06.720 It's more important that we focus on having well-designed, properly structured applications that follow Rails conventions than on picking an arbitrary number of lines of code that we can hold full context on. Having monoliths small enough to hold in your head isn't a tenable or reasonable goal at scale.

00:13:22.799 The other problems we've discussed, though, are very real. I've experienced them at multiple companies I've worked at, and I've heard these problems from others that I’ve talked to. In order to solve this set of problems, companies often reach for microservices and try to carve up their monolith. Instead, Shopify, Gusto, GitHub, Doximity, Zendesk, and others are exploring the modular monolith.

00:13:53.719 A modular monolith is a monolith that is organized into modules inside the codebase by grouping logical domains into their own directories. The promise of a modular monolith is that, in theory, it provides the best of both worlds: the isolated boundaries of microservices with the ease of deployment and development of a monolith.

00:14:23.720 There are a lot of reasons to choose a modular monolith over microservices. With a modular monolith, you get to keep using the same language. If an organization moves towards microservices, you may all end up writing Go, when Ruby is likely what you were hired to write.

00:14:44.240 This is not a knock on Go; it has its place. But as a Rubyist, I want to write Ruby, so a modular monolith benefits me and other Rubyists in that we get to keep writing the language that we love. In addition to using the same language, a modular monolith means you don't have to build out new infrastructure: deploys and CI work the same way as they did before, so the overhead of migrating to a modular monolith is quite low, at least to start.

00:15:20.000 Deployments and testing for microservices can be more complex if the services are interdependent, whereas a modular monolith is deployed and tested as a single unit. Another advantage of a modular monolith is that it provides a path towards package isolation. Isolation means creating boundaries between components and attempting to remove dependencies and coupling. To define boundaries between your modules, you need a tool like Packwerk, and then you have to do the work to remove dependency violations.

00:15:37.720 If you want to modularize your Rails application, and your goal is to be able to run separate CI jobs for your packages or deploy your packages to separate servers, isolation is going to be required. True isolation is incredibly difficult to achieve, and later we’ll take a closer look at why this might not be a path that you want to pursue.

00:16:14.640 Currently, I'm not actually aware of any applications that are deploying fully isolated modules of their monoliths to production. At Shopify, we have a single isolated component that can run CI in isolation, but it doesn't contain any product functionality. It's essentially the Active Support of our application, and therefore it's not useful to deploy it to its own infrastructure.

00:16:57.080 Most of the companies using modular monoliths have implemented them the same way. Generally speaking, in the Rails world, a modular monolith uses Rails engines to organize code and Packwerk to define and enforce boundaries between their packages. Rails engines are native to the framework, and Packwerk is a gem written by Shopify.

00:17:28.720 To understand how this works, let’s say we have an application called Fur and Foliage that sells both plants and pets. In a standard Rails application, the directory structure might look like this: if you have an application with four concepts—dog, cat, tree, and flower—you have corresponding models in the app/models directory, controllers in app/controllers, and so on.

00:17:59.000 If we were to modularize our Rails application with Rails engines, our application might be organized like this: here we have grouped dog and cat into a pets engine while tree and flower are grouped into a plants engine. We call these packages.

00:18:22.080 I've purposely oversimplified this example compared to how modular monoliths work at Shopify. We have top-level components with many nested packages inside, and other applications are doing that too. But I didn't want to spend all my time making a fake application about plants and pets for you. There are also many blog posts and talks available online about the decision-making process for figuring out where code goes and how to modularize your Rails application.

00:18:49.720 So I'm not going to go deeper on that. This talk is meant to be a higher-level view of the problems that we're trying to solve. A monolith that's only modularized with Rails engines doesn't inherently reduce dependencies or create isolation, because all the code is still accessible between packages.

00:19:14.240 We can use Packwerk to identify and enforce isolation between plants and pets. As mentioned earlier, Packwerk is written at Shopify and is what most applications are using to enforce boundaries. This talk is not an endorsement or criticism of Packwerk; I'm using it for examples because it’s the only tool that I’m aware of that actually does this.

00:19:38.600 Packwerk allows you to define dependencies between packages and enforce boundaries by not allowing undeclared dependencies. With Packwerk, dependencies are defined in a YAML file. Every package has its own package.yml that contains metadata for the package and lists the packages it depends on.

00:20:02.720 In this example, plants depend on pets. This is considered an allowed dependency: it says it's okay for the plants package to rely on constants in the pets package. There's an enforce_dependencies setting that prevents adding any new dependencies when it's set to strict.

00:20:29.320 In addition to the package.yml file, there is a package_todo.yml file, which declares dependency violations that you want to work on removing. If you want your package to be isolated from all other code in the monolith, you would need to fix all the to-dos and remove the allowed dependencies.
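To make the allowed-dependency and to-do ideas concrete, here is a minimal plain-Ruby sketch of the kind of check a tool like Packwerk performs. It is not Packwerk's API or implementation: the hashes are hypothetical stand-ins for each package's package.yml dependencies and package_todo.yml entries, and the `Reference` records stand in for the constant references a real tool would find by parsing the code.

```ruby
# Hypothetical sketch of a Packwerk-style dependency check.
# This is NOT Packwerk's API: plain hashes stand in for package.yml
# (declared dependencies) and package_todo.yml (recorded violations).

# Declared dependencies: plants is allowed to depend on pets.
DECLARED_DEPENDENCIES = {
  "plants" => ["pets"],
  "pets"   => []
}.freeze

# Which package defines each constant.
CONSTANT_OWNERS = {
  "Dog" => "pets", "Cat" => "pets",
  "Tree" => "plants", "Flower" => "plants"
}.freeze

# Known violations we intend to work on removing.
TODOS = {
  "pets" => ["Flower"]
}.freeze

# A constant referenced from code in a given package.
Reference = Struct.new(:from_package, :constant)

# Returns references that cross into another package without a declared
# dependency and without an existing to-do entry, i.e. new violations.
def new_violations(references)
  references.reject do |ref|
    target = CONSTANT_OWNERS.fetch(ref.constant)
    target == ref.from_package ||
      DECLARED_DEPENDENCIES.fetch(ref.from_package).include?(target) ||
      TODOS.fetch(ref.from_package, []).include?(ref.constant)
  end
end

# A package is isolated only when it declares no dependencies and has no to-dos.
def isolated?(package)
  DECLARED_DEPENDENCIES.fetch(package).empty? && TODOS.fetch(package, []).empty?
end

references = [
  Reference.new("plants", "Cat"),  # allowed: plants declares pets
  Reference.new("pets", "Flower"), # tolerated: already in the to-do list
  Reference.new("pets", "Tree")    # new violation
]
p new_violations(references).map(&:constant) # prints ["Tree"]
p isolated?("pets")                          # prints false
```

In this sketch only the new pets-to-Tree reference is reported, and neither package counts as isolated until its declared dependencies and to-dos are gone.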

00:20:58.640 Working through all these to-dos can be quite difficult in a large application. At Shopify, we have a little over 40 top-level components defined, and most of those contain nested packages. In our core application, we have over 90 package to-do files, and there's no concerted effort to burn them down. We're not even tracking whether we're actively reducing dependencies, because it's not a useful metric for us.

00:21:25.160 We've found that enforcing strict dependencies causes a lot of friction for developers. Packwerk is useful for knowing what the dependencies between packages are, but it can't tell you how to write your code with less coupling or how to refactor existing violations to create boundaries.

00:21:49.960 Rather than think of Packwerk as a to-do list, think of it more as data for helping you understand your dependency graph and where changes may be needed. Because modularization and isolation tools can't tell you how to write better code, new problems can crop up in our applications.

00:22:15.840 It’s not Packwerk’s fault by any means. Attempting to fully isolate and remove dependencies in a large-scale monolith is very difficult and can result in undesirable design patterns. One problem that I've seen in large monoliths trying to prevent new or reduce existing violations is primitive obsession.

00:22:33.560 An example of primitive obsession is passing IDs around rather than Active Record objects in order to avoid calling constants between packages. Often, developers will load all the records from the database, get the IDs, pass those to another package, and then load the records again using those IDs.

00:23:04.080 Packwerk doesn't see passing an ID around as a violation. So, to remove this dependency, we need to avoid calling Cat directly from the Flower model. We can do that by making an explicit public interface, called something like 'CatGetter', and passing the ID to that.

00:23:34.840 Primitive obsession makes the code more difficult to follow and can lead to problems with the database due to unnecessarily complex queries, less efficient queries, and data duplication. If one package already loaded cat just to get the ID to pass it to flower, it means that the database is queried for the same data twice.
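The double query described above can be sketched in plain Ruby. This is a hypothetical illustration, not Active Record: `FakeDB` is an in-memory stand-in that only counts lookups, and `CatGetter` mirrors the 'cat getter' interface mentioned in the talk.

```ruby
# Hypothetical sketch, not Active Record: an in-memory "database" that
# counts primary-key lookups so the cost of passing IDs is visible.
class FakeDB
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Stands in for something like Cat.find(id).
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end
end

Cat = Struct.new(:id, :name)

# Hypothetical public interface in the pets package that accepts only an ID.
module CatGetter
  def self.call(db, cat_id)
    db.find(cat_id)
  end
end

# Plants-side code written to avoid referencing Cat: it takes a primitive ID
# and has to load the record again.
def flower_label_from_id(db, cat_id)
  "Flower picked by #{CatGetter.call(db, cat_id).name}"
end

# Plants-side code that accepts the object the caller already loaded.
def flower_label_from_cat(cat)
  "Flower picked by #{cat.name}"
end

db = FakeDB.new({ 1 => Cat.new(1, "Whiskers") })
cat = db.find(1)                    # pets package loads the cat
flower_label_from_id(db, cat.id)    # the same row is loaded a second time
p db.query_count                    # prints 2

db = FakeDB.new({ 1 => Cat.new(1, "Whiskers") })
flower_label_from_cat(db.find(1))   # the loaded object is reused
p db.query_count                    # prints 1
```

The ID-passing version avoids a reference to Cat in the plants code, but it pays for that with an extra query every time it is called.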

00:23:53.760 While using primitives might circumvent a dependency violation and reduce coupling, the downside is database performance issues that affect your customers and patterns that don’t utilize the efficiency of Active Record. If you prevent using abstractions that engineers are used to, like Active Record, for example, you end up with far worse patterns and issues in your application’s design and structure.

00:24:12.560 One of the benefits of a modular monolith is that there's no network latency between packages, like there would be with microservices. However, primitive obsession leads to performance issues of its own, which isn't exactly better. We have to be careful that when we put up guardrails, we don't end up encouraging far worse anti-patterns.

00:24:37.240 In addition to primitive obsession, a challenge I’ve observed is ownership obsession. This refers to the mindset that code within a package is solely the domain of the owning team, leading to resistance against input or oversight from others.

00:25:06.080 While establishing clear boundaries between packages can promote modularity, it can also create silos that result in an 'us vs. them' mentality, when the codebase should be viewed as a shared responsibility. Ownership obsession results in a selfish mindset as well; teams will refuse to fix code in a package they don’t own because they don’t see it as their problem.

00:25:42.760 Often the right change is the harder change, but teams will avoid touching someone else's code at all costs, often to the detriment of code quality and simplicity. Drawing strict boundaries within an application can have a negative effect on collaboration. The idea of a package functioning like a smaller Rails application is appealing in theory; however, when it leads teams to prevent others from accessing their code or to defend poor design choices, it becomes a negative consequence of modularity.

00:26:10.880 This undermines collaboration and overall engineering culture. Another problem that's crept up is developers being obsessed with putting every concept in its own package. Teams often want to create yet another package because they feel their code doesn't fit into the existing ones.

00:26:56.640 If you’re not careful about preventing everything from being a new package, you’ll end up with a codebase that’s modeled after your org chart. It’s important to constantly scrutinize whether a new package is really necessary. As humans, we love to categorize things, and we want everything to fit neatly into little boxes.

00:27:25.240 But going as far as to make everything its own category leads to a fractured codebase where related functionality isn't grouped together. It's important to think critically about what concepts you turn into a package. You don't want your modular monolith to end up feeling like a bunch of microservices.

00:27:54.600 Code duplication often becomes a big problem. No one wants to violate or create new dependencies, so it’s common to see code copied from one package to another. When this happens, it becomes more difficult to maintain duplicated code; if one version changes and the other doesn’t, now you have bugs or maybe you’re going to make upgrading Rails more difficult.

00:28:37.920 The codebase becomes bloated over time, and if you're concerned with application design and structure, duplicating code is a massive design smell. Copying code just to avoid a dependency violation should be treated as an opportunity to rethink what code is being used and why.

00:28:53.840 If it truly is shared, it should be moved to a package that's meant to have shared code or maybe moved to a gem, but definitely don’t copy it to another package; that is almost never the correct answer. Another challenge that happens when trying to modularize and isolate your Rails application is figuring out how to deal with circular dependencies.

00:29:21.840 Untangling circular dependencies in a large-scale Rails application is incredibly daunting and it’s also difficult to prevent new ones from being added. If the pets package depends on plants and plants depend on pets, it becomes much more difficult to isolate one of those packages from another.
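Finding circular dependencies is at least mechanical, even if untangling them is not. As a sketch, assuming a hypothetical hash that maps each package to the packages it depends on, Ruby's standard `tsort` library can group packages into strongly connected components; any component with more than one package is a cycle.

```ruby
require "tsort"

# Hypothetical dependency graph: each package maps to the packages it depends on.
DEPENDENCIES = {
  "pets"   => ["plants"],
  "plants" => ["pets", "shared"],
  "shared" => []
}.freeze

# Returns each group of packages that depend on one another in a cycle,
# using strongly connected components from Ruby's standard tsort library.
def circular_dependencies(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node, []).each(&block) }
  TSort.strongly_connected_components(each_node, each_child)
       .select { |component| component.size > 1 }
       .map(&:sort)
end

p circular_dependencies(DEPENDENCIES) # prints [["pets", "plants"]]
```

Detection only points at the cycle; deciding how to break it is still the design work the talk describes.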

00:29:56.000 To avoid the circular dependency, many developers would use primitives, REST APIs, or GraphQL calls, introducing performance regressions along the way. To correctly isolate these packages, major refactoring often needs to be done to pull out the shared code and untangle the mess.

00:30:27.440 Having explored these problems caused by modularization and isolation, let's revisit the original problem set to see how our architecture falls short in addressing them. The problems we talked about earlier are ones that many companies cite when reaching for modularization, and yet if we look at the state of the applications using this architecture today, we’ll see that none of these problems that we set out to solve can be solved by changing our architecture.

00:30:58.720 This is because they’re human and cultural problems, not technical ones. At the root, no amount of moving code around into different directories or migrating to a different architecture can fix these architectural problems. They are human problems because our tools can’t tell us how to refactor our code to have better organization and structure; they can only tell us that there’s potentially a problem.

00:31:31.680 So can modularization and isolation automatically improve the structure and organization of an application? No. When an application is modularized, code no longer lives in the top-level directory, but that doesn’t mean the packages themselves are well organized and properly structured.

00:31:54.960 We still need to put a lot of effort into figuring out where the code should live and then prevent new packages from becoming disorganized. Humans are pretty good at categorizing things, but we're not always great at choosing the right category. Simply looking at the name of something doesn’t tell us who uses it or where it should live.

00:32:33.440 While modularization and isolation are a way to improve organization and structure, they still require an understanding of how to design software, which is something we don't often teach. Can a modular monolith make our code automatically less tightly coupled? No.

00:33:07.680 While modularization and isolation can help you identify dependencies, they don't actually reduce coupling or introduce boundaries unless you refactor the existing code. Adding dependency violations to the to-do list and setting up allowed dependencies doesn't fix the design of your code; it just tells you where there's possibly an undesirable design.

00:33:39.480 You still have to understand how to rewrite and redesign the existing code while not deviating too far from the Rails way. This is a human problem because it requires educating teams on how to design software for Rails while avoiding implementing worse anti-patterns like primitive obsession.

00:34:02.560 Operational problems are human problems because they need to be fixed at the source. They are caused by ignoring technical debt rather than by the architecture being used. So will modularizing a monolith speed up CI and make tests less flaky? No, there is nothing about modularization and isolation that inherently improves CI.

00:34:17.520 Flaky tests aren't necessarily caused by having a monolith; they're more likely caused by network calls, creating too many records, race conditions, resource contention, or leaked state between tests. Because flaky tests have many causes that go beyond architecture, modularizing and isolating Rails applications won't automatically fix your flaky test suite.
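As an illustration of leaked state, here is a hypothetical plain-Ruby example in which one "test" changes a class-level setting and never restores it, so another test's result depends on run order. Nothing in it involves package boundaries, which is the point: moving this code into a package would not change the outcome.

```ruby
# Hypothetical example: a class-level setting that one test changes
# and never restores, so another test's result depends on run order.
class PriceFormatter
  class << self
    attr_accessor :currency
  end
  self.currency = "CAD"

  def self.label(cents)
    "#{currency} #{'%.2f' % (cents / 100.0)}"
  end
end

# Each "test" returns true when it passes.
def test_default_currency
  PriceFormatter.label(500) == "CAD 5.00"
end

def test_usd_prices_leaky
  PriceFormatter.currency = "USD" # leaked: never restored
  PriceFormatter.label(500) == "USD 5.00"
end

p [test_default_currency, test_usd_prices_leaky] # prints [true, true]
p test_default_currency                          # prints false: order now matters

# Restoring the previous value removes the order dependence.
def test_usd_prices_isolated
  previous = PriceFormatter.currency
  PriceFormatter.currency = "USD"
  PriceFormatter.label(500) == "USD 5.00"
ensure
  PriceFormatter.currency = previous
end

PriceFormatter.currency = "CAD"
p [test_usd_prices_isolated, test_default_currency] # prints [true, true]
```

The fix lives in the test itself, not in where the code sits in the directory tree.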

00:34:47.680 The same goes for speeding up CI. If the reason it's slow is because you have slow queries or a really large test suite, it’s not going to be made faster by moving your code into smaller directories. Will modularization allow us to deploy in isolation? Not yet.

00:35:18.240 The existing modularized monoliths in the Rails ecosystem are all still deployed as a single unit as if they were a traditional Rails monolith. As far as I'm aware, no one has isolated a part of their monolith to be deployed separately while remaining in the same codebase.

00:35:51.200 This sounds like an architecture problem, but it’s a human problem because untangling dependencies in order to scale deployments and isolation requires redesigning major parts of an application. Organizational problems are human problems because they’re deeply rooted in our organization’s leadership and are directly related to what we define as company culture.

00:36:16.399 When you modularize your monolith, can finding and assigning code owners be easier? Somewhat, but not really. It is theoretically easier to assign a team to an entire folder instead of individual files. However, just because you know who owns something doesn’t mean you can actually get them to do the work.
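Folder-level ownership is simple to express. The sketch below uses a hypothetical map from package folders to team names and picks the most specific matching folder; it is not the syntax or matching rules of any particular code-owners tool. It can return nil, which is the unowned-code case that reorganizations tend to create.

```ruby
# Hypothetical ownership map from package folders to teams.
# Not the syntax or matching rules of any specific code-owners tool.
OWNERS = {
  "pets/"          => "pets-team",
  "pets/grooming/" => "grooming-team",
  "plants/"        => "plants-team"
}.freeze

# Returns the team for a file path using the most specific (longest)
# matching folder, or nil when no folder covers the path.
def owner_for(path)
  folder = OWNERS.keys
                 .select { |prefix| path.start_with?(prefix) }
                 .max_by(&:length)
  folder && OWNERS[folder]
end

p owner_for("pets/app/models/dog.rb")            # prints "pets-team"
p owner_for("pets/grooming/app/models/brush.rb") # prints "grooming-team"
p owner_for("lib/legacy_report.rb")              # prints nil
```

Knowing the owner is the easy part; getting that team to prioritize the work is the cultural problem.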

00:36:40.719 If exceptions and deprecations aren’t prioritized or incentivized, the owning team will almost always choose to ship features over maintenance work. Additionally, reorganizations, team renames, or pivoting parts of your product can result in sections of the codebase being essentially unowned.

00:37:07.480 As your organization grows, it’s important to be able to define and have ownership. However, changing your architecture doesn’t actually fix that; all it does is contain the code that needs an owner. Ownership is a culture problem, not an architecture one.

00:37:40.760 Can onboarding new hires be easier? This is really hard to measure, but I’d argue no. A modularized monolith is still a Rails monolith that runs all the CI tests in one build and is still deployed as a single unit. A change in one package can still affect another because many packages are tightly coupled.

00:38:00.760 So I don’t think it’s accurate or fair to say that modularization allows new hires to only consider the domains they own. In addition, a modularized monolith deviates from Rails conventions, and therefore onboarding someone to your application requires teaching them how to use these tools and how they change how code is written for Rails.

00:38:22.200 The tools we use to maintain structure and style create friction in development and may not actually put new hires in a position to ship faster. Setting boundaries in an application doesn’t teach anyone how to write idiomatic Rails code; it just tells them when they have a violation.

00:38:44.400 Looking back at the problems we set out to solve that are still present, and all the new problems we have, you may think that I’m against modularization, but where code lives is not my biggest concern. What concerns me is that isolation is actually very hard, especially in a language and framework that are designed to be global.

00:39:13.920 It’s also hard because these monoliths are 15 to 20 years old, made of millions of lines of code and worked on by thousands of engineers of varying experience. The reason these problems aren’t solved isn’t because we didn’t try hard enough or because modularization is bad or because we don't have the right tools.

00:39:42.760 The reason we haven’t solved architectural, organizational, and operational issues is that you cannot solve human problems with modularity. The problems we're trying to solve are cultural and indicative of dysfunctional engineering organizations. We would have these problems even if we had chosen to stay in a monolith or migrated to microservices.

00:40:09.480 No amount of architecture can save you from an organizational structure that doesn’t prioritize code quality, fixing technical debt, and allowing engineers to pivot when a path clearly isn’t working. Organizations promote and incentivize silver bullets instead of rewarding maintenance and foundational work. In order to address these problems that we looked at today, we need to understand their causes.

00:40:44.240 How does an application get to the point where it feels like a ball of mud? If you’ve worked at a startup or on a greenfield application, you know it doesn’t start out this way. If Rails created a so-called ball of mud from day one, none of us would be here using Rails.

00:41:39.119 The truth is that an application gets there slowly over time, commit by commit, until it feels like development and productivity have come to a screeching halt. There’s no single cause; we often fail to see the problems until it’s too late, when there’s no time or ability to fix them the right way. We blame our tools, we blame Rails, and we blame each other.

00:42:31.480 Pressure to ship is one of the many reasons that applications turn to a ball of mud. It’s much easier to keep an application loosely coupled with good structural organization when there are just a few developers adding features and no rush to ship. But then you get funding, and your stakeholders want to see a return on their investment, or you go viral and your customers are mad because you can’t handle the amount of traffic.

00:43:30.240 So the only way to keep going is to ship, ship, ship. Any maintenance work or technical debt incurred is ignored because fixing that isn’t even on leadership’s radar. Over time, the code becomes tightly coupled because bolting onto existing functionality is easier than pausing to do the work to properly refactor what’s already too entangled.

00:44:00.440 As the pressure to ship mounts, technical debt grows. It is incurred both by ignoring maintenance tasks like Rails and gem upgrades and by building changes into existing functionality when the priority is shipping over quality and fixing things the right way.

00:44:28.080 Technical debt has a way of growing slowly and silently without being noticed. Before you know it, you're years behind on your Rails and Ruby versions, there are monkey patches everywhere, and seemingly innocuous changes take weeks to ship.

00:44:51.680 Fixing exceptions feels near impossible now that there are thousands, even hundreds of thousands, a day. With all that noise, what’s one more? No one wants to take responsibility for maintenance tasks when the only way to get promoted is to ship features.

00:45:20.600 The pressure to ship and the mounting technical debt increase the pressure to hire. You need more developers to ship more, and the ones you have aren’t working fast enough because the company is prioritizing feature growth over code quality.

00:45:47.320 New hires never get onboarded properly; no one teaches them Rails or how an application should be designed. They’re thrown into the deep end with no support; they feel overwhelmed and like they aren’t productive, so they end up blaming the framework.

00:46:11.440 New hires complain that Ruby is slow and bad and that we should rewrite in a newer, faster, popular language like Go. All they see is technical debt, no structure, tight coupling, and slow CI.

00:46:36.800 Shipping too fast and increased hiring are both symptoms of a larger problem that growing organizations have: growing pains caused by misaligned incentives. As organizations get larger, they need to add more layers to leadership to make sure everyone’s doing their job properly.

00:46:59.840 OKRs and KPIs become the metric for whether things are going well, and the obvious cultural problems that led us to a state where our application feels fragile and fractured continue to be ignored. In order to meet their OKRs and KPIs, managers and VPs need to know who to assign work to and who’s accountable for outages.

00:47:26.080 The thinking is, if we just knew who was in charge of this code, we could measure who isn’t doing their job. While you do need to know who owns code in a large application, it’s often taken too far. Rather than working together and collaborating as an organization, misaligned incentives breed a blame-based culture.

00:48:06.160 I should not have put so many B's in one sentence. That results in siloed teams who want to protect their code from the rest of the organization at all costs. The desire to modularize and isolate partly comes from wanting to feel like you don’t have to think about the rest of the codebase. However, it quickly turns into a 'this code is mine to protect' mentality.

00:48:31.920 I actually watched this happen at GitHub while I was there. As the application and organization grew, it felt like it became more important to figure out who to blame than it did to work together on fixing technical debt. A blame-based culture leads to teams wanting to protect their code and database queries from the rest of the organization so they can prove they weren’t the cause of a site outage.

00:49:05.760 Within the organization, keeping teams away from your code becomes far more important than collaboration because collaboration means it’s not clear who to blame. This blame-based culture results in the loss of teamwork and empathy. The problems that we looked at today aren't caused by how code is organized and aren't solved by reorganizing and isolating code, because they aren't problems caused by technology or the lack thereof.

00:49:44.080 They are human problems caused by culture within an organization, and there is no silver bullet technology that will fix that. It takes good leaders and a concerted effort to address the underlying challenges that allow applications to turn into a ball of mud over time.

00:50:08.960 Not all hope is lost, though. Just because we can’t solve these problems with architecture changes doesn’t mean they’re not fixable or avoidable. It is not inevitable that your application turns to a ball of mud and your only recourse is to spend years moving files around into different folders.

00:50:56.720 In order to address the human problems that we’re facing, we need to improve our developer education. We cannot keep hiring developers and failing to train them on how to write Rails the Rails way. Engineering onboarding at most companies is a week. That’s not enough time for a new hire who’s not proficient in Rails to really learn Rails.

00:51:21.600 If we don’t train new hires on why we use Rails, how to write Rails, how to organize code, how to write tests, how to use the features of the framework, how to avoid sharp knives, and how to follow Rails conventions, we’re doing ourselves and them a disservice. We cannot expect dependency violations to teach anyone how to design software for Rails applications.

00:52:24.640 Companies that are hiring developers to write Rails need to do more education, and part of that education comes in the form of making sure every team has at least one person who knows Rails well. Having whole teams of developers from other languages who were never trained on Rails can’t possibly produce well-designed idiomatic Rails code at scale.

00:52:58.560 It’s not their fault; it’s ours because we failed to train them. In addition to education, we need indoctrination. This is different from education because it goes beyond the technical basics of how to write Rails. We need to evangelize Rails to new developers coming into the framework.

00:53:31.760 I remember back in the early days, we all used to give talks about why Rails is the way it is and how it promotes developer happiness, and we kind of stopped doing that. As developers have more and more choices about what languages they can write, we need to show them the joy that Rails brings.

00:54:08.440 You can be so productive if you’re following conventions and you know how to use the framework. But when your only experience with Rails is an application that’s deviated from conventions and is littered with technical debt, you might not be able to see the beauty of the framework.

00:54:47.760 Without indoctrination, without getting our coworkers to fall in love with Rails like we did, all they see is another tool, another language. Rails is more than simply a Ruby framework, and I want others to feel the same joy I do when writing Rails applications.

00:55:36.640 Leaders need to reprioritize quality. It’s easy to fall into the 'ship faster' trap, but waiting until shipping is painful to improve quality isn’t good for the application or team morale. When we ignore technical debt for a long time, it has a tendency to build up to a point where everything feels impossible.

00:56:13.840 I’ve seen so many organizations wait until it’s too late, and when that happens we start searching for a silver bullet. We invest years in a path that won’t fix the problems we have because it’s easier than addressing the real problems we have: an engineering culture that values shipping over quality, features over bug fixes, and magical solutions over fixing individual problems at their source.

00:57:30.280 Part of the quality issues come from incentives that value shipping features or silver bullets over targeted, methodical improvements. If improving performance, cleaning up technical debt, or upgrading Rails isn’t rewarded the same way as shipping features, why would anyone in your engineering organization prioritize the grueling, unshiny work?

00:58:29.040 It’s up to leaders to highlight that work and reward it with promotions and raises. We cannot expect any improvement to code quality if refactoring isn’t rewarded. Just because someone doesn’t say, 'Look at me! I made this better.' doesn’t mean they aren’t the ones keeping the lights on.

00:59:10.440 Leaders need to both highlight and reward good work, especially when it’s not visible. If you decide to modularize your Rails application in the future, there are some technical challenges to keep in mind. As we’ve discussed, it won't fix human and culture problems. But that doesn’t mean that a modular monolith won't help you with some of the problems in your application and organization.

00:59:57.920 First, if you modularize, start with as few packages as possible. There’s no reason to decide every domain boundary upfront, because you’re going to learn a lot about your application and product during this process. It's also a lot easier to undo modularization if you have just a handful of packages and you decide it doesn't work than if you’ve got 50+ packages, especially if they’re all nested.

01:00:44.640 By making fewer packages upfront, you also get to avoid package obsession and ownership obsession. You also don’t want to end up with an application design and architecture that models your org chart; otherwise, you’ll spend more time moving files around than you will fixing technical debt.

01:01:29.680 When isolating a monolith, the focus should be on functional isolation rather than domain isolation. This means that instead of creating strict boundaries between every single concept, the focus is on finding where the application's seams are.

01:01:57.440 For example, if we look at our Fur and Foliage application, we may find that there's staff admin functionality built into the monolith for running data transitions or looking up customer accounts. This is a great example of something that can be functionally isolated from the rest of your monolith.

01:02:30.000 It’s not part of the core product and it doesn’t affect customers, but you still need it to share code or share information. By isolating on the functional level instead of the domain level, we can be more cognizant of avoiding indirection and primitive obsession.

01:03:07.920 When modularizing your Rails applications, be sure to not do it prematurely. It’s hard to say what the right time is, but it’s certainly not when you have just a few hundred lines of code or a few hundred thousand lines of code.

01:03:44.640 Instead of modularizing too early, spend time identifying and addressing technical debt. Otherwise, modularization can end up hiding poor design decisions for years and introduce worse patterns. Remember that modularization does not fix organizational structure, nor does it pay down existing technical debt.

01:04:20.000 It’s important to fix problems at their source. This means that even if you modularize your Rails application, you still need to figure out why CI is slow and flaky. You need to refactor tightly coupled code, reorganize poorly structured code, teach engineers how to write idiomatic Rails, and avoid a blame-based culture by ensuring maintenance work on technical debt is as valued and incentivized as feature work.

01:05:07.840 The real heroes in your organization are those who painstakingly work on fixing bugs and performance issues and on ensuring that upgrades are done in a timely manner. And remember, don’t fall for the sunk cost fallacy. You should never continue down a path that isn’t working just because it feels too hard to turn back.

01:05:52.560 It’s important to re-evaluate architectural and technical solutions over time. What was right a few years ago might not work well today. It’s okay for your goals to change. At Shopify, when we started modularizing our monolith, our goal was to isolate packages, run them as separate CI builds, and deploy them separately.

01:06:37.680 Over time, this has proven not only really difficult to do but also not right for our organization. We refocused our efforts on functional isolation in production, improving the developer experience, and removing checks that cause more friction than benefit in development.

01:07:06.800 It’s important to have a healthy engineering culture that can critically look at what is working and what isn’t, without feeling like pivoting is indicative of failure. So, I’ve been writing Rails applications for 14 years, and I’ve seen a lot of different applications at varying stages of their evolution.

01:07:39.840 I’ve spent a significant amount of time in codebases of some of the very first Rails applications ever built, and what we've seen over time is that as engineering organizations scale and applications grow, eventually they get to a state where the framework stops bringing joy to developers.

01:08:04.720 Joy and happiness. We find ourselves missing the productivity we had when we first started building our product. As development slows and joy is replaced with friction and frustration, we look for someone or something to blame. When I hear developers talk about how Rails applications turn to a ball of mud, they make it sound like it’s inevitable.

01:08:28.560 It’s very common for an application that’s millions of lines of code and worked on by thousands of engineers to feel like a big mess. They say something like Rails doesn’t provide patterns or tooling for managing the increasing complexity of a monolith, but I don’t think the challenges that we looked at today are the responsibility of the Rails framework to solve.

01:09:16.560 The truth is the ball of mud is actually caused by an engineering culture that incentivizes shipping over code quality, hiring fast over education, and silver bullets over rewarding those working on technical debt. Why should Rails provide tooling for modularization when the complexity, lack of structure, and friction-filled development environments aren’t caused by the framework itself?

01:09:46.800 Often, it seems that when we reach for modularization, we're trying to engineer ourselves out of a large organization. We keep trying to find ways to make a 20-year-old monolith feel like a greenfield application, but it can’t because it's not, and it shouldn’t be.

01:10:20.280 It makes sense that as developers we want to reach for technology to solve our problems because that's where we have control. But Rails can't engineer us out of our problems, and neither can modularization or engines or Packwerk or microservices.

01:10:48.840 This isn’t because modularity and isolation are bad; it’s because we’re trying to solve human and cultural problems by changing our architecture. At Shopify, we've been trying to engineer ourselves out of these problems for six years. Gusto, GitHub, Doximity, Zendesk are also trying.

01:11:19.720 The truth is this is kind of uncharted territory. We’re still trying to figure out how to make working on a monolith of this size bring joy to developers. If we look at the current state of our applications, we’ll see that many of the problems we set out to solve are still present, and new issues have cropped up.

01:11:55.360 We want to blame our tools, our engineers, our framework, but ultimately blame won’t help. We need to shift our engineering culture to value what’s really important in order to solve architectural, organizational, and operational issues.

01:12:30.080 Leaders in our companies and in our community need to do more. Engineering culture comes from the top. We have to start educating new hires not just on how to use Rails, but on software design and Rails philosophy as well.

01:13:04.520 We need to prioritize fixing technical debt and invest in code quality over rewarding only those who ship features. We need to start talking about why we love Rails and indoctrinate newcomers so they not only keep writing Rails, but they fall in love with it the way we did.

01:13:37.440 We need more collaboration in our engineering organizations rather than focusing on blame-based development. We need to be curious and discerning. Don’t fall for thinking there's a magic cure-all solution to these problems, because there isn’t.

01:14:31.160 The myth of the modular monolith is that architecture can fix human and culture problems. It can't, but fixing human and culture problems can improve our architectural, operational, and organizational challenges. And while AI technology is trying to eat our jobs and livelihoods, these human problems are only going to become more obvious and more important to solve.

01:15:12.640 Refocusing on education, indoctrination, and collaboration will ultimately make us more successful at solving these hard problems. Let's invest in our engineering culture, embrace our monoliths, and rediscover joy in programming Rails at a large scale. Thank you.

See Slides on speakerdeck.com

Eileen M. Uchitelle

@eileencodes