00:00:13.759
Ready to talk about spaghetti code today. Does anybody here write spaghetti code?
00:00:19.560
All right, good. Intentionally, all right?
00:00:26.199
Cool. All right, everybody else can go because we're going to talk about spaghetti code today.
00:00:32.000
So as Spike mentioned, I'm an engineering manager at Doximity where I build software to help doctors do their jobs better every day.
00:00:39.800
I am from New Mexico, where we fly around in wicker baskets that shoot fire.
00:00:46.440
It's a pretty cool place, and I have a cute little tortoise. His name is Bunny Rabbit—big thanks to my daughter for naming animals confusing names.
00:00:53.160
So if anyone's looking for a low-maintenance pet, in about 60 years he's going to outlive me, so he might be available for you if you're interested in a tortoise.
00:01:06.000
Also, I've been working on software for the medical industry for just about 20 years, covering all sorts of areas.
00:01:11.799
From management software to scheduling software for hospitals, operating room monitoring software, and now tools for doctors' everyday processes.
00:01:17.240
It’s a cool industry to be in.
00:01:24.040
So we're going to start with a story from Doximity.
00:01:29.360
At the end of 2016, a few of us at Doximity started up a new Rails application, a microservice that was isolated from our main Rails monolith.
00:01:36.560
This isolation gave us freedom to iterate and do basically whatever we wanted.
00:01:42.840
Our goal was to establish clear boundaries of code ownership, and our spec suite ran in under two minutes—it was amazing.
00:01:49.640
This Greenfield application became increasingly important and grew into one of the most critical applications in our ecosystem.
00:01:56.719
The number of contributors to our Git repo went from about a dozen to nearly 170.
00:02:02.479
This growth became a bit confusing.
00:02:08.560
Our previously tidy Rails app became really difficult to work with, as Active Record classes and relationships were being used in unintended ways.
00:02:14.680
We constantly battled N+1 queries, and queries would run for 15 minutes, often with no idea why.
00:02:21.400
Developers had a hard time figuring out where the code belonged.
00:02:29.280
New developers would come in and struggle to figure out where things were.
00:02:35.120
So the question was, should we just fork off another microservice and repeat the process?
00:02:43.599
Could we ever get back to that bliss of a Greenfield app again?
00:02:49.840
This is a common experience with Rails applications, leading me to formulate a pattern of degeneration.
00:02:56.360
First, we start to see our controllers getting bloated, prompting us to refactor to thin controllers and thick models.
00:03:02.400
If we've felt that pain before, we tend to move on to models that become bloated, identifying God models and refactoring to thin models and thick services.
00:03:09.640
Initially, this sounds great until we end up with a giant bucket of intertwined general services that everyone uses, but no one understands.
00:03:17.000
Does anybody else have a codebase that sounds like that? Exactly.
00:03:22.040
We started seeing bugs every time we changed our microservices just a little.
00:03:26.199
We entered lockdown mode, where we were really afraid to change anything because we weren’t sure if our specs covered all the creative usages the other teams were utilizing.
00:03:35.120
The good news is that our apps, which we love so much, don't have to stay in a state of confusion.
00:03:42.840
By taking some measured, intentional steps, we can dig ourselves out of the hole of complexity and coupling.
00:03:50.039
We can leverage simple tools to enforce code boundaries, leaning into modularization.
00:03:56.400
By discussing method contracts, which we don’t often do, we can return to multiple lands of Greenfield bliss.
00:04:04.960
Packwerk is a tool that has helped us take an iterative approach to regain that Greenfield bliss.
00:04:10.920
I’ll provide some links to Packwerk and other resources later on.
00:04:16.000
Essentially, Packwerk is a static code analysis tool that detects private code access by modules outside of a package.
00:04:23.360
It prescribes a folder structure that’s somewhat similar to the standard Rails application but includes privacy significance for certain folders.
00:04:31.200
We moved all of our code from the standard MVC structure and distributed it into several packages that were highly cohesive but loosely coupled.
00:04:39.120
For example, in the public directory, we’ve created packs for sales, reviews, and social.
00:04:46.639
Everything within the public directory is accessible; if it’s not in that directory, other packages can’t access it.
00:04:53.760
Let’s take a quick example. We have two packages: a sales pack and a review pack.
00:05:01.040
The sale model is not in the public directory, so it's not public.
00:05:06.000
However, the review model in the reviews package has a method called sales rolling 30 that leverages the sale model.
00:05:15.040
This is a violation, as it’s reaching into a file located outside of the public directory.
00:05:22.640
What are some possible problems with this? This seems like a simple issue that happens frequently.
00:05:29.680
What if the sales team's application maintainers never indexed the created-at field?
00:05:36.600
It might work fine in development, possibly in QA, and maybe even in pre-production—but then you go to production.
00:05:43.960
Suddenly, the method takes 15 seconds to complete, creating issues for your production users.
00:05:50.000
What if the sales team wants to remove the finalized method or change its meaning?
00:05:56.800
That’s going to require a conversation with the review team.
00:06:03.240
What if the sales team never reviewed this PR?
00:06:09.600
Does this mean that every PR needs to be reviewed by every team?
00:06:15.400
At one point, I was reviewing 85 PRs a month from other teams accessing our methods.
00:06:22.080
We still had issues slip through because not everyone can review every single PR.
00:06:30.000
We can utilize Packwerk to check for these problems; you don’t have to review every single PR.
00:06:38.560
We have several command line tools within the Packwerk gem.
00:06:45.040
The first tool is Packwerk check, which scans the entire codebase to detect privacy violations, like the one we just covered.
00:06:53.760
If we run Packwerk check on our application, we would see a message indicating that we have a privacy violation.
00:07:01.440
It will inform us that there's a violation with the sale model in the sales package.
00:07:08.480
This is a check that can be run in your CI process.
00:07:16.600
Having perfect little packages of code might seem amazing and be a great goal, but when you're staring at a 200,000 line codebase, it's tough.
00:07:24.640
It can be hard to know where to start, gain buy-in from stakeholders, and get motivated.
00:07:31.680
This phased approach can be critical in maintaining momentum in a huge project.
00:07:38.720
Here's a sneak peek of the whole talk: we’ll examine what we actually have in our codebase.
00:07:45.680
Next, we’ll chunk files into packages, declare package dependencies, start writing public interfaces, and then maintain it.
00:07:52.240
Let’s start with the basics, which isn’t even in our codebase.
00:08:00.560
First, I’d recommend getting a lay of the land to figure out your true feature set.
00:08:07.680
Work with your product leaders to understand how they view groups of features in your codebase.
00:08:14.080
Understanding how the business thinks about roadmaps and planning and the language they use is crucial.
00:08:24.000
Design groups of features and give them names for the basis of your package structure.
00:08:30.000
This approach will likely differ from what the engineers would consider.
00:08:38.640
We used a process called event storming.
00:08:45.000
We brought together our product leaders, tech leads, and directors over a couple of days to discuss the events a user experiences.
00:08:52.560
We put sticky notes on a window, took pictures, and these became the folders, akin to the directory structures I mentioned earlier.
00:08:59.199
The next phase involves using those feature sets from the planning phase to create folders and move code into them.
00:09:04.960
When our team entered this phase, we moved classes wholesale without changing any namespaces or making any refactorings.
00:09:10.640
This helped us move quickly without getting bogged down in decisions.
00:09:16.680
It might be worthwhile to examine whether the code is being used.
00:09:23.200
If it's not being executed, just delete it and reduce the code you have to manage.
00:09:30.480
Packwerk uses YAML files to declare dependencies in these package.yaml files.
00:09:38.560
Inside, you'll set up an array of dependencies.
00:09:44.240
We held off on setting up our dependencies at this stage; we’ll get to that in a little bit.
00:09:51.360
However, we added some metadata to our files declaring the owning team, Slack channel, and the code reviewer group.
00:09:58.960
This helped in two ways: it provided a way for anyone exploring the code to know precisely where to ask questions.
00:10:05.280
With 20 or 30 packages, this information is critical.
00:10:11.040
We used this code owner piece to autogenerate our GitHub code owners file, which is very convenient.
00:10:18.160
We moved nearly everything into packages.
00:10:25.920
My perspective here was to think about what would need to be removed if I were to delete a feature.
00:10:32.520
Everything related to that feature—the models, views, controllers, rake tasks, and database migrations—moved into that package.
00:10:39.679
This leads to a highly cohesive package where everything inside is related, while maintaining less coupling between packages.
00:10:47.280
The only code we left outside packages was application configuration code that didn’t correlate to any single feature set, such as our DB schema.
00:10:53.920
We also added a README to each package directory, which isn't required by Packwerk but is a good practice.
00:10:59.440
These README files outlined the purpose of each package and explained why certain dependencies exist.
00:11:04.960
This clarity becomes essential when assumptions change or when we need to consider removing a dependency.
00:11:11.760
As we go through this phase, everything will show as a privacy violation when you run Packwerk check.
00:11:18.640
You might see a staggering number of violations because nothing is in the public directory.
00:11:25.800
This might seem daunting to look at, but Packwerk provides a solution by allowing us to generate to-do files.
00:11:31.760
This lists all the files and all the violations, indicating to Packwerk we are aware and will address them later.
00:11:41.760
We can run Packwerk update to-do, which performs the same code scanning Packwerk check does, but creates a YAML file.
00:11:46.960
It's like list-based development—the clearer we can get about our objectives, the more manageable our task.
00:11:55.440
When we ran the Packwerk update to-dos command for every PR that moved files into packages, we generated massive to-do files.
00:12:01.440
You’ll likely have numerous violations; that’s okay, as it indicates we want to increase the number of files in our package directory.
00:12:08.640
Phase two focuses on declaring dependencies between packages.
00:12:16.080
We’ll start removing dependency violations while leaving privacy violations for now.
00:12:22.560
In this phase, we draw coarse-grained lines between our packages.
00:12:31.680
For example, in our package YAML file, we declare that our sales package depends on reviews and inventories.
00:12:38.240
Fixing these dependency violations will allow us to focus on the privacy violations which remain.
00:12:45.920
As we declare package dependencies in package YAML files, we can run Packwerk validate.
00:12:53.920
This ensures we don't have any circular dependencies, which can be embedded in your CI process.
00:13:02.640
If there’s a circular dependency between the sales package and the inventory package, it will be flagged.
00:13:11.200
Eliminating those circular dependencies is essential.
00:13:17.600
You may discover tough circular dependencies that require rethinking how your code relates.
00:13:24.160
This is when you realize there is a significant amount of coupling between features.
00:13:31.280
In our case, it helped to categorize packages.
00:13:38.160
We created infrastructure packages for tasks like JSON parsing, messaging, and instrumentation—working with services outside of our application.
00:13:46.800
Next, we had common tooling packages.
00:13:55.760
If we had a base Active Record class or functionality related to the frontend that didn’t relate to specific features, we’d throw it into the common tools package.
00:14:02.960
The feature packages contained specific business logic with minimal dependencies.
00:14:10.240
Both common tooling and infrastructure did not have outgoing dependencies and thus remained independent of the features.
00:14:19.360
This approach somewhat aligns with Domain-Driven Design (DDD) architecture; think of it as DDD light.
00:14:26.960
However, you may find dependencies that cannot be simply rectified by breaking code into smaller packages.
00:14:34.960
Imagine a user profiles package and a social package where users make comments and likes.
00:14:42.560
We may encounter a circular dependency if user interactions require execution whenever users are deleted.
00:14:50.160
In such cases, consider implementing a Publish-Subscribe (pub/sub) architecture.
00:14:57.600
Social can subscribe to user profiles, signaling events without needing to know about the underlying package.
00:15:04.240
We used the Rails Event Store gem, a lightweight event bus—an ideal choice for lightweight eventing.
00:15:11.920
Phase three involves making tough decisions.
00:15:19.000
At this point, we have many privacy violations but declared all dependencies and resolved any circular ones.
00:15:26.560
Now, we need to create public interfaces for other packages.
00:15:33.680
These public interfaces are stable, serving as a contract for other teams, requiring discussions for changes.
00:15:42.560
However, teams can freely iterate on internal code without extensive code reviews.
00:15:51.520
This internal structure once again becomes that amazing development platform.
00:15:59.680
Here's an example of creating a public interface.
00:16:06.480
We emphasized extensive documentation; every public interface needed clear comments about its functionality.
00:16:12.960
This clarifies what the public interface does, doesn’t do, its requirements, and potential side effects.
00:16:20.000
We also annotated parameters to clearly articulate expected inputs.
00:16:27.440
Duck typing is great, but it can lead to unexpected issues, so we were specific about types.
00:16:34.000
We're cautious about returning Active Record objects as it can lead to leakage of private methods.
00:16:41.840
We implemented a to_data method that converts Active Record objects into plain old Ruby objects.
00:16:48.920
This abstraction allows us to return data objects without exposing the underlying Active Record structure.
00:16:56.640
This reduces potential N+1 issues since it streamlines data access.
00:17:02.720
After this, any returned data object minimizes the risk of mutating data or querying child relations.
00:17:09.680
This level of abstraction brings peace of mind when other teams utilize our public interfaces.
00:17:17.120
Regarding specs, expect many violations since we often do unconventional things.
00:17:25.120
Factory Bot is invaluable but often leads to confusion between production and testing environments.
00:17:31.360
If your models are kept internal to a package, specs outside can't assume how data is structured.
00:17:38.560
Think of specs as integration tests.
00:17:47.040
Refactored specs should avoid Factory Bot and instead use concrete methods for generating data.
00:17:55.040
Using actual creation methods creates a stable foundation for expectations.
00:18:02.200
At this point, we should ensure a high level of cohesion within our packages, while maintaining low coupling.
00:18:09.680
Packages should change independently, while everyone is aware of what belongs where.
00:18:16.800
New developers will appreciate this clarity; they can quickly find what they're searching for.
00:18:23.760
Phase four is crucial as it determines whether all the hard work pays off.
00:18:31.200
Here’s some advice on maintaining your packages.
00:18:39.120
First, ensure to run these checks in CI.
00:18:46.560
It is impractical to review every single pull request, especially after all this work.
00:18:53.360
Keep an eye on privacy violations or circular dependency issues.
00:19:01.680
Eventually, those YAML files should be deleted; they should not reappear unless you agree to take on that technical debt.
00:19:08.480
Watch out for everything ending up public.
00:19:15.920
Sometimes developers forget the reasons behind public exposure, leading to a convoluted mess.
00:19:23.520
Keep conversations active regarding the necessity of public abstractions.
00:19:30.200
Ensure other teams can access what they need from your packages.
00:19:37.680
Keep utilizing the domain model, and reconsider the way functionalities are organized.
00:19:44.400
One way to speed things up is by exposing Active Record classes, leading back to the threats of N+1 issues.
00:19:52.080
Returning plain old Ruby objects is reliable.
00:19:59.760
Documentation must stay fresh.
00:20:07.040
Utilize README files to explain why dependencies exist.
00:20:13.840
Reassess your assumptions regularly.
00:20:21.440
New developers will benefit from scanning through README files for clear insights.
00:20:30.080
Continue gathering feedback continuously, not just once.
00:20:36.480
Engagement with your support team helps clarify how external teams interact.
00:20:42.560
For example, our SRE noted the challenge of connecting stack traces to specific packages.
00:20:49.440
We discussed using namespace patterns to clarify these interactions.
00:20:57.760
Lastly, enjoy the process.
00:21:05.280
We’ve all experienced the excitement of using Rails and the freedom that comes with it.
00:21:13.280
If you manage to create clear, enjoyable package structures, it’s an excellent position to be in.
00:21:21.120
Don’t hesitate to explore new implementations that maintain these contracts.
00:21:30.000
Ensure each package maintains clarity and ease of use.
00:21:38.560
Here's that phased approach summarized again.
00:21:47.040
This is a long haul for extensive codebases with many contributors.
00:21:54.960
Understand what you have before jumping into assumptions regarding package setups.
00:22:02.000
Gather input from product teams to align language.
00:22:09.760
Create a package layout that reflects the business's perspective.
00:22:17.920
Establish package dependencies and consider why features need to be closely coupled.
00:22:26.960
Cruise your way toward clarity in your codebase.
00:22:36.320
Reassure yourself that the dream of clarity in your code is achievable.
00:22:45.040
Your application doesn’t have to remain confusing; you needn't fall back on microservices.
00:22:52.400
Remember, Rails engines are another option but may not offer the same privacy scanning.
00:23:01.120
Find a balance between leaving a giant mess and starting fresh.
00:23:09.200
Let’s keep pushing for that Monday morning excitement, building excellent software that tackles hard problems.
00:23:16.640
Thank you!