00:00:03.719
Hi everyone, I'm Maple Ong. Nice to meet all of you! I'm really happy to be here; it's my first time in Sydney.
00:00:19.039
Okay, I want you to close your eyes for a moment. Picture the year 2030. What does it look like in Sydney?
00:00:29.160
You can open your eyes now. According to Dell, this is what Sydney will look like in 2030.
00:00:38.360
When I saved this picture, I obviously had never been to Sydney and didn't realize how wrong it was. So, fast-forward to 2030. Rails still exists, and you got a new job. It's your first day, thankfully, because AI hasn't taken over all the jobs yet.
00:00:54.240
You enter your office, and for some reason, your company still rented out this office even though no one shows up to work. You sit at your desk and realize that the new company you've joined is still on Rails 7.1. I can't believe we're still on Rails 7.1!
00:01:11.960
You want to text your manager but obviously don't. You might be thinking, 'Rails 7.1? What a great release! We got Trilogy and composite primary keys!' But to put things into perspective, that's like being on Rails 5 right now. Imagine the number of security vulnerabilities your application has accumulated by then.
00:01:22.720
Not to mention the cool Rails features you're missing out on. If there's one thing I want you to take away from this talk, it's that it's never too early to show your growing Rails applications some tender love and care, regardless of the size of your team.
00:01:48.960
I know what you might be thinking: 'Tender love?' This is what you can expect from this talk: I'm going to share with you what the Rails infrastructure team does at Gusto, why you may eventually need a Rails infrastructure team, some team philosophies that guide how my team approaches our work, and what we've accomplished as a team.
00:02:10.239
This talk is "Lessons from a Rails infrastructure team". I work at Gusto, which makes software for employee benefits and human resource management for companies in the U.S. We have around two Rails applications, but today I'll focus on our main Rails monolith.
00:02:30.480
Our main Rails monolith was born in 2013. Fast-forward to 2019; it's now six years old, has 1.2 million lines of code, 538 models, and 100 to 125 engineers working on it. This picks up where Dimitri's talk from yesterday left off, which explored how to run a large company.
00:02:51.880
In 2019, we were on Ruby 2.3 and Rails 4.2. Of course, if you're familiar with Rails, you would have noticed that Rails 4.2 is no longer supported, which is a huge red flag. This was when the Rails infrastructure team at Gusto was born.
00:03:01.760
We didn't initially call ourselves the Rails infrastructure team; we referred to ourselves as the product infrastructure backend team. Essentially, we acted as a bridge between product development and the infrastructure behind it.
00:03:16.960
It started with a couple of volunteers from various product teams who saw the need for long-term application maintenance because no one was officially doing the work. Their team lead was very supportive, recognizing the benefits.
00:03:26.480
The first unofficial project was upgrading from Rails 4 to Rails 5. When I say Rails infrastructure team, it can also be known by other names, such as Ruby or Rails infrastructure or architecture teams.
00:03:50.720
So, when I refer to the Rails infrastructure, I mean teams that handle this sort of work. Fast-forward to 2024; we're now 11 years old with 4.6 million lines of code, 1,800 models, and over 400 engineers working on it. We're on Ruby 3.3 and Rails 7.1, which is great!
00:04:04.560
Just a quick shout-out to my team at the moment: we have Josh, Calvin, Nonan, Zener, and me, with Nonan leading our efforts. I will share the work we've done.
00:04:17.360
You might be wondering, 'What exactly does a Rails infrastructure team do?' People often say, 'We do the dirty work'. It's the work that no one wants to do. We handle upgrades and maintenance so that product engineers don't have to, but I don't see it as dirty work; I genuinely enjoy it.
00:04:35.320
We ensure the application is healthy and that developers are productive. I'll mention a few things that we manage, which can be broken down into four main categories.
00:04:45.680
The first category is application maintenance, including application security and ensuring Ruby, Rails, and gem dependencies are maintained and updated regularly.
00:04:59.920
The second category is shared resource management, where our team is responsible for the background job infrastructure at Gusto.
00:05:09.040
We manage the service level agreement queues and take care of resource allocation strategies, scaling, and dynamic rerouting of jobs to ensure that all queues are in good shape.
00:05:28.919
The third category is developer productivity: we make sure that developers can remain productive and ship their code in a timely manner through continuous integration.
00:05:38.559
We ensure that continuous shipping is possible and seamless development workflows are maintained. The last category is application scalability; we want to make sure that the application continues to grow and scale properly.
00:06:09.360
This includes applying Ruby and Rails best practices across the company, as well as ensuring technology and library migrations are carried out appropriately.
00:06:35.840
Here are some signs that you may need a Rails infrastructure team. I put the term 'Rails infrastructure team' in quotes because it doesn’t necessarily refer to just one team.
00:06:52.720
It might be a single engineer dedicated to improving the health of your application or a few volunteers. Some indicators include an increase in build and infrastructure costs and time, reliance on deprecated or unmaintained dependencies, and an increase in developer iteration cycles.
00:07:15.960
You may notice redundant work across teams or damaging and inconsistent coding practices throughout the application.
00:07:44.000
Now I want to share some team philosophies that will guide you in approaching this type of work. I know my team uses these philosophies to determine what work to prioritize within the company.
00:08:12.039
The first philosophy is basic yet essential: we strive to stay closely aligned with the Ruby and Rails ecosystem. This is crucial as it ensures we can seamlessly integrate future language and framework updates, maintain compatibility, and minimize potential disruptions.
00:08:32.640
For example, we use the latest Rails configuration defaults whenever we upgrade to the newest Rails version. These defaults are what the Rails core team decides to drive the framework.
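As a rough illustration of what adopting the defaults looks like, a Rails application usually opts in through 'config.load_defaults'. This is a sketch only: the module name 'Payroll' is hypothetical, and the talk does not show Gusto's actual configuration.

```ruby
# config/application.rb (sketch; the application module name is hypothetical)
module Payroll
  class Application < Rails::Application
    # Opt in to the framework defaults that shipped with Rails 7.1.
    config.load_defaults 7.1
  end
end
```

During an upgrade, 'bin/rails app:update' generates a 'new_framework_defaults' initializer for the target version, so the new defaults can be switched on one at a time before the 'load_defaults' line is bumped.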
00:08:56.480
Next, we prioritize upgrading our gem dependencies regularly. Although this may seem obvious, it's crucial for maintaining the overall health of the application.
00:09:07.440
We also ensure we use supported libraries and tools, such as switching from pry-byebug to debug. The debug gem is now a supported library from Ruby core and offers richer debugging features.
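For a sense of how small that switch can be, here is a minimal Gemfile sketch. The group layout is an assumption, not Gusto's actual Gemfile.

```ruby
# Gemfile (sketch; the group layout is an assumption)
group :development, :test do
  gem "debug"         # debugger maintained by the Ruby core team
  # gem "pry-byebug"  # removed as part of the switch
end
```

Breakpoints then change from 'binding.pry' to 'binding.break' (or its alias 'debugger').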
00:09:21.760
Additionally, we make sure to remove any backports after upgrading to a new version. Finally, we believe in engaging with the Ruby and Rails community.
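One way to make sure a backport actually gets removed is to have it fail loudly once the installed version already contains the fix. The helper below is a hypothetical sketch (its name and arguments are invented for illustration), not something described in the talk.

```ruby
# Hypothetical helper: raise once the installed version already includes
# the upstream fix, so the backport is deleted instead of lingering.
def assert_backport_still_needed!(name:, installed:, fixed_in:)
  return if Gem::Version.new(installed) < Gem::Version.new(fixed_in)

  raise "Backport '#{name}' is obsolete: #{installed} includes the fix " \
        "released in #{fixed_in}. Please remove it."
end

# In a Rails app, `installed` would typically be Rails.version.
# The installed version is older than the fix here, so nothing is raised.
assert_backport_still_needed!(name: "example-fix", installed: "7.0.8", fixed_in: "7.1.0")
```

Calling this at the top of the file that holds the backport turns a forgotten patch into a failing boot or CI run right after the upgrade.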
00:09:37.440
This involvement can take many forms, such as supporting open-source libraries or finding opportunities to contribute bug fixes or issues upstream.
00:09:54.280
Contributing upstream is one of my favorite aspects of working on the Rails infrastructure team because it allows for personal growth as an engineer and enables collaboration with people from other companies and open-source maintainers.
00:10:17.880
I’d like to share two examples. The first is when my teammate, Josh, ran a memory profile on the entire application and discovered that some files did not have the '# frozen_string_literal: true' magic comment.
00:10:37.200
This inconsistency in how files were excluded from the RuboCop cop led to additional memory allocations on a hot path. He fixed this in our application and also identified an opportunity to contribute a pull request to rspec-support.
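To see why the missing magic comment matters for allocations, here is a small self-contained sketch. It uses 'eval' only so that both variants fit in one file; the constant names are made up for the example.

```ruby
# Each snippet is compiled on its own, so the magic comment on its first
# line applies to it just as it would at the top of a real file.
PLAIN_LABEL  = eval("# frozen_string_literal: false\n-> { 'payroll' }")
FROZEN_LABEL = eval("# frozen_string_literal: true\n-> { 'payroll' }")

# Without the comment, every call allocates a new String object.
puts PLAIN_LABEL.call.equal?(PLAIN_LABEL.call)

# With the comment, every call returns the same frozen object.
puts FROZEN_LABEL.call.equal?(FROZEN_LABEL.call)
puts FROZEN_LABEL.call.frozen?
```

On a hot path that runs many times, the first variant creates one short-lived string per call, which is the kind of allocation a memory profile surfaces.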
00:10:44.720
Another example involves Nonan, who contributed a new feature to Bundler that streamlines how the Ruby version is specified in the Gemfile. Previously, we had to locate the Ruby version file and read it manually; now we can simply point Bundler at the Ruby version file.
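In Gemfile terms, the change looks roughly like this. The "before" line is an assumption about how the file was read by hand; the "after" line uses the 'file:' option of Bundler's 'ruby' directive.

```ruby
# Gemfile, before (assumed form): read the version file by hand
ruby File.read(File.join(__dir__, ".ruby-version")).strip

# Gemfile, after: let Bundler read the file itself
ruby file: ".ruby-version"
```

Only one of the two lines would appear in a real Gemfile; they are shown together here for comparison.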
00:11:05.240
Working on the Rails infrastructure team also allows for collaboration with product engineers across the company. Our core goal is to empower engineers to make good decisions that ideally benefit the health of our code and overall application.
00:11:22.960
At Gusto, we generally try to avoid strict enforcement of rules; however, we do make exceptions when certain practices pose risks to code maintainability, scalability, or performance.
00:11:37.840
For instance, we make it easy to do the right thing while making it hard to do the wrong thing. One example is our restriction on Ruby's refinements.
00:11:54.480
Refinements are a Ruby feature that serves as an alternative to monkey patching. While refinements allow only local modifications to a class, the global effects of monkey patching can complicate debugging, especially with several developers working on millions of lines of code.
00:12:14.160
Refinements also tend to increase boot time and may become a barrier to future upgrades. Therefore, we restrict the use of refinements in our codebase with RuboCop.
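For anyone who hasn't used refinements, here is a minimal sketch of what "local modification" means. The module and method names are invented for the example.

```ruby
# A refinement: String#shout exists only where the refinement is activated.
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Announcer
  using Shout # active only for the rest of this module body

  def self.announce(text)
    text.shout
  end
end

puts Announcer.announce("payday")

# Outside Announcer, String#shout is not visible.
begin
  "payday".shout
rescue NoMethodError
  puts "String#shout is not defined here"
end
```

The same method behaving differently depending on which scope activated the refinement is part of what makes this hard to trace in a very large codebase.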
00:12:33.600
Additionally, in adopting new tooling across the application, we work to increase the value proposition for a particular tool. For example, when introducing Sorbet, we did not make it mandatory; instead, we made it easy for teams to adopt it by highlighting its benefits.
00:12:50.960
Over time, we noticed that many developers embraced typing their code and, as a result, our application is now largely typed.
00:13:08.960
Jumping back to gem upgrades: they are essential for maintaining application health. We prefer to upgrade incrementally rather than all at once. This means we bump the software version one step at a time over the course of weeks.
00:13:25.960
Gradually moving from the current version to the next allows us to mitigate risks more effectively. Compared to a full upgrade where we may jump to the target version immediately, incremental upgrades make debugging easier.
00:13:46.480
Incremental upgrades also let us take advantage of the benefits from each version bump. Overall, we break up the work into manageable chunks throughout the year.
00:14:06.160
This approach allows one person to handle the upgrades rather than dealing with the full project which could require multiple engineers at once.
00:14:23.520
The key is to maintain a pipeline. Currently, in our Gemfile, we are on Rails 7.1, but we point to the branch '7-1-stable' instead of a specific version.
00:14:38.480
This branch contains the commits that will be included in the next patch version, ensuring access to potential bug fixes and security patches. We have a bot that runs 'bundle update rails' weekly, generating a pull request.
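Pointing at a stable branch in a Gemfile looks roughly like this. Rails names its stable branches in the form '7-1-stable'; the rest of the Gemfile is omitted, and this is a sketch rather than Gusto's actual file.

```ruby
# Gemfile (sketch): track the stable branch instead of a released version
gem "rails", github: "rails/rails", branch: "7-1-stable"
```

With this in place, each weekly 'bundle update rails' moves the lockfile to the newest commit on that branch.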
00:14:56.920
All I need to do is check the change log. If the CI is green and everything looks good, we can merge it.
00:15:10.200
Eventually, we hope to use Rails Edge for upgrades, which is the main branch; for now, we're sticking with the 7-1-stable branch. The primary reason we're able to ship changes when the CI is green is our trust in the robustness of the CI build.
00:15:27.840
This is a critical value for us as we work on the Rails infrastructure team. It’s comparable to being a pilot: when you want to land, you can’t simply look outside the window; you have to rely on the instrumentation in front of you for navigation.
00:15:44.120
In the same way, we treat our CI build as essential instrumentation. A robust CI build allows us to catch issues through unit tests, integration tests, and smoke tests before deploying in a production environment.
00:16:03.640
Having confidence in the CI build allows for quicker iterations, which benefits both product engineers and our team, as we want to ship upgrades without relying too extensively on manual testing.
00:16:22.440
For instance, we have a question in our incident postmortem documentation encouraging developers to identify ways to prevent incidents from recurring by catching them on CI.
00:16:38.680
One specific example is an environment CI check we have in our build that boots up our application in various contexts. For example, in console development, we ensure nothing breaks local development.
00:16:52.640
As code and applications grow, so will the number of tests, resulting in increased build time, costs, and complexity—all of which can lower developer acceleration. As a team, we invest in libraries that make long-term maintenance easier.
00:17:10.440
We use Buildkite for our CI platform, which I highly recommend. In the past, our pipeline was less sophisticated, consisting of just two main steps for setup and testing.
00:17:29.160
Thanks to Buildkite's dependency feature, it now dynamically determines which steps to run, leading to a more organized and efficient pipeline.
00:17:41.680
Instead of a single, bulky pipeline file, we created a smaller pipeline.rb file, yielding improved readability and organization. This newfound structure allows us to conditionally run steps based on which components are modified.
00:18:05.680
For instance, if a modified file doesn't affect Ruby files, we skip the associated tests, which increases efficiency. This also reduces the number of run steps, thus speeding up build times and lowering infrastructure costs.
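A pipeline script along these lines could decide which steps to emit. This is a minimal sketch under stated assumptions: the file patterns, the 'bin/ci-setup' command, and the step names are all hypothetical, and Gusto's real pipeline.rb is not shown in the talk.

```ruby
require "yaml"

# Hypothetical rule: only these paths can affect the Ruby test suite.
RUBY_PATTERNS = [/\.rb\z/, /\AGemfile(\.lock)?\z/].freeze

def ruby_changed?(changed_files)
  changed_files.any? { |path| RUBY_PATTERNS.any? { |pattern| pattern.match?(path) } }
end

# Build the list of Buildkite steps for one build.
def build_steps(changed_files)
  steps = [{ "label" => "setup", "key" => "setup", "command" => "bin/ci-setup" }]

  if ruby_changed?(changed_files)
    # Runs only when Ruby code changed, and only after setup finishes.
    steps << {
      "label" => "rspec",
      "command" => "bundle exec rspec",
      "depends_on" => "setup"
    }
  end

  steps
end

# Print the pipeline for an example change set.
puts({ "steps" => build_steps(["app/models/employee.rb"]) }.to_yaml)
```

In a real build, the changed files would come from something like 'git diff --name-only', and the generated YAML would be piped to 'buildkite-agent pipeline upload'.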
00:18:25.480
While we continued optimizing CI build times, we recognized that increasing CI times also corresponded with our application’s growth. When a product developer submits a simple change to one file, we don't need to run the entire CI build.
00:18:50.640
Given a list of changed files, our goal is to identify the minimum number of tests required to assure that the change is safe to merge.
00:19:12.920
We developed a feature called test mapping using Ruby's TracePoint, which helps us determine the specific tests to run based on the modified files.
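The idea can be sketched in a few lines. This is a toy version, not Gusto's implementation: the file paths and "tests" are stand-ins, and 'eval' with a filename argument is used only so that TracePoint reports a distinct path for each pretend application file.

```ruby
require "set"

# Stand-ins for application files (hypothetical paths).
eval(<<~RUBY, TOPLEVEL_BINDING, "app/models/invoice.rb")
  class Invoice
    def total
      100
    end
  end
RUBY

eval(<<~RUBY, TOPLEVEL_BINDING, "app/models/employee.rb")
  class Employee
    def name
      "Ada"
    end
  end
RUBY

# Stand-ins for tests, keyed by test file path.
TESTS = {
  "spec/models/invoice_spec.rb"  => -> { Invoice.new.total },
  "spec/models/employee_spec.rb" => -> { Employee.new.name }
}.freeze

# Run each test under TracePoint and record which app files it calls into.
def build_test_map(tests)
  map = Hash.new { |hash, key| hash[key] = Set.new }
  tests.each do |test_file, body|
    trace = TracePoint.new(:call) do |tp|
      map[tp.path] << test_file if tp.path.start_with?("app/")
    end
    trace.enable { body.call }
  end
  map
end

# Given changed files, return only the tests that exercised them.
def tests_for(changed_files, map)
  changed_files.flat_map { |file| map.fetch(file, Set.new).to_a }.uniq.sort
end

test_map = build_test_map(TESTS)
puts tests_for(["app/models/invoice.rb"], test_map)
```

A production version would also need a fallback, such as running the full suite when a changed file has no entry in the map, but that goes beyond what the talk describes.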
00:19:29.640
Each build now runs only the tests related to affected files, allowing us to save significant time and money. For instance, in just five days, this mapping saved us $5,000 and 68 hours of engineering time, equating to over a quarter of a million dollars a year.
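The yearly figure can be checked with quick arithmetic. The number of working days per year is an assumption; the talk only gives the five-day numbers.

```ruby
# Figures from the talk: $5,000 saved over five days.
SAVINGS_DOLLARS = 5_000
DAYS_MEASURED = 5
per_day = SAVINGS_DOLLARS / DAYS_MEASURED

# Assumption: about 260 working days per year (52 weeks x 5 days).
WORKING_DAYS_PER_YEAR = 52 * 5

ANNUAL_SAVINGS = per_day * WORKING_DAYS_PER_YEAR
puts ANNUAL_SAVINGS # 260000
```

At $1,000 per working day, that comes to about $260,000 a year, consistent with "over a quarter of a million dollars".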
00:19:49.120
This improvement reflects our efforts to optimize our CI build, resulting in decreased build times from 30 minutes in 2020 to just 10 minutes in 2024.
00:20:09.840
While that might not seem massive at first glance, consider the increased scale in test suites and the number of developers contributing to the application.
00:20:35.520
Maintaining these efficiencies will require ongoing effort. This sort of maintenance work can often feel thankless: when done well, no one notices because everything runs smoothly.
00:20:51.520
However, I want to emphasize some significant impacts our team has made. Firstly, we are now using the latest Ruby and Rails versions, which is crucial for avoiding security issues.
00:21:13.440
With an efficient pipeline in place, we've upgraded to Ruby 3.3 within one month of its release and Rails 7.1 within five months, all accomplished with just two and a half engineers working on it.
00:21:36.680
We've also transitioned to using Ruby's JIT compiler and adopted the Trilogy database adapter, optimizing performance and reducing the need for MySQL dependencies.
00:21:56.080
Since then, we've deprecated various unmaintained libraries, which is a necessary step given the age of our Rails application and vital for continued scalability.
00:22:16.960
Moreover, we’ve improved test times and the management of background jobs and schema migrations, reducing average CI build times from 30 minutes to just 10.
00:22:34.640
Lastly, we drastically reduced incidents related to background jobs, dropping from two to three per week to zero at our peak.
00:22:53.680
You might wonder if our relatively small organization truly needs all of this. I believe that it is essential to start planning for these processes now, as they will prepare you for future growth.
00:23:15.280
As I said earlier, it is never too early to show your growing Rails applications some tender love and care. I hope I’ve given you some ideas on how to think about or work on this.
00:23:33.680
Thank you so much to my Ruby friends for your emotional support during this talk, and thank you for being here today. I would love to answer your questions or discuss anything further, including sharing your own setup.
00:23:54.320
Thank you!