RubyConf AU 2024
Lessons From A Rails Infrastructure Team
Summarized using AI

by Maple Ong

In the presentation 'Lessons From A Rails Infrastructure Team', Maple Ong discusses the importance of establishing a dedicated Rails infrastructure team as Ruby on Rails applications scale. The talk is aimed at companies utilizing Rails and highlights how, as applications grow, they require consistent maintenance and upgrades to avoid vulnerabilities and ensure performance. Key points include:

  • Rails Application Growth: Ong shares the evolution of Gusto's main Rails application from 2013 to 2024, detailing its growth to 4.6 million lines of code and 400+ engineers.
  • Formation of the Infrastructure Team: Initially formed in 2019 to address the gaps in application maintenance, the team transitioned from volunteers to a structured group to handle upgrades, security, and developer productivity.
  • Core Responsibilities: The infrastructure team focuses on four categories: application maintenance (ensuring updates and security), shared resource management (such as background job infrastructure), developer productivity (improving development workflows), and application scalability (applying best practices).
  • Signs for Needing a Team: Organizations might need an infrastructure team if they notice rising build costs, reliance on outdated dependencies, or declining developer efficiency.
  • Philosophies for Team Work: Ong emphasizes aligning with the Ruby and Rails ecosystem, engaging with the community, and prioritizing regular gem dependency upgrades.
  • Practical Examples of Contributions: The team has made significant contributions such as improving memory profiles, enhancing Bundler features, and enabling a more streamlined CI/CD pipeline.
  • Efficiency Improvements: Implementing techniques like test mapping, the team reduced CI build times significantly—decreasing from 30 minutes to 10 minutes—demonstrating a commitment to continuous improvement and maintenance.
  • Conclusion and Future Readiness: Ong encourages early preparation for infrastructure needs to ensure smooth scaling and performance for growing applications, advocating that proactive care for Rails applications is essential for long-term success.

Overall, the presentation combines insights based on real-world experiences at Gusto with practical advice for managing growing Rails applications effectively.

00:00:03.719 Hi everyone, I'm Maple Ong. Nice to meet all of you! I'm really happy to be here; it's my first time in Sydney.
00:00:19.039 Okay, I want you to close your eyes for a moment. Picture the year 2030. What does it look like in Sydney?
00:00:29.160 You can open your eyes now. According to Dell, this is what Sydney will look like in 2030.
00:00:38.360 When I saved this picture, I obviously had never been to Sydney and didn't realize how wrong it was. So, fast-forward to 2030. Rails still exists, and you got a new job. It's your first day, thankfully, because AI hasn't taken over all the jobs yet.
00:00:54.240 You enter your office, and for some reason, your company still rented out this office even though no one shows up to work. You sit at your desk and realize that the new company you've joined is still on Rails 7.1. I can't believe we're still on Rails 7.1!
00:01:11.960 You want to text your manager but obviously don't. You might be thinking, 'Rails 7.1? What a great release! We got Trilogy and composite primary keys!' But to put things into perspective, that's like being on Rails 5 right now. Imagine the number of security vulnerabilities your application has accumulated by then.
00:01:22.720 Not to mention the cool Rails features you're missing out on. If there's one thing I want you to take away from this talk, it's that it's never too early to show your growing Rails applications some tender love and care, regardless of the size of your team.
00:01:48.960 I know what you might be thinking: 'Tender love?' This is what you can expect from this talk: I'm going to share with you what the Rails infrastructure team does at Gusto, why you may eventually need a Rails infrastructure team, some team philosophies that guide how my team approaches our work, and what we've accomplished as a team.
00:02:10.239 This is Lessons from a Rails infrastructure team. I work with Gusto, which is software for employee benefits and human resource management for companies in the U.S. We have around two Rails applications, but today I'll focus on our main Rails monolith.
00:02:30.480 Our main Rails monolith was born in 2013. Fast-forward to 2019; it's now six years old, has 1.2 million lines of code, 538 models, and 100 to 125 engineers working on it. This picks up from Dimitri's talk yesterday, which explored how to run a large company.
00:02:51.880 In 2019, we were on Ruby 2.3 and Rails 4.2. Of course, if you're familiar with Rails, you would have noticed that Rails 4.2 is no longer supported, which is a huge red flag. This was when the Rails infrastructure team at Gusto was born.
00:03:01.760 We didn't initially call ourselves the Rails infrastructure team; we referred to ourselves as the product infrastructure backend team. Essentially, we acted as a bridge between product development and the infrastructure behind it.
00:03:16.960 It started with a couple of volunteers from various product teams who saw the need for long-term application maintenance because no one was officially doing the work. Their team lead was very supportive, recognizing the benefits.
00:03:26.480 The first unofficial project was upgrading from Rails 4 to Rails 5. When I say Rails infrastructure team, it can also be known by other names, such as Ruby or Rails infrastructure or architecture teams.
00:03:50.720 So, when I refer to the Rails infrastructure, I mean teams that handle this sort of work. Fast-forward to 2024; we're now 11 years old with 4.6 million lines of code, 1,800 models, and over 400 engineers working on it. We're on Ruby 3.3 and Rails 7.1, which is great!
00:04:04.560 Just a quick shout-out to my team at the moment: we have Josh, Calvin, Nonan, Zener, and me, with Nonan leading our efforts. I will share the work we've done.
00:04:17.360 You might be wondering, 'What exactly does a Rails infrastructure team do?' People often say, 'We do the dirty work'. It's the work that no one wants to do. We handle upgrades and maintenance so that product engineers don't have to, but I don't see it as dirty work; I genuinely enjoy it.
00:04:35.320 We ensure the application is healthy and that developers are productive. I'll mention a few things that we manage, which can be broken down into four main categories.
00:04:45.680 The first category is application maintenance, including application security and ensuring Ruby, Rails, and gem dependencies are maintained and updated regularly.
00:04:59.920 The second category is shared resource management, where our team is responsible for the background job infrastructure at Gusto.
00:05:09.040 We manage the service level agreement queues and take care of resource allocation strategies, scaling, and dynamic rerouting of jobs to ensure that all queues are in good shape.
00:05:28.919 The third category is developer productivity: we make sure that developers can remain productive and ship their code in a timely manner through continuous integration.
00:05:38.559 We ensure that continuous shipping is possible and seamless development workflows are maintained. The last category is application scalability; we want to make sure that the application continues to grow and scale properly.
00:06:09.360 This includes applying Ruby and Rails best practices across the company, as well as ensuring technology and library migrations are carried out appropriately.
00:06:35.840 Here are some signs that you may need a Rails infrastructure team. I put the term 'Rails infrastructure team' in quotes because it doesn’t necessarily refer to just one team.
00:06:52.720 It might be a single engineer dedicated to improving the health of your application or a few volunteers. Some indicators include an increase in build and infrastructure costs and time, reliance on deprecated or unmaintained dependencies, and an increase in developer iteration cycles.
00:07:15.960 You may notice redundant work across teams or damaging and inconsistent coding practices throughout the application.
00:07:44.000 Now I want to share some team philosophies that will guide you in approaching this type of work. I know my team uses these philosophies to determine what work to prioritize within the company.
00:08:12.039 The first philosophy is basic yet essential: we strive to stay closely aligned with the Ruby and Rails ecosystem. This is crucial as it ensures we can seamlessly integrate future language and framework updates, maintain compatibility, and minimize potential disruptions.
00:08:32.640 For example, we use the latest Rails configuration defaults whenever we upgrade to the newest Rails version. These defaults are what the Rails core team decides to drive the framework.
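As a rough illustration of adopting the framework defaults (a sketch, not Gusto's actual configuration), Rails exposes the defaults for a given release through 'config.load_defaults'. The application name MyApp below is a placeholder.

```ruby
# config/application.rb (fragment; `MyApp` is a placeholder application name)
require_relative "boot"
require "rails/all"

module MyApp
  class Application < Rails::Application
    # Opt in to the framework defaults that shipped with Rails 7.1.
    config.load_defaults 7.1
  end
end
```

During an upgrade, 'bin/rails app:update' also generates a new_framework_defaults initializer so that each new default can be switched on and verified one at a time before the load_defaults version is bumped.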
00:08:56.480 Next, we prioritize upgrading our gem dependencies regularly. Although this may seem obvious, it's crucial for maintaining the overall health of the application.
00:09:07.440 We also ensure we use supported libraries and tools, such as switching from pry-byebug to debug. The debug gem is now a supported library from Ruby core and offers richer debugging features.
00:09:21.760 Additionally, we make sure to remove any backports after upgrading to a new version. Finally, we believe in engaging with the Ruby and Rails community.
00:09:37.440 This involvement can take many forms, such as supporting open-source libraries or finding opportunities to contribute bug fixes or issues upstream.
00:09:54.280 Contributing upstream is one of my favorite aspects of working on the Rails infrastructure team because it allows for personal growth as an engineer and enables collaboration with people from other companies and open-source maintainers.
00:10:17.880 I’d like to share two examples. The first is when my teammate, Josh, ran a memory profile on the entire application and discovered that some files did not have the '# frozen_string_literal: true' magic comment.
00:10:37.200 An inconsistency in how files were excluded from the RuboCop cop that enforces the comment led to additional memory allocations on a hot path. He fixed this in our application and also identified an opportunity to contribute a pull request to rspec-support.
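To make the check concrete, here is a minimal sketch of detecting whether a source file opts in to frozen string literals. The helper name and the regular expression are assumptions for illustration; in practice RuboCop's Style/FrozenStringLiteralComment cop does this job, and the sketch only approximates Ruby's rules by scanning the leading comment lines of a file.

```ruby
# Matches a magic comment such as "# frozen_string_literal: true".
# (Illustrative pattern, not the one RuboCop or Ruby itself uses.)
FROZEN_MAGIC_COMMENT = /\A#.*frozen[_-]string[_-]literal:\s*true\b/i

# Returns true when the leading comment section of `source` contains the
# magic comment. Scanning stops at the first line of real code.
def frozen_string_literal_enabled?(source)
  source.each_line do |line|
    stripped = line.strip
    next if stripped.empty?
    return false unless stripped.start_with?("#")
    return true if stripped.match?(FROZEN_MAGIC_COMMENT)
  end
  false
end
```

Running a helper like this over the files that a cop excludes is one way to surface the kind of gap described above.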
00:10:44.720 Another example involves Nonan, who introduced a new feature to Bundler that streamlines how the Ruby version is specified in the Gemfile. Previously, we had to build the path to the Ruby version file and read it manually; now we can simply point Bundler at the Ruby version file.
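The difference can be sketched as follows. The helper below mirrors the older manual approach in plain Ruby so it can run on its own; the Gemfile lines appear only as comments, and the exact code Gusto used is not given in the talk, so treat both as assumptions.

```ruby
# Older workaround: build the path and read the version by hand.
# `dir` stands in for the directory that holds the Gemfile.
def ruby_version_from_file(dir)
  File.read(File.join(dir, ".ruby-version")).strip
end

# In a Gemfile, the manual approach would have looked roughly like:
#   ruby File.read(File.join(__dir__, ".ruby-version")).strip
#
# With a Bundler release that supports the `file:` option, it becomes:
#   ruby file: ".ruby-version"
```

Keeping the version in one file means version managers, CI images, and Bundler all read the same source of truth.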
00:11:05.240 Working on the Rails infrastructure team also allows for collaboration with product engineers across the company. Our core goal is to empower engineers to make good decisions that ideally benefit the health of our code and overall application.
00:11:22.960 At Gusto, we generally try to avoid strict enforcement of rules; however, we do make exceptions when certain practices pose risks to code maintainability, scalability, or performance.
00:11:37.840 For instance, we make it easy to do the right thing while making it hard to do the wrong thing. One example is our restriction on Ruby's refinements.
00:11:54.480 Refinements are a Ruby feature that serves as an alternative to monkey patching. While refinements allow only local modifications to a class, the global effects of monkey patching can complicate debugging, especially with several developers working on millions of lines of code.
00:12:14.160 Refinements also tend to increase boot time and may become a barrier to future upgrades. Therefore, we restrict the use of refinements in our codebase with RuboCop.
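For readers who have not used the feature, this small sketch shows how a refinement stays local to the scope that activates it, in contrast to a monkey patch that changes the class everywhere. The module and method names are hypothetical, and the RuboCop rule that restricts refinements is not shown.

```ruby
# Hypothetical refinement: adds String#shout only where it is activated.
module Shout
  refine String do
    def shout
      "#{upcase}!"
    end
  end
end

module Greeter
  using Shout # the refinement is active only inside this module body

  def self.greet(name)
    "hello #{name}".shout
  end
end

# Outside Greeter the refinement is not visible, so String#shout is undefined.
def shout_available_globally?
  "hello".shout
  true
rescue NoMethodError
  false
end
```

The same locality that makes refinements safer than global patches also makes it harder to see which version of a method is in effect at a given call site, which is part of the debugging cost in a very large codebase.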
00:12:33.600 Additionally, in adopting new tooling across the application, we work to increase the value proposition for a particular tool. For example, when introducing Sorbet, we did not make it mandatory; instead, we made it easy for teams to adopt it by highlighting its benefits.
00:12:50.960 Over time, we noticed that many developers embraced typing their code and, as a result, our application is now largely typed.
00:13:08.960 Jumping back to gem upgrades, it is essential for maintaining application health. We prefer to upgrade incrementally rather than all at once. This means we bump the software version one step at a time over the course of weeks.
00:13:25.960 Gradually moving from the current version to the next allows us to mitigate risks more effectively. Compared to a full upgrade where we may jump to the target version immediately, incremental upgrades make debugging easier.
00:13:46.480 Incremental upgrades also let us take advantage of the benefits from each version bump. Overall, we break up the work into manageable chunks throughout the year.
00:14:06.160 This approach allows one person to handle the upgrades rather than dealing with the full project which could require multiple engineers at once.
00:14:23.520 The key is to maintain a pipeline. Currently, in our Gemfile, we are on Rails 7.1, but we point to the branch '7-1-stable' instead of a specific version.
00:14:38.480 This branch contains the commits that will be included in the next patch version, ensuring access to potential bug fixes and security patches. We have a bot that runs 'bundle update rails' weekly, generating a pull request.
00:14:56.920 All I need to do is check the change log. If the CI is green and everything looks good, we can merge it.
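A Gemfile entry along these lines tracks the Rails 7.1 stable branch (named 7-1-stable in the rails/rails repository). This is a sketch of the general technique rather than Gusto's actual Gemfile.

```ruby
# Gemfile (fragment)
source "https://rubygems.org"

# Track the stable branch rather than a released version, so that
# `bundle update rails` picks up commits queued for the next 7.1 patch release.
gem "rails", github: "rails/rails", branch: "7-1-stable"
```

With this in place, each weekly run of the update command changes only the pinned commit in Gemfile.lock, which keeps the resulting pull request small and easy to review against the changelog.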
00:15:10.200 Eventually, we hope to utilize Rails Edge for upgrades, which is the main branch; for now, we're sticking with the 7-1-stable branch. The primary reason we're able to ship changes when the CI is green is due to our trust in the robustness of the CI build.
00:15:27.840 This is a critical value for us as we work on the Rails infrastructure team. It’s comparable to being a pilot: when you want to land, you can’t simply look outside the window; you have to rely on the instrumentation in front of you for navigation.
00:15:44.120 In the same way, we treat our CI build as essential instrumentation. A robust CI build allows us to catch issues through unit tests, integration tests, and smoke tests before deploying in a production environment.
00:16:03.640 Having confidence in the CI build allows for quicker iterations, which benefits both product engineers and our team, as we want to ship upgrades without relying too extensively on manual testing.
00:16:22.440 For instance, we have a question in our incident postmortem documentation encouraging developers to identify ways to prevent incidents from recurring by catching them on CI.
00:16:38.680 One specific example is an environment check in our CI build that boots up our application in various contexts. For example, it boots the console in development to ensure nothing breaks local development.
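One way such a boot check could be structured is sketched below. The command, the list of environments, and the helper name are assumptions, not Gusto's implementation; the runner is injectable so the logic can be exercised without a Rails application.

```ruby
# Hypothetical command that loads the whole application in a given environment.
BOOT_COMMAND = ["bin/rails", "runner", "Rails.application.eager_load!"].freeze

# Default runner shells out; `system` returns true only on a zero exit status.
DEFAULT_RUNNER = ->(env_vars, command) { system(env_vars, *command) }

# Boots the app once per environment and returns the environments that failed.
def failed_boot_contexts(rails_envs, runner: DEFAULT_RUNNER)
  rails_envs.reject do |rails_env|
    runner.call({ "RAILS_ENV" => rails_env }, BOOT_COMMAND)
  end
end
```

A CI step could call failed_boot_contexts(%w[development test production]) and fail the build when the returned list is not empty.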
00:16:52.640 As code and applications grow, so will the number of tests, resulting in increased build time, costs, and complexity—all of which can lower developer acceleration. As a team, we invest in libraries that make long-term maintenance easier.
00:17:10.440 We use Buildkite for our CI platform, which I highly recommend. In the past, our pipeline was less sophisticated, comprised of just two main steps for setup and testing.
00:17:29.160 Thanks to Buildkite's dependency feature, it now dynamically determines which steps to run, leading to a more organized and efficient pipeline.
00:17:41.680 Instead of a single, bulky pipeline file, we created a smaller pipeline.rb file, yielding improved readability and organization. This newfound structure allows us to conditionally run steps based on which components are modified.
00:18:05.680 For instance, if a modified file doesn't affect Ruby files, we skip the associated tests, which increases efficiency. This also reduces the number of run steps, thus speeding up build times and lowering infrastructure costs.
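The idea of generating steps conditionally can be sketched in a few lines of Ruby. The file-matching rule, step labels, and the bin/ci-setup command are hypothetical; only the step attributes (label, key, command, depends_on) follow Buildkite's pipeline format.

```ruby
require "yaml"

# Hypothetical rule: Ruby tests matter only if Ruby code or the bundle changed.
def ruby_affected?(changed_files)
  changed_files.any? do |path|
    path.end_with?(".rb", ".rake") ||
      %w[Gemfile Gemfile.lock].include?(File.basename(path))
  end
end

# Builds the list of steps, adding the test step only when it is needed.
def pipeline_steps(changed_files)
  steps = [{ "label" => "setup", "key" => "setup", "command" => "bin/ci-setup" }]
  if ruby_affected?(changed_files)
    steps << { "label" => "rspec", "command" => "bundle exec rspec", "depends_on" => "setup" }
  end
  steps
end

# Serializes the steps as a YAML pipeline definition.
def pipeline_yaml(changed_files)
  { "steps" => pipeline_steps(changed_files) }.to_yaml
end
```

The YAML produced this way can be handed to 'buildkite-agent pipeline upload' from an initial step, so the pipeline for each build reflects only the files that changed.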
00:18:25.480 While we continued optimizing CI build times, we recognized that increasing CI times also corresponded with our application’s growth. When a product developer submits a simple change to one file, we don't need to run the entire CI build.
00:18:50.640 Given a list of changed files, our goal is to identify the minimum number of tests required to assure that the change is safe to merge.
00:19:12.920 We developed a feature called test mapping using Ruby's TracePoint, which helps us determine the specific tests to run based on the modified files.
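A minimal sketch of the idea, assuming a simple file-level mapping: TracePoint records which source files have Ruby methods called while a test runs, and the recorded data is inverted into a lookup from source file to tests. The helper names are hypothetical, and the real implementation at Gusto is not described in detail in the talk.

```ruby
require "set"

# Runs the block and returns the set of source files whose Ruby methods
# were called while it executed.
def files_touched_by
  touched = Set.new
  trace = TracePoint.new(:call) { |tp| touched << tp.path }
  trace.enable { yield }
  touched
end

# Inverts { test_name => files } into { file => test_names }.
def build_test_map(files_by_test)
  map = Hash.new { |hash, file| hash[file] = Set.new }
  files_by_test.each do |test_name, files|
    files.each { |file| map[file] << test_name }
  end
  map
end

# Returns the tests that exercised any of the changed files.
def tests_for(changed_files, test_map)
  changed_files.flat_map { |file| test_map.fetch(file, Set.new).to_a }.uniq.sort
end
```

A mapping like this misses dependencies that are not method calls (for example constants, templates, or configuration files), so a production version would likely need fallbacks that run the full suite when the mapping cannot be trusted.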
00:19:29.640 Each build now runs only the tests related to affected files, allowing us to save significant time and money. For instance, in just five days, this mapping saved us $5,000 and 68 hours of engineering time, equating to over a quarter of a million dollars a year.
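The annual figure follows from simple extrapolation. The sketch below assumes the five days were working days and that a year has about 260 of them; neither assumption is stated in the talk.

```ruby
# Extrapolates savings measured over a short window to a full year.
def annualized_savings(savings_usd, days_measured, days_per_year)
  savings_usd * days_per_year / days_measured
end

WORKDAYS_PER_YEAR = 260 # assumption: 52 weeks of 5 working days

# 5,000 USD over 5 days is 1,000 USD per day, or 260,000 USD per year.
yearly = annualized_savings(5_000, 5, WORKDAYS_PER_YEAR)
```

Under these assumptions the result is 260,000 dollars, which is consistent with the "over a quarter of a million dollars a year" figure.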
00:19:49.120 This improvement reflects our efforts to optimize our CI build, resulting in decreased build times from 30 minutes in 2020 to just 10 minutes in 2024.
00:20:09.840 While that might not seem massive at first glance, consider the increased scale in test suites and the number of developers contributing to the application.
00:20:35.520 Maintaining these efficiencies will require ongoing effort. This sort of maintenance work can often feel thankless: when done well, no one notices because everything runs smoothly.
00:20:51.520 However, I want to emphasize some significant impacts our team has made. Firstly, we are now using the latest Ruby and Rails versions, which is crucial for avoiding security issues.
00:21:13.440 With an efficient pipeline in place, we've upgraded to Ruby 3.3 within one month of its release and Rails 7.1 within five months, all accomplished with just two and a half engineers working on it.
00:21:36.680 We've also transitioned to using Ruby's JIT compiler and adopted the Trilogy database adapter, optimizing performance and reducing the need for MySQL dependencies.
00:21:56.080 Since then, we've deprecated various unmaintained libraries, which is a necessary step given the age of our Rails application and vital for continued scalability.
00:22:16.960 Moreover, we’ve improved test times and the management of background jobs and schema migrations, reducing average CI build times from 30 minutes to just 10.
00:22:34.640 Lastly, we drastically reduced incidents related to background jobs, dropping from two to three per week to zero at our peak.
00:22:53.680 You might wonder if our relatively small organization truly needs all of this. I believe that it is essential to start planning for these processes now, as they will prepare you for future growth.
00:23:15.280 As I said earlier, it is never too early to show your growing Rails applications some tender love and care. I hope I’ve given you some ideas on how to think about or work on this.
00:23:33.680 Thank you so much to my Ruby friends for your emotional support during this talk, and thank you for being here today. I would love to answer your questions or discuss anything further, including sharing your own setup.
00:23:54.320 Thank you!