00:00:01.159
Hello, everyone! I hope you all had a good lunch.
00:00:06.569
I'm excited to start my presentation today.
00:00:15.120
So, let's dive right into the topic.
00:00:24.260
Today, we're going to talk about the journey of a complex gem upgrade. First, I want to thank you for being here with me in this amazing city of Sendai.
00:00:38.280
This is actually my second time in Japan. My first visit was for the Ruby World Conference in Matsue. I have to say, I love Japan; it's an amazing country.
00:00:51.680
The last time, when I was in Matsue, I didn't get a chance to take a train; I took a bus from Osaka.
00:01:03.660
However, this year for RubyKaigi, I decided to take the train from Tokyo.
00:01:10.590
One thing that amazed me about the Shinkansen was the rotating seats. This might sound odd, but as a foreigner, it was a fascinating experience.
00:01:22.740
I was really excited, almost like how I feel about the Hyperloop concept. But then again, I just wanted to experience those rotating seats.
00:01:36.329
In my opinion, the best use case for rotating seats would actually be on planes, especially for those long flights.
00:01:49.500
Imagine being able to rotate your seat and look straight at the kid kicking your chair. Without saying a word, just making eye contact could be enough to stop them.
00:02:12.209
But I digress! Let's begin: my name is Edouard Chin, and I am a software developer at Shopify.
00:02:20.410
I work on our internal Ruby on Rails team, which manages several projects, one of which ensures our application runs on the latest version of Rails and other gems.
00:02:30.880
Now, just to clarify, I know this is RubyKaigi and not RailsConf. Therefore, I won't delve too deeply into specifics about Rails in my presentation.
00:02:54.100
There will be a few slides related to Rails, but the concepts I discuss can apply to any Ruby framework.
00:03:01.180
Even though my focus won't be solely on Rails, I will use it as a supporting example due to the size and complexity of its dependencies.
00:03:06.190
Upgrading Rails from one version to another is not an easy task, especially in large applications. This makes it a perfect case study for exploring upgrade processes.
00:03:19.630
When you upgrade a gem, you always face the risk of breaking your application. Today, I will share techniques, ideas, and tools we developed to facilitate the upgrade process.
00:03:34.210
I want to emphasize that the strategies I'll discuss are applicable not just to Rails, but to any gem upgrades.
00:03:39.700
Upgrading gems is a routine task, and usually it needs to be done quite frequently.
00:03:51.040
Fortunately, most of the time, upgrading a gem is not a major issue. All you need to do is check the changelog when available.
00:04:01.930
As a side note, if you're a gem maintainer and don't have a changelog for your gem, it's beneficial to create one.
00:04:13.600
RubyGems introduced a feature back in 2017 that lets you specify the changelog URL directly in your gem specification, allowing users to find it easily on your gem homepage.
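As a minimal sketch of what that looks like, the gem specification below sets the `changelog_uri` metadata key. The gem name, version, author, and URLs are placeholders invented for illustration.

```ruby
require "rubygems"

# Hypothetical gem: the name, version, author, and URLs are placeholders.
spec = Gem::Specification.new do |s|
  s.name     = "my_gem"
  s.version  = "1.2.0"
  s.summary  = "Example gem used to illustrate changelog metadata"
  s.authors  = ["Example Author"]
  s.homepage = "https://example.com/my_gem"

  # The metadata key that points users at the changelog.
  s.metadata["changelog_uri"] = "https://example.com/my_gem/CHANGELOG.md"
end

puts spec.metadata["changelog_uri"]
```

In a real gem this block would live in the `.gemspec` file rather than in a standalone script.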
00:04:32.500
So, back to upgrading gems: if you spot no breaking changes in the changelog, that’s great!
00:04:50.020
Then, you can run the 'bundle update' command and lastly, execute your tests in CI. Wait for a successful build, and you're set to deploy. Many of these steps can be automated.
00:05:10.930
For example, Dependabot is a tool that opens pull requests automatically for you, which you then review and merge manually.
00:05:16.960
However, the real challenges arise when your application heavily relies on a dependency—like Rails—and that dependency has significant breaking changes in its new version.
00:05:31.270
In such cases, your application could break entirely, and hundreds or even thousands of your tests could start failing.
00:05:38.050
When that happens, fixing everything in a single pull request is not feasible. You must resolve issues incrementally.
00:05:44.620
Gems can typically be categorized into three buckets when upgrading.
00:05:51.670
The first bucket contains gems that are relatively easy to upgrade and are only used in tests or development.
00:06:03.460
If something breaks during the upgrade of one of these gems, it's frustrating for developers, but it's usually straightforward to revert the commit.
00:06:16.840
The second bucket includes gems that are more challenging to upgrade as they affect the entire application, such as Rails.
00:06:29.860
Upgrading one of these gems can break something in production, impacting customers.
00:06:43.270
While you can also revert commits, sometimes side effects linger.
00:06:50.499
The final bucket consists of gems that are easy to upgrade but are crucial for your infrastructure or business logic.
00:07:02.229
A specific example in our core application is the 'resque-scheduler' gem, which we rely on for scheduling background jobs.
00:07:18.500
When we had to upgrade it, we couldn't afford for it to break, as it handles critical tasks like sending webhooks and processing payments.
00:07:30.289
Thus, we had to ensure that the upgrade process would not cause any outages on our platform.
00:07:44.249
Let me give you some context about our application. Shopify began running in production back in 2006 and currently supports over 600,000 merchants.
00:08:02.419
The application has always been built with Ruby and has never been fully rewritten; we've continually made improvements.
00:08:16.999
We maintain around 250 direct dependencies, with more than 400 if you include transitive dependencies.
00:08:32.860
Managing the size of our codebase can be challenging, and it makes upgrading dependencies more complex, especially when updates need to happen quickly.
00:08:50.720
Every year, a new minor version of Rails is released. When it arrives, we know we have to upgrade our application.
00:09:12.370
Until recently, a full upgrade cycle took us anywhere from six months to a year for the longest upgrades.
00:09:21.110
This was unsustainable, especially as more developers were added to the team.
00:09:35.570
Despite having various tools to assist with the upgrade process, we found our toolkit lacked certain capabilities.
00:09:55.080
Our main pain point was ensuring that existing failures didn't interfere with the upgrade work.
00:10:06.450
When upgrading significant dependencies, our first step is preparing our application for dual-booting.
00:10:18.550
If you manage your dependencies with Bundler, your Gemfile will look quite standard.
00:10:31.290
We add a Bundler monkey patch to the Gemfile that lets us switch between two dependency snapshots.
00:10:42.579
This allows us to run our application with two versions of Rails or any gem.
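The idea of booting against two versions can be sketched with a small helper that picks a version constraint from an environment variable. This does not reproduce the Bundler patch itself. The variable name `DEPENDENCIES_NEXT`, the helper names, and the version numbers are all assumptions made for illustration.

```ruby
# Hypothetical environment variable name, chosen for this sketch only.
NEXT_ENV_VAR = "DEPENDENCIES_NEXT"

# True when the process was asked to boot against the next set of dependencies.
def next_dependencies?(env = ENV)
  env[NEXT_ENV_VAR] == "1"
end

# Picks the version constraint for a gem depending on the boot mode.
def requirement_for(current:, upcoming:, env: ENV)
  next_dependencies?(env) ? upcoming : current
end

# In a Gemfile this could read:
#   gem "rails", requirement_for(current: "~> 5.1.0", upcoming: "~> 5.2.0")

puts requirement_for(current: "~> 5.1.0", upcoming: "~> 5.2.0", env: {})
# prints "~> 5.1.0"
puts requirement_for(current: "~> 5.1.0", upcoming: "~> 5.2.0", env: { NEXT_ENV_VAR => "1" })
# prints "~> 5.2.0"
```

Each mode also needs its own resolved set of dependencies, which is the part the snapshot switching takes care of.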
00:10:55.390
The second step involves fixing issues that arise during the dual booting process, particularly broken code due to eager loading.
00:11:18.900
Our objective is to be able to run the test suite, even if we're not expecting all tests to pass initially.
00:11:39.920
A significant amount of the upgrade time typically goes into fixing the issues.
00:11:52.310
To reduce the time spent on fixing issues, we realized we needed to introduce a new CI check.
00:12:05.029
The CI check would run the application against both versions of the gem.
00:12:18.520
Although this approach was sensible, we couldn’t implement it due to the volume of existing CI failures.
00:12:31.800
We needed to deal with the existing failures before we could enable that CI check.
00:12:44.970
To tackle this, we created a new set of tools that improved our upgrading process.
00:13:02.290
The first part of this solution was preparing our application for dual booting, and we also focused on resolving any issues during the booting process.
00:13:30.520
A significant enhancement was enabling CI so that it fails only when new broken code is introduced, while allowing existing broken tests to be marked and disregarded.
00:14:11.130
Let me provide you with a concrete example of how we marked tests.
00:14:32.390
We added a marker on top of Minitest to flag tests that are known to fail. It works much like `desc` in a Rakefile: you declare it right before the definition it applies to.
00:14:55.560
The markers applied to one test do not affect others.
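A marker of this kind can be sketched in plain Ruby with the `method_added` hook. The module name, the `fails_on_next` method, and the example test class are hypothetical and are not the names used in our codebase.

```ruby
# Hypothetical marker, applied to the next test method defined (like `desc` in Rake).
module KnownFailureMarker
  # Maps a test method name to the reason it is expected to fail.
  def known_failures
    @known_failures ||= {}
  end

  # Remembers the marker until the next test method is defined.
  def fails_on_next(reason = "fails on the next gem version")
    @pending_marker = reason
  end

  # Ruby calls this hook every time an instance method is defined on the class.
  def method_added(name)
    super
    return unless @pending_marker && name.to_s.start_with?("test_")

    known_failures[name] = @pending_marker
    @pending_marker = nil # the marker is consumed, so later tests are unaffected
  end
end

class CheckoutTest
  extend KnownFailureMarker

  fails_on_next "uses an API removed in the next version"
  def test_legacy_api; end

  def test_still_fine; end
end

p CheckoutTest.known_failures.keys
```

Only `test_legacy_api` ends up in the registry, because the marker is cleared as soon as it has been attached to one test.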
00:15:00.720
We build on Minitest reporters, which give a visual representation of progress during a test run.
00:15:24.220
The reporters are aware of tests because when Minitest runs each test, it records the results within a result object.
00:15:43.890
The result object contains the assertion count, the time taken, and any failures, and it is sent to the reporters once the test finishes.
00:16:00.800
The reporters then determine the exit code, which tells CI whether the test run succeeded.
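The relationship between results, reporters, and the exit code can be sketched with a small reporter. `SummaryReporter` and `ArithmeticTest` are hypothetical names, and the tests are run by hand here rather than through the normal Minitest runner, to keep the example short.

```ruby
require "minitest"

# Hypothetical reporter that keeps one summary line per test.
class SummaryReporter < Minitest::AbstractReporter
  attr_reader :lines

  def initialize
    super()
    @lines = []
    @all_passed = true
  end

  # Called with the result of each test.
  def record(result)
    @all_passed &&= result.passed?
    status = result.passed? ? "PASS" : "FAIL"
    @lines << "#{status} #{result.name} (#{result.failures.size} failures)"
  end

  # A reporter answers whether, from its point of view, the run succeeded.
  def passed?
    @all_passed
  end
end

class ArithmeticTest < Minitest::Test
  def test_passes
    assert_equal 4, 2 + 2
  end

  def test_fails
    assert_equal 5, 2 + 2
  end
end

reporter = SummaryReporter.new
%i[test_passes test_fails].each do |name|
  # Running one test instance returns its result object.
  reporter.record(ArithmeticTest.new(name).run)
end

puts reporter.lines
puts "exit code would be #{reporter.passed? ? 0 : 1}"
```

Because a reporter only sees what is in the result object, changing the failures stored there changes what every reporter concludes.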
00:16:15.960
For our custom reporter, we define an after-test hook to check if a test is expected to fail and verify we are running on the next version of the gem.
00:16:39.170
If the test is marked as a known failure and it does fail, we clear its failure messages so other reporters won't report them.
00:17:01.440
The main advantage of marking failures rather than ignoring them is it helps us maintain an accurate list of tests requiring attention.
00:17:18.000
If a test is marked as failing but now runs without any failures, it must not stay marked.
00:17:29.600
CI reports a failure until a developer removes the marker from that test.
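Both behaviours, hiding a known failure and rejecting a stale marker, can be sketched with Minitest's `after_teardown` lifecycle hook. This is an assumption about where such a hook could live, not a copy of our reporter. The `KnownFailures` module, its hard-coded registry, and the `next_version` switch are hypothetical.

```ruby
require "minitest"

module KnownFailures
  # Hypothetical registry of tests expected to fail on the next gem version.
  MARKED = { "LegacyTest" => %i[test_removed_api test_already_fixed] }.freeze

  class << self
    # Hypothetical switch: true when booted against the next gem version.
    attr_accessor :next_version
  end

  # Minitest runs this hook after the test body and teardown have finished.
  def after_teardown
    super
    return unless KnownFailures.next_version
    return unless MARKED.fetch(self.class.name, []).include?(name.to_sym)

    if failures.empty?
      # The marker is stale: fail so that someone removes it.
      flunk "#{name} is marked as a known failure but passed; remove the marker"
    else
      # Expected failure: hide it from every reporter.
      failures.clear
    end
  end
end

class LegacyTest < Minitest::Test
  include KnownFailures

  def test_removed_api
    assert false, "API removed in the next version"
  end

  def test_already_fixed
    assert true
  end

  def test_unmarked
    assert true
  end
end

KnownFailures.next_version = true
outcomes = %i[test_removed_api test_already_fixed test_unmarked].to_h do |name|
  [name, LegacyTest.new(name).run.passed?]
end
p outcomes
```

With the switch on, the marked test that fails is reported as passing, the marked test that now passes is reported as failing, and the unmarked test is left alone.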
00:17:42.490
This mechanism motivates developers to fix these issues to keep the CI system green, helping the team make progress during the upgrade process.
00:18:11.490
Having clear ownership of components in the codebase aids in organizing and assigning upgrade work.
00:18:28.810
We used a componentization project to streamline our development process and improve code organization.
00:18:40.830
By identifying each component, we could assign specific issues to the relevant team members, ensuring accountability.
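Grouping failures by owner can be sketched as follows. The path prefixes, component names, and file names are invented for illustration and do not reflect our actual directory layout.

```ruby
# Hypothetical mapping from a path prefix to the component that owns it.
COMPONENTS = {
  "components/checkout/" => "checkout",
  "components/shipping/" => "shipping"
}.freeze

# Returns the owning component for a test file, or "unowned" if none matches.
def component_for(path)
  _prefix, name = COMPONENTS.find { |prefix, _| path.start_with?(prefix) }
  name || "unowned"
end

# Counts failing test files per component.
def failures_by_component(failing_test_files)
  failing_test_files.group_by { |path| component_for(path) }
                    .transform_values(&:size)
end

failing = [
  "components/checkout/test/cart_test.rb",
  "components/checkout/test/payment_test.rb",
  "components/shipping/test/rate_test.rb",
  "test/misc/legacy_test.rb"
]

failures_by_component(failing).each do |component, count|
  puts "#{component}: #{count} failing test file(s)"
end
```

A per-component count like this is enough to show each team what is left in its own area.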
00:19:03.300
The gamification aspect arose naturally: teams aimed to keep their components at zero failures.
00:19:31.520
Once our test suite was entirely green, we were ready to deploy the gem upgrade to production.
00:19:58.970
To minimize risks during deployment, we used a tool to limit the upgrade to a small subset of our data centers.
00:20:31.100
This allowed us to monitor how the new gem version performed, incrementally increasing the rollout percentage.
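One way to sketch an incremental rollout is to hash each server name into a stable bucket and compare it with the current percentage. Hash-based bucketing, the function names, and the server names are assumptions made for this example, not a description of the tool we used.

```ruby
require "zlib"

# Deterministically maps a server name to a bucket from 0 to 99.
def bucket_for(server_name)
  Zlib.crc32(server_name) % 100
end

# A server runs the next gem version when its bucket falls under the percentage.
def next_version_enabled?(server_name, rollout_percentage)
  bucket_for(server_name) < rollout_percentage
end

servers = (1..20).map { |i| "web-#{i}" } # hypothetical server names

[0, 10, 50, 100].each do |percentage|
  enabled = servers.count { |s| next_version_enabled?(s, percentage) }
  puts "#{percentage}% rollout: #{enabled} of #{servers.size} servers on the next version"
end
```

Because the bucket is stable, a server that is enabled at a given percentage stays enabled as the percentage increases.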
00:20:52.090
Throughout this process, we also profiled our application, comparing its performance metrics before and after the upgrade.
00:21:10.180
Once we confirmed everything was functioning well, we would complete the deployment.
00:21:27.220
Finally, we’d remove any legacy code that was needed during this transition.
00:21:41.220
A key to a smooth upgrade process is addressing deprecations before you upgrade.
00:22:03.220
The more proactive you are with deprecations, the easier it is when it comes time to upgrade.
00:22:15.360
To mitigate the issues surrounding deprecations, we developed a gem called the Deprecation Toolkit.
00:22:29.920
This gem records existing deprecation warnings in YAML files, allowing us to track and prevent new deprecations.
00:22:48.040
If developers introduce new deprecated code, CI fails, which alerts them.
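The record-then-compare idea can be sketched in plain Ruby. This is not the Deprecation Toolkit API. The `DeprecationLedger` class, the file name, and the warning messages are hypothetical stand-ins.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Hypothetical stand-in: stores known deprecation warnings per test in a YAML file.
class DeprecationLedger
  def initialize(path)
    @path = path
  end

  # Warnings already recorded, keyed by test name.
  def recorded
    File.exist?(@path) ? (YAML.safe_load(File.read(@path)) || {}) : {}
  end

  # Saves the warnings currently triggered by a test.
  def record!(test_name, warnings)
    File.write(@path, recorded.merge(test_name => warnings).to_yaml)
  end

  # Warnings seen now that were not recorded before; CI would fail if any exist.
  def new_warnings(test_name, warnings)
    warnings - recorded.fetch(test_name, [])
  end
end

dir = Dir.mktmpdir
ledger = DeprecationLedger.new(File.join(dir, "deprecations.yml"))
ledger.record!("test_checkout", ["old_method is deprecated"])

unchanged = ledger.new_warnings("test_checkout", ["old_method is deprecated"])
introduced = ledger.new_warnings("test_checkout", ["old_method is deprecated", "other_method is deprecated"])
FileUtils.remove_entry(dir)

puts "unchanged run: #{unchanged.size} new warning(s)"
puts "after a change: #{introduced.size} new warning(s)"
```

Recorded warnings are tolerated, so the existing backlog does not block anyone, while anything outside the recorded list is treated as new.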
00:23:12.620
To summarize, it is essential to address deprecation proactively to avoid complications during upgrades.
00:23:30.350
I wish you a smooth upgrade process for your applications. Thank you for attending my presentation.
00:23:41.320
If you have any questions or want to discuss further, feel free to reach out.
00:23:54.000
I would be happy to chat about the importance of stopping the bleeding.
00:24:10.350
If the gem upgrade causes issues, how do we determine if it's the gem that's malfunctioning?
00:24:24.250
We can’t completely avoid bugs, but we can implement strategies to reduce the impact.
00:24:35.940
For instance, during a Rails upgrade, we deploy release candidates and monitor performance while serving a limited number of users.
00:24:53.490
Taking advantage of controlled rollouts ensures minimal disruption.
00:25:11.660
Thank you for your time and attention!
00:25:20.140
I appreciate your questions and insights.
00:25:24.890
Have a great day!