RubyKaigi 2018

Journey of a complex gem upgrade

Although every gem bump should be done carefully, most of the time it’s just a matter of running the `bundle update` command, looking at the CHANGELOG, and maybe fixing a couple of tests that fail due to the upgrade.
But what about upgrading a gem that introduced a lot of breaking changes in its new version?
The upgrade could cause hundreds, if not thousands, of your existing tests to fail.
In this talk I’d like to share the different techniques and strategies that will allow you to upgrade any dependency smoothly and safely.

RubyKaigi 2018 https://rubykaigi.org/2018/presentations/Edouard-chin

00:00:01.159 Hello, everyone! I hope you all had a good lunch.
00:00:06.569 I'm excited to start my presentation today.
00:00:15.120 So, let's dive right into the topic.
00:00:24.260 Today, we're going to talk about the journey of a complex gem upgrade. First, I want to thank you for being here with me in this amazing city of Sendai.
00:00:38.280 This is actually my second time in Japan. My first visit was for the Ruby World Conference in Matsue. I have to say, I love Japan; it's an amazing country.
00:00:51.680 The last time, in Matsue, I didn't get a chance to take a train; I opted for a bus from Osaka.
00:01:03.660 However, this year for RubyKaigi, I decided to take the train from Tokyo.
00:01:10.590 One thing that amazed me about the Shinkansen was the rotating seats. This might sound odd, but as a foreigner, it was a fascinating experience.
00:01:22.740 I was really excited, almost like how I feel about the Hyperloop concept. But then again, I just wanted to experience those rotating seats.
00:01:36.329 In my opinion, the best use case for rotating seats would actually be on planes, especially for those long flights.
00:01:49.500 Imagine being able to rotate your seat and look straight at the kid kicking your chair. Without saying a word, just making eye contact could be enough to stop them.
00:02:12.209 But I digress! Let's begin: my name is Edouard Chin, and I am a software developer at Shopify.
00:02:20.410 I work on our Rails and Ruby infrastructure team, which manages several projects, one of which ensures our application runs on the latest version of Rails and other gems.
00:02:30.880 Now, just to clarify, I know this is RubyKaigi and not RailsConf. Therefore, I won't delve too deeply into specifics about Rails in my presentation.
00:02:54.100 There will be a few slides related to Rails, but the concepts I discuss can apply to any Ruby framework.
00:03:01.180 Even though my focus won't be solely on Rails, I will use it as a supporting example due to the size and complexity of its dependencies.
00:03:06.190 Upgrading Rails from one version to another is not an easy task, especially in large applications. This makes it a perfect case study for exploring upgrade processes.
00:03:19.630 When you upgrade a gem, you always face the risk of breaking your application. Today, I will share techniques, ideas, and tools we developed to facilitate the upgrade process.
00:03:34.210 I want to emphasize that the strategies I'll discuss are applicable not just to Rails, but to any gem upgrades.
00:03:39.700 Upgrading gems is a routine task, and usually it needs to be done quite frequently.
00:03:51.040 Fortunately, most of the time, upgrading a gem is not a major issue. All you need to do is check the changelog when available.
00:04:01.930 As a side note, if you're a gem maintainer and don't have a changelog for your gem, it's beneficial to create one.
00:04:13.600 RubyGems introduced a feature back in 2017 that lets you specify the changelog URL directly in your gem specification, allowing users to find it easily on your gem homepage.
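As a rough illustration of that side note, a gemspec can carry the changelog link in its metadata. The sketch below assumes the `changelog_uri` metadata key; the gem name, author, and URL are placeholders, not anything from the talk.

```ruby
require "rubygems"

# Placeholder gem: every value here is hypothetical.
spec = Gem::Specification.new do |s|
  s.name    = "example_gem"
  s.version = "1.0.0"
  s.summary = "Example gem used to illustrate changelog metadata"
  s.authors = ["Example Author"]

  # Metadata is a plain Hash of String keys to String values.
  s.metadata["changelog_uri"] = "https://example.com/example_gem/CHANGELOG.md"
end

puts spec.metadata["changelog_uri"]
```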
00:04:32.500 So, back to upgrading gems: if you spot no breaking changes in the changelog, that’s great!
00:04:50.020 Then, you can run the 'bundle update' command and lastly, execute your tests in CI. Wait for a successful build, and you're set to deploy. Many of these steps can be automated.
00:05:10.930 For example, Dependabot is a tool that opens pull requests automatically for you, which you then review and merge manually.
00:05:16.960 However, the real challenges arise when your application heavily relies on a dependency—like Rails—and that dependency has significant breaking changes in its new version.
00:05:31.270 In such cases, your application could break entirely, and hundreds or even thousands of your tests could start failing.
00:05:38.050 When that happens, fixing everything in a single pull request is not feasible. You must resolve issues incrementally.
00:05:44.620 Gems can typically be categorized into three buckets when upgrading.
00:05:51.670 The first bucket contains gems that are relatively easy to upgrade and are used only in tests or development.
00:06:03.460 If something breaks during the upgrade of one of these gems, it's frustrating for developers, but it's usually straightforward to revert the commit.
00:06:16.840 The second bucket includes gems that are more challenging to upgrade as they affect the entire application, such as Rails.
00:06:29.860 Upgrading one of these gems can break something in production, impacting customers.
00:06:43.270 While you can also revert commits, sometimes side effects linger.
00:06:50.499 The final bucket consists of gems that are easy to upgrade but are crucial for your infrastructure or business logic.
00:07:02.229 A specific example in our core application is the 'resque-scheduler' gem, which we rely on for scheduling background jobs.
00:07:18.500 When we had to upgrade it, we couldn't afford for it to break, as it handles critical tasks like sending webhooks and processing payments.
00:07:30.289 Thus, we had to ensure that the upgrade process would not cause any outages on our platform.
00:07:44.249 Let me give you some context about our application. Shopify began running in production back in 2006 and currently supports over 600,000 merchants.
00:08:02.419 The application has always been built with Ruby and has never been fully rewritten; we've continually made improvements.
00:08:16.999 We maintain around 250 direct dependencies, with more than 400 if you include transitive dependencies.
00:08:32.860 Managing the size of our codebase can be challenging, and it makes upgrading dependencies more complex, especially when updates need to happen quickly.
00:08:50.720 Each year, minor updates to Rails get released. When they arrive, we know we have to upgrade our application.
00:09:12.370 Until recently, a full upgrade cycle took us between six months and a year for the longest upgrades.
00:09:21.110 This was unsustainable, especially as more developers were added to the team.
00:09:35.570 Despite having various tools to assist with the upgrade process, we found our toolkit lacked certain capabilities.
00:09:55.080 Our main pain point was ensuring that existing failures didn't interfere with the upgrade work.
00:10:06.450 When upgrading significant dependencies, our first step is preparing our application for dual-booting.
00:10:18.550 If you manage your dependencies with Bundler, your Gemfile will look quite standard.
00:10:31.290 We add a Bundler monkey patch in the Gemfile to switch between snapshots, that is, between two lockfiles.
00:10:42.579 This allows us to run our application with two versions of Rails or any gem.
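The dual-boot idea can be sketched roughly as follows. This is not Shopify's actual patch: the `DualBoot` module, the `DEPENDENCIES_NEXT` flag, and the `Gemfile_next.lock` name are all assumptions made for illustration. The helper only decides which version requirement and which lockfile apply; pointing Bundler at the second lockfile is the part the monkey patch would handle.

```ruby
# Hypothetical helper; names are assumptions, not taken from the talk.
module DualBoot
  ENV_FLAG = "DEPENDENCIES_NEXT" # assumed environment variable name

  # True when the application should boot with the upcoming gem versions.
  def self.next?(env = ENV)
    env[ENV_FLAG] == "1"
  end

  # Each mode gets its own snapshot so resolutions never overwrite each other.
  def self.lockfile_name(env = ENV)
    next?(env) ? "Gemfile_next.lock" : "Gemfile.lock"
  end

  # Picks the version requirement for the current mode.
  def self.requirement(current:, upcoming:, env: ENV)
    next?(env) ? upcoming : current
  end
end

# In a Gemfile this could be used as:
#   gem "rails", DualBoot.requirement(current: "~> 5.1.0", upcoming: "~> 5.2.0")
```

With something like this in place, the same checkout can be booted in either mode just by setting or unsetting the flag.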
00:10:55.390 The second step involves fixing issues that arise during the dual booting process, particularly broken code due to eager loading.
00:11:18.900 Our objective is to be able to run the test suite, even if we're not expecting all tests to pass initially.
00:11:39.920 A significant amount of the upgrade time typically goes into fixing the issues.
00:11:52.310 To reduce the time spent on fixing issues, we realized we needed to introduce a new CI check.
00:12:05.029 The CI check would run the application against both versions of the gem.
00:12:18.520 Although this approach was sensible, we couldn’t implement it due to the volume of existing CI failures.
00:12:31.800 We needed to address existing failures before we could effectively enable CI.
00:12:44.970 To tackle this, we created a new set of tools that improved our upgrading process.
00:13:02.290 The first part of this solution was preparing our application for dual booting, and we also focused on resolving any issues during the booting process.
00:13:30.520 A significant enhancement was enabling CI so that it now fails only when new broken code is introduced, while allowing existing broken tests to be marked and disregarded.
00:14:11.130 Let me provide you with a concrete example of how we marked tests.
00:14:32.390 We use the markers feature in Minitest to mark tests that are known to fail. This is similar to how rake tasks are defined, where you can add a description.
00:14:55.560 The markers applied to one test do not affect others.
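A marker that works like rake's `desc` could be sketched as below. The `Marking` module and the `mark_as` and `marked?` names are hypothetical, and the class is plain Ruby so the sketch has no dependencies; in a real suite it would presumably inherit from `Minitest::Test`. The marker is remembered until the next method is defined, then attached to that method only.

```ruby
# Hypothetical marking mixin, modelled on how rake's `desc` applies to the next task.
module Marking
  # Remember the tags until the next method definition.
  def mark_as(*tags)
    @pending_tags = tags
  end

  # Ruby calls this hook on the class each time an instance method is defined.
  def method_added(name)
    super
    return unless @pending_tags

    tags = @pending_tags
    @pending_tags = nil # consumed: later methods stay unmarked
    (@marked_tests ||= {})[name] = tags
  end

  def marked?(name, tag)
    (@marked_tests ||= {}).fetch(name.to_sym, []).include?(tag)
  end
end

class CheckoutTest
  extend Marking

  mark_as :fails_on_next
  def test_total; end

  def test_tax; end # defined after the marker was consumed, so not marked
end
```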
00:15:00.720 We capitalize on the Minitest reporters, which provide a visual representation of the progress during test runs.
00:15:24.220 The reporters are aware of tests because when Minitest runs each test, it records the results within a result object.
00:15:43.890 The result object contains assertions, the time taken, and any failures which will be sent to the reporters once the run finishes.
00:16:00.800 This provides a clear exit code indicating whether or not the CI tests succeeded.
00:16:15.960 For our custom reporter, we define an after-test hook to check if a test is expected to fail and verify we are running on the next version of the gem.
00:16:39.170 If the test knows it's going to fail, we clear the failure messages so other reporters won’t report failures.
00:17:01.440 The main advantage of marking failures rather than ignoring them is it helps us maintain an accurate list of tests requiring attention.
00:17:18.000 If a test is marked as failing but runs without any failures, the marker is stale and should not remain.
00:17:29.600 The CI will indicate a failure until developers appropriately unmark the test.
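The reporter logic described above can be sketched without any test framework. `TestResult` here is a stand-in struct, not Minitest's result class, and `reconcile` is a hypothetical name; the point is only the decision table: an expected failure is silenced, and an unexpected pass on a marked test becomes a failure.

```ruby
# Stand-in for the per-test result object; not the Minitest class.
TestResult = Struct.new(:name, :failures, :expected_to_fail)

# Adjusts a result after the test has run.
# Only applies when running against the next gem version.
def reconcile(result, running_next:)
  return result unless running_next && result.expected_to_fail

  if result.failures.empty?
    # The test now passes: the marker is stale, so fail until it is removed.
    result.failures << "#{result.name} is marked as failing but passed; remove the marker"
  else
    # Known failure: clear it so downstream reporters see a passing test.
    result.failures.clear
  end
  result
end
```

Failing on a stale marker is what keeps the list of marked tests accurate over time.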
00:17:42.490 This mechanism motivates developers to fix these issues to keep the CI system green, helping the team make progress during the upgrade process.
00:18:11.490 Having clear ownership of components in the codebase aids in organizing and assigning upgrade work.
00:18:28.810 We relied on a componentization project to streamline our development process and enhance code organization.
00:18:40.830 By identifying each component, we could assign specific issues to the relevant team members, ensuring accountability.
00:19:03.300 The gamification aspect arose naturally, where teams aimed to keep their components at zero failures.
00:19:31.520 Once our test suite was entirely green, we were ready to deploy the gem upgrade to production.
00:19:58.970 To minimize risks during deployment, we used a tool to limit the upgrade to a small subset of our data centers.
00:20:31.100 This allowed us to monitor how the new gem version performed, incrementally increasing the rollout percentage.
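One common way to get an incremental rollout like this is deterministic bucketing. The sketch below is an assumption about how such a tool could work, not a description of the tool used in the talk; the `on_next_version?` name and the host names are hypothetical.

```ruby
require "zlib"

# Returns true when the given host falls inside the rollout percentage.
# Hashing the host name makes the decision stable across restarts, and
# raising the percentage only ever adds hosts, never removes them.
def on_next_version?(host, percentage)
  Zlib.crc32(host) % 100 < percentage
end

hosts = %w[web-1 web-2 web-3 web-4] # hypothetical host names
selected = hosts.select { |host| on_next_version?(host, 25) }
puts "Booting the next version on: #{selected.inspect}"
```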
00:20:52.090 Throughout this process, we also profiled our application, comparing its performance metrics pre- and post-upgrade.
00:21:10.180 Once we confirmed everything was functioning well, we would complete the deployment.
00:21:27.220 Finally, we’d remove any legacy code that was needed during this transition.
00:21:41.220 A critical aspect leading to a smooth upgrade process lies in addressing deprecations prior to upgrading.
00:22:03.220 The more proactive you are with deprecations, the easier it is when it comes time to upgrade.
00:22:15.360 To mitigate the issues surrounding deprecations, we developed a gem called the Deprecation Toolkit.
00:22:29.920 This gem records existing deprecation warnings in YAML files, allowing us to track and prevent new deprecations.
00:22:48.040 If developers introduce new deprecated code, the CI system will fail, alerting them.
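The record-then-compare idea can be sketched with the standard library alone. This is a simplified stand-in and not the Deprecation Toolkit's API: the `DeprecationLedger` class, its method names, and the file layout are all hypothetical.

```ruby
require "yaml"

# Hypothetical, simplified ledger of known deprecation warnings per test.
class DeprecationLedger
  def initialize(path)
    @path = path
  end

  # Record mode: store the warnings a test currently triggers.
  def record(test_name, warnings)
    data = read_all
    data[test_name] = warnings.sort
    File.write(@path, YAML.dump(data))
  end

  # Check mode: anything not already recorded is a new deprecation,
  # which a CI run would treat as a failure.
  def new_warnings(test_name, warnings)
    warnings - read_all.fetch(test_name, [])
  end

  private

  def read_all
    File.exist?(@path) ? YAML.load_file(@path) : {}
  end
end
```

Recording existing warnings first means the check can be turned on immediately, without fixing every old deprecation up front.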
00:23:12.620 To summarize, it is essential to address deprecations proactively to avoid complications during upgrades.
00:23:30.350 I wish you a smooth upgrade process for your applications. Thank you for attending my presentation.
00:23:41.320 If you have any questions or want to discuss further, feel free to reach out.
00:23:54.000 I would be happy to chat about the importance of stopping the bleeding.
00:24:10.350 If the gem upgrade causes issues, how do we determine if it's the gem that's malfunctioning?
00:24:24.250 We can’t completely avoid bugs, but we can implement strategies to reduce the impact.
00:24:35.940 For instance, during a Rails upgrade, we deploy release candidates and monitor performance while serving a limited number of users.
00:24:53.490 Taking advantage of controlled rollouts ensures minimal disruption.
00:25:11.660 Thank you for your time and attention!
00:25:20.140 I appreciate your questions and insights.
00:25:24.890 Have a great day!