RailsConf 2023

How to Upstream Your Code to Rails

by Hartley McGuire

In the video "How to Upstream Your Code to Rails" presented by Hartley McGuire at RailsConf 2023, the speaker guides developers through the process of contributing to the Rails framework for the first time. The discussion aims to alleviate the intimidation associated with making contributions and outlines practical reasons for becoming involved in open source development.

Key points of the presentation include:

- Importance of Contribution: Contributing to Rails reduces maintenance costs as the responsibility for code functionality is shared with the Rails community, allowing developers to focus on their applications.

- Knowledge Enhancement: Engaging in the upstream process serves as a valuable learning opportunity, encouraging developers to explore existing code, understand its history using tools like git blame, and learn from experienced code reviewers.

- Being a Good Citizen: By contributing, developers give back to the libraries that support Rails, ensuring a robust ecosystem for all users.

- Recognition: Contributions are recorded, allowing developers to achieve recognition on leaderboards that celebrate open source participants.

The speaker demystifies the types of contributions that can be made to Rails, including:

- Documentation Updates: Improving the clarity and correctness of Rails documentation significantly enhances user experiences.

- Bug Fixes: Developers are encouraged to find and address bugs in the Rails issue tracker, or through personal engagements with Rails in their applications.

- Performance Improvements: Enhancements to the performance of Rails components are welcome, provided they are accompanied by benchmarking evidence.

- New Features: Developers are also encouraged to contribute new, validated features, following real usage and testing in production environments.

A detailed case study is presented where McGuire shares the story of contributing a validator called 'immutable validator', which aimed to restrict changes to specific attributes. The step-by-step approach included:

1. Initial preparation of the feature and requisite testing.

2. Crafting commit messages that are informative and align with Rails' contribution guidelines.

3. Navigating through feedback from the Rails community and then iterating on the feature based on suggestions.

4. Final implementation of the validator after thorough testing and discussions, ensuring that the feature would meet Rails' standards before submission.

In conclusion, McGuire emphasizes the importance of contributing to Rails not just for personal gain, but to foster community growth and development within the open source ecosystem. Developers are encouraged to confidently engage in the contribution process, ultimately contributing to the evolution of the Rails framework and the enrichment of their own skills.

00:00:19.699 Let's get started! Quick show of hands: who has contributed to Rails before? Awesome! Who has contributed to any open-source project? That's even better! Now, who hasn’t but wants to? I'm glad you are all here. Thank you! Hopefully, by the end of this talk, we'll have even more hands raised. My name is Hartley McGuire, and I'm on the Rails Triage Team. I’m a senior developer at Shopify, where I work on our database platform called KateSQL. If you have any questions about that, feel free to find me later.
00:00:41.820 Today, we’ll be talking about how to upstream your code to Rails. We did a poll, and a lot of you are already convinced that contributing to Rails and open source as a whole is a great idea. But just in case you want to learn or need a reason to contribute, we'll start with that. The first reason to contribute is that you can lower your maintenance costs. If the code is in your repository, you are ultimately responsible for it. You have to run tests, fix bugs, and ensure that your code works with your current and future versions of Rails. Hopefully, you're not monkey patching or using undocumented APIs, but if you are, there’s a risk that anytime a Rails upgrade happens, those could break.
00:02:03.140 If you're able to upstream that code to Rails, suddenly that responsibility becomes shared. Rails will test the code, ensure it works with new versions, and can use its own undocumented APIs without worry. All of this leads to much easier upgrades for you, allowing you to focus on the important parts of your application. One of my personal favorite reasons to contribute to Rails is to improve my knowledge of it. When upstreaming code, there are multiple parts of the process that serve as fantastic learning opportunities for developers working with both Ruby and Rails.
00:02:42.240 The first is that when you look at code you want to change, you will probably want to understand why that code was written the way it was. For this, I'm a very heavy user of "git blame." If you haven’t heard of "git blame" before, it’s a command that shows you, for every line in a file, the commit that last changed it. To be honest, the name of the command is a little deceiving. The goal of using "git blame" is just to gain context about why code is written the way it is. Many people will add a snippet to their Git configuration to alias blame as “context,” so they can run "git context" without blaming anyone when they look up the code context. Personally, I actually prefer using the GitHub UI for this.
00:04:21.840 In the GitHub UI, you can toggle between the regular code view and the blame view. The blame view adds a new column on the left that shows the commit that last changed a line of code, along with a button to see what the file looked like just before that commit. For example, if we click the icon next to 'Delay loading Zeitwerk' at the bottom, it will show you what the code used to look like. I love that you can learn so much from the commit messages and pull request discussions that lead to a code change. Speaking of pull request discussions, another huge learning opportunity when contributing to Rails is the feedback you can receive from pull request reviews.
00:05:37.500 The reviewers will be some of the most knowledgeable people about Rails in the world. You can learn a ton from them. Another important reason to contribute to Rails is to be a good open-source citizen. There are currently over 70 gems in the default lock file when you create a new Rails application. When combined, these libraries create the Rails framework experience we all know and love. By contributing to Rails or any of these libraries, you’re not just helping yourself; you’re giving back to all the people behind these libraries who have enabled you to use Rails. You're giving back to the Rails community by ensuring these libraries continue to provide users with the best possible experiences.
00:06:59.220 You’re also helping to ensure that the Rails community continues to grow and that Rails can be a hundred-year framework. Finally, one of the most fun reasons to contribute to Rails is to collect internet points. You may have seen that the Rails team maintains a leaderboard that lists all the people who have ever contributed to Rails, including contributions from before Rails was hosted on GitHub. Over six thousand people have contributed to Rails, and that number keeps going up; I think I've seen over thirty new contributors in just the past few weeks, which is amazing. The best part is that any contribution you make will add you to this list, whether it’s big or small.
00:08:43.320 Now, I can hear the questions coming in: 'Wow, that leaderboard is so cool! What can get me there?' I’m glad you asked because next up, I’ll be talking about what to upstream to Rails. One of the smallest but most impactful things you can contribute is a documentation update. Did you find a typo while reading the docs? Could a concept be better explained? Any issues you may have with the docs are most likely happening to others as well. If you take the time to improve the documentation, you can help out every single person who comes to read it after you. What makes these changes so impactful is how the quality of the documentation shapes a user's experience with Rails.
00:09:49.320 If someone is trying out Rails for the first time, their very first impression of the framework will be the getting started guide. It’s essential that this guide, as well as the other guides, are well written, typo-free, and easy to understand. If you’re interested in improving any of the guides, there’s even a guide that covers style and formatting guidelines to ensure that you follow best practices. In addition to these narrative guides, the Rails documentation includes API docs generated from documentation comments inside the Rails source code. The API docs are the definitive source of information for all documented methods in Rails. In this case, 'documented' means that the method won’t change between Rails versions without deprecation, so they're safe for you to use.
00:10:49.380 Just as the getting started guide is particularly important for new users, the API docs are crucial for all Rails users. If methods are not documented or poorly documented, it becomes harder for people to work with Rails day-to-day. Just as there are guidelines for content in the guides, there are also guidelines to help you if you’re interested in contributing to the API docs. Next on the list of things to contribute are bug fixes. There are two ways to find bugs to fix. The most commonly discussed method is to find a bug on the issue tracker, but I’ll be honest, and say that it can be discouraging. Many companies run their Rails applications on the main branch, so smaller trivial bugs might get fixed quickly, leaving more complex issues that may not have obvious solutions.
00:12:56.700 The way I’ve found to be most effective in finding bugs is through using Rails in your daily life. I recently found a few bugs in the app I work on when we upgraded to Rails 7.0, including a regression in migrations involving types. We wrote a fix and opened a pull request. Once it was merged, we updated our Gemfile to point at the Rails 7-0-stable branch so we could get the fix immediately. This process allowed us to find, fix, and ship the change all within two days. Using the 7-0-stable branch has additional benefits as well: Dependabot can update it, so if the branch receives new commits, we still get them. This is especially important when it comes to security releases. If your app uses a forked version of Rails with your fix or other patches, then you have to apply security fixes yourself.
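Pointing an app at a stable branch, as described here, is a one-line change in the Gemfile. The fragment below is a minimal sketch; the branch name comes from the talk, and the rest of the file is assumed to look like a typical generated Gemfile.

```ruby
# Gemfile (fragment)
source "https://rubygems.org"

# Track the stable branch instead of a released version, so a merged fix is
# available before the next patch release. Bundler pins the exact commit in
# Gemfile.lock, and that pin moves forward whenever the dependency is updated.
gem "rails", github: "rails/rails", branch: "7-0-stable"
```

Once the fix ships in a released version, the line can go back to a normal version constraint.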
00:14:01.740 By using the 7-0-stable branch, we continue to receive new commits without any added effort. Performance improvements are certainly much less common than bug fixes, but if you notice something in Rails that could perform better, send in a pull request. Here’s an example of a great performance improvement pull request from a few weeks back. The most critical part of these pull requests is to include a benchmark to compare the original implementation with the new one. The benchmark code from this pull request is lengthy, but we can break it down into parts. The script begins with an inline Gemfile, which allows others to run the script as a single file, and Bundler ensures that all the dependencies get installed.
00:15:11.400 In this case, we only have two dependencies: Rails and benchmark-ips, a fantastic library for measuring performance. The inline Gemfile may seem hard to remember, but the Rails repository includes a benchmark template that contains this, and you can base your benchmark scripts on that template. By the way, I will include a link to the slides at the end so you won’t have to remember them. The next part of the script is the new implementation of the method we want to benchmark. By naming the method something different than the original, we can compare the performance of the two in a single file. Finally, we have the part that uses benchmark-ips to compare the performance of the original method with the new implementation. Running the script from that pull request shows that the new method is actually six times faster for a large configuration.
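The overall shape of such a script (the original method, a renamed candidate, then a comparison) can be sketched without any dependencies. This sketch deliberately swaps in Ruby's standard-library Benchmark module in place of benchmark-ips and leaves out the inline Gemfile so that it runs as a single file with no installs. The two methods, original_unique and fast_unique, are hypothetical stand-ins and are not the methods from the pull request.

```ruby
require "benchmark"

# Hypothetical "original" implementation: rescans the result array for every
# element, so the work grows quadratically with the input size.
def original_unique(values)
  result = []
  values.each { |value| result << value unless result.include?(value) }
  result
end

# Hypothetical "new" implementation, given a different name so both versions
# can live in one file and be compared directly.
def fast_unique(values)
  values.uniq
end

values = Array.new(2_000) { |i| i % 500 }

# A comparison is only meaningful if both versions agree on the result.
raise "implementations disagree" unless original_unique(values) == fast_unique(values)

Benchmark.bm(10) do |x|
  x.report("original:") { 20.times { original_unique(values) } }
  x.report("fast:")     { 20.times { fast_unique(values) } }
end
```

For a real pull request, the Rails benchmark template mentioned above is the better starting point, since benchmark-ips reports iterations per second and a direct comparison between the two implementations.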
00:16:51.560 Finally, let's discuss the last item on my list for upstreaming: new features. One of my favorite parts of Rails is that many of its largest features began as successful pieces from Rails applications. These features are refined and battle-tested in production before being adopted by Rails itself. Once upstreamed, the original authors can share the maintenance cost with the community, and everyone benefits from the new feature. Now that we have a solid desire to contribute and a good idea of what to contribute, we can finally learn how to upstream your code to Rails.
00:17:14.580 Let’s do a feature! In the Rails application I work on, we have a validator called ImmutableValidator. Its purpose is to prevent certain attributes from being changed. Let’s walk through how it works. 'validate_each' is the entry point for an ActiveModel validator. When a model adds a validation like this, Rails will look up the ImmutableValidator class and call validate_each with the model instance being validated, the name of the attribute as a symbol, and the value of that attribute. Given all this information, the validator checks two things. The first is whether the record is persisted. If the record hasn’t been saved yet, we don’t care if the value changes because it’s just being set for the first time.
00:19:14.880 The second check is whether the attribute we want to be immutable has changed. If both are true, meaning the record has been saved previously and the immutable attribute is being changed, then we add a validation error. Up to now, we’ve looked at a basic case where the content is something like a text column on the post model. But what if we want to validate that an association doesn’t change? Instead of validating content, let’s validate a post’s author. Our immutable validator handles this case too. Instead of passing the attribute directly to attribute_changed?, we send it through an attribute name method which we will write. If the attribute is one of the record’s attributes, we just return it, because it’s just a column on the record. If it’s not one of the record’s attributes, we assume it’s an association and we look up its foreign key.
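The two checks and the attribute-name lookup described so far can be sketched in plain Ruby. This is a dependency-free approximation under stated assumptions, not the validator from the talk: in a real app the class would inherit from ActiveModel::EachValidator and receive an Active Record model, whereas here FakePost, Errors, and foreign_key_for are hypothetical stand-ins that imitate just enough of that interface for the sketch to run.

```ruby
# Hypothetical stand-in for a model's error collection.
Errors = Struct.new(:messages) do
  def add(attribute, message)
    (messages[attribute] ||= []) << message
  end
end

# Hypothetical stand-in for a Post model with a `content` column and an
# `author` association stored in an `author_id` column.
class FakePost
  attr_reader :errors

  def initialize(persisted:, changed:)
    @persisted = persisted
    @changed = changed # names of attributes with unsaved changes
    @errors = Errors.new({})
  end

  def persisted?
    @persisted
  end

  def attribute_changed?(name)
    @changed.include?(name.to_s)
  end

  def has_attribute?(name)
    %w[content author_id].include?(name.to_s)
  end

  # Hypothetical lookup table standing in for Rails' association metadata.
  def foreign_key_for(association)
    { "author" => "author_id" }.fetch(association.to_s)
  end
end

class ImmutableValidator
  def validate_each(record, attribute, _value)
    # First check: a record that was never saved may still set the value.
    return unless record.persisted?
    # Second check: only complain if the attribute is actually changing.
    return unless record.attribute_changed?(attribute_name(record, attribute))

    record.errors.add(attribute, "cannot be changed")
  end

  private

  # Columns are returned as-is; anything else is assumed to be an association
  # and is mapped to its foreign key.
  def attribute_name(record, attribute)
    record.has_attribute?(attribute) ? attribute.to_s : record.foreign_key_for(attribute)
  end
end

post = FakePost.new(persisted: true, changed: ["author_id"])
ImmutableValidator.new.validate_each(post, :author, nil)
p post.errors.messages
```

Validating :author on a saved record whose author_id has changed adds an error under :author, while the same call on an unsaved record adds nothing.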
00:20:29.520 In the case of validating the author, instead of returning the symbol 'author' and passing that to attribute_changed?, this method will return 'author_id.' If you haven’t seen some of these methods before, that’s fine; they are internal methods to Rails and currently undocumented, meaning there are no guarantees regarding future changes. Seeing this set off alarm bells in my head. To remove these undocumented methods from our codebase, we could either look for documented alternatives or try to upstream the validator. Guess which way I went! There are good reasons to consider this approach. In addition to the undocumented methods, ActiveModel and ActiveRecord already have a number of validators that are generally useful, and immutability fits this broader theme.
00:21:58.680 We already have some validators in our app that are more domain-specific to Kubernetes, and those wouldn't be upstreamable to Rails because they aren’t generally useful. Additionally, Rails has a very similar feature named 'attr_readonly.' However, it doesn’t work as we want it to; it will prevent attributes from updating in the database but does so silently. You can still update the attribute in memory and call save, but it just doesn’t save the value to the database, which is kind of a footgun and can be hard to understand. Since the concept of read-only attributes exists in Rails, I decided it would be better to rename our validator to 'read-only validator' and put it under the ActiveRecord validations namespace.
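The silent behavior of the existing read-only attribute feature (attr_readonly in Rails) described above can be imitated with a small toy class. This is a plain-Ruby simulation under stated assumptions, not Active Record itself: ToyRecord, its stored hash standing in for the database row, and the READONLY list are all hypothetical.

```ruby
# Toy imitation of the behavior described in the talk; not Active Record.
class ToyRecord
  READONLY = ["title"].freeze

  attr_reader :attributes, :stored

  def initialize(attributes)
    @attributes = attributes # in-memory values
    @stored = nil            # what the "database" holds; nil until first save
  end

  def save
    if @stored.nil?
      # First save: every attribute is written.
      @stored = @attributes.dup
    else
      # Later saves: read-only attributes are skipped without any error.
      @attributes.each do |name, value|
        @stored[name] = value unless READONLY.include?(name)
      end
    end
    true
  end
end

record = ToyRecord.new("title" => "Original", "body" => "Hello")
record.save
record.attributes["title"] = "Changed"
record.attributes["body"] = "Updated"
record.save # reports success even though the title was not written

puts record.attributes["title"] # in-memory value
puts record.stored["title"]     # persisted value
```

After the second save, the in-memory title and the stored title disagree and nothing signalled the difference, which is the footgun the speaker describes.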
00:23:20.520 After renaming it to the read-only validator, we can make the attribute name method private because it shouldn’t be called from outside the validator. Now that the validator is ready, we can commit it and open a pull request. When we do that, we’ll see a wall of text at the top of the pull request template. For this talk, I’ll skip over it, but you should read it if you make a pull request. The first section to fill out in the template is the motivation for creating a pull request. As the comment mentions, it’s good to reference GitHub issues here, and I like to mention other pull requests or commits for additional context.
00:24:56.700 Let’s write a background for this change. In this case, there aren’t any GitHub issues to link, so I will describe why the existing attr_readonly feature doesn’t quite work. The next section is the description of what the pull request is. A common mistake I’ve seen is that many people will put what files they changed here, but that isn’t really what we want. We want the what and the why for the changes being made. Here, I’ll include a description of what the code does, why it’s better than the existing attr_readonly, and an example of how to use it. I’ll also add a Co-authored-by line at the bottom for the original author of the code I’m upstreaming, giving them credit for the commit both in GitHub and on the Rails contributor website.
00:26:18.600 As a side note, I prefer including all of this information in the commit message itself. So let’s update that too. We can amend the commit with the updated information. By writing commit messages this way, we prevent future people from needing to click into pull requests to get context. Additionally, because the commit message is written in the format of the pull request template, when I go to open a pull request, I can easily copy everything straight into the template. Returning to the pull request template, there are two more sections we need to consider. We don’t have any additional information to add, so we can leave those blank. The last section to look at is the checklist.
00:27:29.820 Let’s go through each item: is the pull request only related to one change? We don’t want to include refactors or other things that are unrelated to the primary goal of the pull request. In this case, yes: all we are doing is adding the validator. Next, is the commit message detailed? Yes, we amended our initial commit message to include a detailed description of the change and the motivation. Did we add tests? We did not, so let’s add some. We can use the existing immutable validator tests from our app with a few tweaks to make them work in the Rails test suite, and we obviously have to rename them because we renamed the validator. Boom! Test added.
00:29:05.580 Finally, did we update the changelog for our new feature? We did not. We can add this to the top of the Active Record changelog and update the commit again. Now that we've checked off everything on the list, we can submit the pull request. And now we wait. Sometimes it can take a bit for reviewers to provide feedback on your pull request. In this case, I was fortunate to receive feedback the same day I submitted it. Unfortunately, it appears that this pull request won’t be accepted as it stands. However, this feedback proved invaluable, as it introduced a distinction between user and developer errors, a consideration I hadn't addressed previously when writing validators. I now reflect on this often while creating validators.
00:30:37.200 Additionally, Rafael offered a suggestion for a different approach, allowing us to achieve the desired behavior in Rails, even if not through this method. What’s encouraging is that someone had tried something similar before. A pull request opened a few years ago aimed to address one of the odd quirks surrounding attr_readonly. This proposal suggested making the attribute assignment do nothing rather than allowing it to work just in memory and failing to save it to the database. This is the proposed approach, which involves considerable code, so let's break it down. This attr_readonly method is a class method you call in your model to mark attributes as read-only, similar to how you declare associations.
00:31:53.760 The important takeaway is that calling this method now redefines the read-only attribute setter, so that when you call it, it only updates the attribute when the record is new. We can use the same technique but take it further by having it raise an error if the record is not new. The method redefinition is largely unchanged, but we make it optional, ensuring that those currently using attr_readonly won’t experience breaking changes. Another adjustment is that we now raise an error instead of doing nothing when the record is being updated. Finally, we include the HasReadonlyAttributes module in the class when it calls attr_readonly.
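The setter redefinition described here can be sketched in plain Ruby. This is an illustration under stated assumptions rather than the code from the pull request: ToyModel, the raise_on_assign keyword that stands in for whatever opt-in switch the real change uses, and the locally defined ReadonlyAttributeError are all hypothetical.

```ruby
class ReadonlyAttributeError < StandardError; end

class ToyModel
  # Hypothetical class macro imitating the approach described above.
  def self.attr_readonly(*names, raise_on_assign: true)
    names.each do |name|
      # The redefinition is opt-in, so existing callers keep the old behavior.
      next unless raise_on_assign

      define_method("#{name}=") do |value|
        # Assignment is only allowed while the record has never been saved.
        raise ReadonlyAttributeError, name.to_s unless new_record?

        instance_variable_set("@#{name}", value)
      end
    end
  end

  attr_accessor :title
  attr_readonly :title # replaces the plain setter defined on the line above

  def initialize
    @new_record = true
  end

  def new_record?
    @new_record
  end

  def save
    @new_record = false
    true
  end
end

post = ToyModel.new
post.title = "Draft" # allowed: the record is new
post.save

begin
  post.title = "Changed"
rescue ReadonlyAttributeError => e
  puts "rejected assignment to #{e.message}"
end
```

Raising instead of silently ignoring the assignment surfaces the mistake at the line that caused it, rather than leaving a record whose in-memory state no longer matches the database.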
00:33:36.720 This is necessary because redefining the attribute setter only works for this first type of assignment. None of the other methods go through the attribute setter. The good news is that they all utilize the same underlying method to update attributes, called write_attribute. By including this module in the class, we can override the write_attribute method and raise an error if the attribute is being updated but is marked read-only. This ensures that all the various methods attempting to update an attribute behave consistently. Next, we’ll add some tests for these various ways to assign attributes. Please don't attempt to read all of this; I merely wanted to highlight how many tests there are.
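Why a module override covers every assignment style can be sketched in plain Ruby. The names ToyBase and Post and the per-class readonly_attributes list are hypothetical, and the module only imitates the idea described above. One detail matters for the technique: the low-level writer has to live in a parent class, so that the included module sits between the model and that writer in the ancestor chain and can call super.

```ruby
class ReadonlyAttributeError < StandardError; end

# Overrides the single low-level writer. `super` reaches the original
# implementation because this module comes before ToyBase in Post's ancestors.
module HasReadonlyAttributes
  def write_attribute(name, value)
    if !new_record? && self.class.readonly_attributes.include?(name.to_s)
      raise ReadonlyAttributeError, name.to_s
    end

    super
  end
end

class ToyBase
  def self.readonly_attributes
    @readonly_attributes ||= []
  end

  def self.attr_readonly(*names)
    readonly_attributes.concat(names.map(&:to_s))
    include HasReadonlyAttributes
  end

  def initialize
    @attributes = {}
    @new_record = true
  end

  def new_record?
    @new_record
  end

  def save
    @new_record = false
    true
  end

  # Every assignment style below funnels into this one method.
  def write_attribute(name, value)
    @attributes[name.to_s] = value
  end

  def read_attribute(name)
    @attributes[name.to_s]
  end

  def []=(name, value)
    write_attribute(name, value)
  end

  def assign_attributes(new_attributes)
    new_attributes.each { |name, value| write_attribute(name, value) }
  end
end

class Post < ToyBase
  attr_readonly :title
end

post = Post.new
post[:title] = "Draft" # allowed: the record is new
post.save

begin
  post.assign_attributes(title: "Changed")
rescue ReadonlyAttributeError => e
  puts "rejected assignment to #{e.message}"
end
```

Because both []= and assign_attributes go through write_attribute, guarding that one method is enough to make them behave the same way.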
00:35:44.880 We’ll also add a changelog entry, including the author of the original pull request we based our changes on. Afterward, we can commit all these updates, writing a detailed commit message that includes our motivation for the change and additional details explaining relevant fixes around the tests. Since there was prior work, we will link to it in the additional information section. In addition to linking the referenced pull requests, we'll add a 'closes' line for our first pull request so that it will be closed if this one gets merged. Once again, we’ll go through the checklist. This pull request only relates to one change; the commit message is detailed; the related issue is linked; and a test is included to prevent regressions.
00:37:43.140 Because this is a bug fix for an unreleased feature, we actually do not need to include a changelog entry, since changelog entries are only for changes relative to released versions. Once submitted, this pull request was accepted, and our monolith could continue with its weekly Rails upgrade. This time, everything is right! Our feature has been upstreamed; we found and fixed a bug, and the new and improved attr_readonly functionality will be available in the next major Rails release. When the time comes, we will be able to delete the immutable validator from our app to reduce our maintenance cost and replace it with the native Rails version. I hope everyone has learned something new and can now confidently say they understand the process of upstreaming their code to Rails. Hopefully, I will see you all on the GitHub repo soon! Thank you! And here’s the link to my slides.