Masataka Kuwabara

Community-driven RBS Repository

RubyKaigi 2024

00:00:09.040 Hello everyone! Today, I'm going to talk about the Community-driven RBS Repository. First, let me introduce myself. I'm Masataka Kuwabara; please call me Pocke. I'm a software engineer at Money Forward and a half-time contributor to RBS: I spend half of my working hours on RBS development and the other half as an engineer. I live in Okayama Prefecture, and as you can see, I have long hair, so you can easily find me at this venue. If you see someone with long hair, feel free to talk to me.
00:00:20.439 Before diving into the main topic, I'd like to share some of my recent work on RBS. Recently, my efforts have focused on making RBS more practical. In 2021, at RubyKaigi, I talked about the RBS collection command. This command is a key feature of RBS in version 3.1 and is crucial for third-party RBS management. It's also related to today's theme. Last year, I discussed the RBS subtract command, which is useful in large applications for removing unnecessary definitions from RBS files. Additionally, I am a maintainer of the RBS Rails gem. This gem is designed for Rails developers who wish to use RBS in their applications.
00:01:01.840 In recent months, I have also been working on reducing the memory usage of RBS, because the language server of Steep consumes a significant amount of memory, especially for large applications. A large application can consume too much memory, so optimizing memory usage is vital when using RBS and Steep. Today, I will be discussing the RBS collection repository. Recently, I revised the workflow for the ruby/gem_rbs_collection repository. The main strategy is to grant more privileges to the community to enhance the scalability of this repository. Today, I want to introduce these changes to you.
00:02:30.599 Let's take a look at the agenda. First, I will explain what the RBS collection is, followed by a discussion of the history of this repository from a management perspective. Next, I will address the community-driven solutions, and finally, I will discuss the future of RBS collection. By the way, using RBS's features or static type checking is outside the scope of this talk, so let’s get started!
00:03:10.760 Let me describe the RBS collection repository. It is a GitHub repository that contains RBS files for third-party gems, such as Rails and others. This setup is similar to the type definition files for TypeScript. The main concept is the same: it provides RBS files for libraries that do not ship RBS themselves. This repository is used by the RBS collection command, which allows users to download RBS files from it. Let's illustrate how this repository is used. The first step is to add a Gemfile to use RBS collection. This Gemfile is very simple; it only contains the RBS and related gems.
00:04:10.879 Before using RBS collection, you need to run the bundle install command to download the necessary gems. It's crucial to generate the Gemfile.lock, since it includes all the expanded dependencies that are required. For example, the rails gem depends on actioncable, actionmailbox, and many other gems. After that, I execute the rbs collection install command. This command determines which gems are needed from the Gemfile.lock and downloads the corresponding RBS files from the RBS collection repository.
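To make the lookup step above concrete, here is a minimal sketch of how locked gem versions could be matched against the RBS versions a repository provides. It is a simplified model, not the actual implementation of the command: the data in `LOCKED_GEMS` and `AVAILABLE_RBS`, the helper names `pick_rbs_version` and `resolve`, and the selection rule (newest RBS version not newer than the locked gem, otherwise the oldest one) are all assumptions made for illustration.

```ruby
require "rubygems"

# Illustrative data only: gems and versions as they might appear in Gemfile.lock.
LOCKED_GEMS = {
  "activerecord" => "7.1.3",
  "actionpack"   => "7.1.3",
  "nokogiri"     => "1.16.2",
}.freeze

# Illustrative data only: RBS versions a repository might provide per gem.
AVAILABLE_RBS = {
  "activerecord" => ["6.0", "7.0"],
  "actionpack"   => ["6.0"],
}.freeze

# Assumed rule: pick the newest RBS version that is not newer than the
# locked gem version, falling back to the oldest available one.
def pick_rbs_version(locked, available)
  versions = available.map { |v| Gem::Version.new(v) }.sort
  locked_version = Gem::Version.new(locked)
  chosen = versions.reverse.find { |v| v <= locked_version } || versions.first
  chosen.to_s
end

# Returns a hash of gem name => chosen RBS version.
# Gems with no RBS in the repository are skipped.
def resolve(locked_gems, available_rbs)
  locked_gems.each_with_object({}) do |(name, version), result|
    candidates = available_rbs[name]
    next if candidates.nil? || candidates.empty?

    result[name] = pick_rbs_version(version, candidates)
  end
end

RESOLVED = resolve(LOCKED_GEMS, AVAILABLE_RBS)
RESOLVED.each { |name, version| puts "#{name}: RBS #{version}" }
```

In this model, a gem that appears in the lock file but has no directory in the repository (here `nokogiri`) is left out of the result.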
00:05:01.320 Now, let's move on to the history of the RBS collection repository. While you may understand what the RBS collection is, you might not be familiar with the history of this repository, especially regarding its management, reviewing, and oversight. The repository has undergone three stages in its development. Initially, all pull requests were reviewed by the core team, primarily by me. After that, we introduced a code owner system that defined the reviewers for each gem. Finally, today's topic includes our recent transition towards a community-driven approach introduced this year.
00:06:00.199 Let's look at each stage in detail. At the beginning of this repository, all pull requests were reviewed by the core team. This was similar to how typical open-source projects are managed on GitHub. However, we encountered problems due to the high reviewing costs, primarily because of the volume of pull requests. The issue was exacerbated by our lack of familiarity with the various gems, as the repository includes many RBS files, requiring significant time to investigate the behavior of these gems to ensure the correctness of RBS.
00:06:53.120 To mitigate the reviewing costs, we introduced a code owner system. This GitHub feature defines reviewers for each directory in the repository, allowing the code owners to review and manage pull requests for their directories without core team reviews. While this solution looked promising, we encountered several issues after its implementation. We still needed to review many pull requests because many gems did not have designated code owners. The Rails gems in particular lacked a code owner, so I still needed to review their pull requests.
00:08:12.320 Another problem was that code owners could not approve their own pull requests, which is a limitation of GitHub. Thus, the core team still had to manually approve pull requests from the code owners. Lastly, becoming a code owner required being invited as an outside collaborator, adding another layer of complexity. To address these problems, I introduced a community-driven approach this year. In this approach, I decided to assign more privileges to the community.
00:09:56.480 First, I introduced the 'Gem Reviewer' role instead of a code owner. The purpose of this role is similar to that of the code owner, but it does not rely on GitHub features and is therefore not restricted by GitHub limitations. I also relaxed the restrictions on merging pull requests, allowing contributors to get pull requests merged more easily. The process for becoming a gem reviewer is also simpler: applicants only need to write a file to apply, which is easier than under the previous code owner system.
00:10:58.240 Let's look at an example of how this approach works. When a contributor opens a pull request to this repository, a comment is automatically added to the PR. This comment provides instructions on how to get the pull request merged. If the pull request meets the criteria, contributors or gem reviewers can merge it themselves using the /merge command. Afterward, the action automatically merges the pull request according to the command. This table shows the differences between the gem reviewer and the code owner roles, illustrating the changes in privileges.
00:12:01.199 Before the introduction of the gem reviewer role, contributors and code owners could only merge pull requests in certain situations. After the introduction of the Gem Reviewer role, they can merge pull requests in all cases. This means the repository can function without the constant oversight of the core team.
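The kind of permission check that runs when someone issues the command can be sketched as follows. The transcript does not spell out the exact rules (they were shown in a table on the slides), so everything here is a hypothetical model: the `GEM_REVIEWERS` table, the logins, the `gems/<name>/...` path layout, and the rule itself (each touched gem needs the commenter or an approver to be one of its reviewers, and a gem with no reviewers needs at least one approval).

```ruby
# Hypothetical reviewer table: gem directory name => GitHub logins.
GEM_REVIEWERS = {
  "activerecord" => ["alice"],
  "redis"        => ["bob", "carol"],
}.freeze

# Extract gem names from changed paths such as "gems/redis/4.2/redis.rbs".
# The path layout is an assumption made for this sketch.
def touched_gems(changed_files)
  changed_files.filter_map { |path| path[%r{\Agems/([^/]+)/}, 1] }.uniq
end

# Assumed rule: a pull request may be merged by command when
#   * every changed file lives under gems/, and
#   * for each touched gem, the commenter or at least one approver
#     is a reviewer of that gem; a gem without reviewers needs
#     at least one approval from anyone.
def mergeable?(changed_files:, commenter:, approvers:, reviewers: GEM_REVIEWERS)
  gems = touched_gems(changed_files)
  return false if gems.empty?
  return false unless changed_files.all? { |path| path.start_with?("gems/") }

  gems.all? do |gem_name|
    allowed = reviewers.fetch(gem_name, [])
    if allowed.empty?
      !approvers.empty?
    else
      allowed.include?(commenter) || approvers.any? { |login| allowed.include?(login) }
    end
  end
end

puts mergeable?(
  changed_files: ["gems/redis/4.2/redis.rbs"],
  commenter: "dave",
  approvers: ["carol"]
)
```

Keeping the check as a pure function of the changed files, the commenter, and the approvals makes it easy to test outside of GitHub Actions.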
00:13:12.400 Now, let's discuss the details of the solutions we implemented. I’d like to share the implementation, which is built on GitHub Actions. Notably, it does not depend on external servers, which avoids the burden of managing them. GitHub Actions runs the defined workflows automatically when events occur on the repository. There are four workflow files, which play three primary roles. The first is the welcome comment, which triggers when a pull request is opened, providing contributors with guidance on what they should do with their pull requests.
00:14:52.640 The second role is triggered when a user submits a review. It tells the contributors whether they can now merge the pull request or whether they still need to complete certain tasks. This role is split into two workflows due to GitHub Actions limitations.
00:15:39.040 The last workflow handles the /merge command. When a contributor uses this command, GitHub Actions checks whether they have the right to merge the pull request. If the pull request meets the necessary conditions, the action merges the pull request automatically. As for the background and design of this solution, it focuses on balancing contributor experience with maintainability. The key point of this change is the maintainability of the repository, which I believe is essential for scaling. However, I acknowledge that this change can negatively impact contributors since they might receive less information during the review process.
00:18:54.560 This could reduce the knowledge contributors gain during the review process. Additionally, the continuous integration does not sufficiently verify the correctness of RBS. CI mainly performs syntax checks, but it does not address deeper semantic verification. For example, the CI process does not raise concerns when the RBS of a file diverges significantly from the actual implementation of the gem. I am currently accepting this limitation, but I plan to improve these aspects in the future, and I will provide further details at the end of this presentation.
00:20:50.240 Another important issue is security. The new approach raised security concerns since anyone can use the /merge command to merge pull requests in the repository. This means that arbitrary code could potentially be introduced. However, we believe that this risk can be safely managed since contributors and users do not execute any code directly from this repository. For instance, the repository does contain a test script, but it is designed not to execute arbitrary code, which keeps its users safe.
00:22:01.080 Now, let’s take a deeper look at the testing script, which employs two techniques to prevent arbitrary code execution. The first concerns the Gemfile and the Steepfile. Both are Ruby DSL files, so they can contain any Ruby code. To mitigate this risk, the testing script generates them on the fly. The second is that the script avoids running the bundle install command, which could run arbitrary code during installation.
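The idea of generating the two Ruby DSL files (Gemfile and Steepfile) on the fly, instead of evaluating files supplied by a pull request, can be sketched like this. It is not the repository's actual test script: the helper `generate_test_files`, the `SAFE_GEM_NAME` pattern, and the file contents are assumptions chosen to show the technique.

```ruby
require "tmpdir"

# Gem names are interpolated into generated Ruby files, so this sketch
# accepts only conservative names. The pattern is an assumption.
SAFE_GEM_NAME = /\A[A-Za-z0-9_.-]+\z/

# Writes a Gemfile and a Steepfile from fixed templates into dir.
# Nothing from the pull request is evaluated as Ruby, and
# `bundle install` is deliberately not run here.
def generate_test_files(dir, gem_name)
  unless gem_name.match?(SAFE_GEM_NAME)
    raise ArgumentError, "unsafe gem name: #{gem_name.inspect}"
  end

  gemfile = <<~RUBY
    source "https://rubygems.org"
    gem "#{gem_name}"
    gem "steep"
  RUBY

  steepfile = <<~RUBY
    target :test do
      check "."
      signature "."
    end
  RUBY

  File.write(File.join(dir, "Gemfile"), gemfile)
  File.write(File.join(dir, "Steepfile"), steepfile)
  [gemfile, steepfile]
end

Dir.mktmpdir do |dir|
  generate_test_files(dir, "redis")
  puts File.read(File.join(dir, "Gemfile"))
end
```

Because the templates are fixed and only a validated gem name is interpolated, a contributor cannot smuggle Ruby code into either file through the pull request.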
00:23:45.120 Finally, let's discuss the future of the community-driven approach. I want to highlight two upcoming aspects. First, I will focus on providing more information to RBS contributors about the process of reviewing pull requests, ensuring they can understand the RBS throughout. While I do not have concrete plans, I hope to share tools that adequately check the semantics alongside the source code, helping to avoid breaking changes.
00:24:30.960 For example, many gems rely on each other, and if one gem's RBS undergoes significant changes, it could break a dependent gem. I will look for ways to test these interdependencies within the CI to maintain accuracy. Although I think that such implementations could be complex and increase execution times, they will be beneficial overall.
00:25:00.919 Additionally, there are challenges as many Ruby gems currently do not have corresponding Gem RBS files. As the Ruby gem ecosystem is extensive, I am actively seeking maintainers for RBS related to Ruby gems. If you are interested, please let me know after this presentation, and I will guide you through the process.
00:26:03.720 I aim to grow the community of contributors who can effectively write and enhance RBS. I also see room for improvement in this repository. We need to address inactive gem reviewers since PRs may remain stagnant waiting on approvals if the reviewer is unavailable. I plan to explore a promotion process for reviewers to make this process efficient.
00:27:40.560 Moreover, we need to expand the permissions for handling GitHub issues as the current setup only allows for managing pull requests. I believe that we can leverage GitHub issue templates to improve how issues related to gems are linked.
00:28:05.400 This repository still has some known bugs, such as excessive notifications for reviewing pull requests, and the /merge command also needs to be fixed. In conclusion, today I shared the recent updates on the management strategy for the RBS collection repository.
00:29:10.700 This repository has transitioned into a community-driven model. I explained the background behind this change and its design. We still need help, especially with Ruby gems, so if you are interested in RBS, please join us!
00:29:51.480 Thank you for listening. That concludes my presentation. We have a few minutes left, so I'll start a demo to open a PR to the repository.
00:30:00.000 Let’s proceed with today's main topic, which is the open PR. I have already prepared a patch method for the patch bank in the repository. I will push the changes upstream now and then follow the link to open a pull request.