Generating RBIs for dynamic mixins with Sorbet and Tapioca

RubyKaigi 2023

00:00:00.900 Hello RubyKaigi, thanks for virtually having me. My name is Emily Samp and I'm a senior developer at Shopify, where I work on the Ruby developer experience team. You can find all my relevant internet links, including the slides for this talk, on my website emilysamp.dev.
00:00:19.619 Today I'm going to talk to you about some work my team and I did last year, which is generating RBIs for dynamic mix-ins with Sorbet and Tapioca. Now, that’s a lot of words, and not all of them might be familiar to you. If that’s the case, that’s okay! I'm going to break this story down from the very beginning, and by the end of this talk, we'll have learned the following: First, we'll set the stage by learning about Sorbet and RBI files. Then we'll talk about the Tapioca gem, starting with what it is, followed by how it works.
00:00:37.500 Next, we'll dive into how my teammates and I implemented RBI generation for dynamic mix-ins and the challenges we encountered along the way. Finally, we'll take a step back and talk about how this work is actually important to the Ruby language as a whole. Let's start from the very beginning by discussing Sorbet.
00:01:02.699 Sorbet is a gradual type checker for Ruby. It was developed at Stripe, and we've been using it at Shopify since 2019. If you're like me, you love Ruby for its flexibility and expressiveness, which allows, or even encourages, developers to harness the power of metaprogramming and duck typing. However, this flexibility comes at a cost.
00:01:20.119 Once most applications grow to a certain size, it can become harder to reason about the code at a glance. We may find ourselves spending time figuring out what arguments to pass to a method or what a method will return, rather than solving problems for our users. This is where Sorbet comes in.
00:01:40.560 Sorbet can type check our codebases both statically and at runtime, providing insights about our code that reduce cognitive overhead and allow us to focus on the work that really matters. To understand what’s going on in our codebases, Sorbet uses a few different tools. With static analysis, Sorbet can parse and examine our codebase to make inferences about our code without requiring any additional information. For things that it cannot determine from static analysis alone, Sorbet relies on us, the developers, to provide type annotations.
00:02:10.800 For example, when using Sorbet, we need to write out type signatures for our methods, stating what kind of arguments they take and what objects they'll return. Sorbet can then use that information as part of its static analysis to type check our code more accurately. However, there is some information that Sorbet cannot obtain from the first two tools alone, which brings us to Ruby Interface (or RBI) files.
00:02:52.800 The main purpose of RBI files is to provide Sorbet with information it cannot access statically or through type annotations written by developers. One example of this is the code inside of gems.
00:03:03.959 Let’s say I'm developing a gem called CatSay. It's similar to CowSay, where it prints out an ASCII art character of a cat saying whatever message you pass to it. The code for the CatSay gem might look something like this: we define a module called CatSay, which implements a method called cat_say that takes a message as a string. We then use a heredoc string to print out that message along with the ASCII art of a cat, which is defined in a different method.
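A minimal sketch of what such a gem might look like. The file path, the `cat_art` helper name, the art itself, and the use of `extend self` (so the method can be called as `CatSay.cat_say`) are all illustrative assumptions, not taken from the slides.

```ruby
# lib/cat_say.rb (hypothetical layout)
module CatSay
  extend self # assumption: allows calling CatSay.cat_say directly

  # Prints the message followed by an ASCII-art cat.
  def cat_say(message)
    puts <<~CAT
      < #{message} >
      #{cat_art}
    CAT
  end

  private

  # Illustrative art only; the real gem would have something cuter.
  def cat_art
    "=^.^="
  end
end

CatSay.cat_say("Hello")
```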
00:03:39.239 Now let's say we wanted to use the CatSay gem in a project with Sorbet. We can't just use the gem code in our application without Sorbet throwing a type checking error, because it can't access the gem's code statically. Therefore, we have to go to the sorbet/rbi/gems folder and create a new RBI file for our CatSay gem. Inside the RBI, we will use typical Ruby syntax to explicitly write out the module and method definitions from our gem so that Sorbet knows about them.
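A hand-written RBI for the gem could look roughly like the sketch below. The file name is a hypothetical example, and the method is declared without a type signature for simplicity. RBI files only declare shapes, so the method body is intentionally empty.

```ruby
# typed: true

# sorbet/rbi/gems/cat_say.rbi (hypothetical file name)
module CatSay
  # Declaration only: tells Sorbet the method exists and takes one argument.
  def cat_say(message); end
end
```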
00:04:04.260 RBI files don't just appear out of thin air; they have to be created. And I don't know about you, but manually writing an RBI file for every gem in my Rails application does not sound like a fun time. This is where Tapioca comes in.
00:04:31.560 Tapioca is a gem that generates RBIs for use with Sorbet. It was developed at Shopify, and in 2022, it became the official companion gem to Sorbet with the goal of making it easier for Ruby developers to adopt and maintain Sorbet in their applications. Now, Tapioca generates RBIs for gems in a few interesting ways; it can pull pre-written RBI files from other repositories and provides a framework for developers to write RBI generators for DSLs. However, during this talk, we will focus on just one of Tapioca's features.
00:04:54.360 We will talk about using Tapioca to dynamically generate RBI files for gems. Before we move on, let's recap what we've learned so far.
00:05:06.720 First, we learned that Sorbet is a gradual type checker for Ruby and uses RBI files to gain information about parts of our code that it cannot statically analyze, such as code inside of gems. Tapioca is a companion gem to Sorbet that dynamically generates RBI files for gems in our projects.
00:05:27.120 Now that we have that important information under our belts, we can start to understand how Tapioca works on a technical level. In this next section, we’ll go through how Tapioca generates gem RBIs.
00:05:40.039 To generate an RBI for a gem, we use Tapioca's gem command on the command line. This command will give us some output, telling us that it is requiring all the gems in the project in preparation for compiling. 'Compiling' is the word that Tapioca uses to mean generating RBIs. After some more information, it will finally notify us that it has compiled the RBI for the CatSay gem and created a new RBI file.
00:06:03.060 This will generate the RBI file we saw earlier in the talk, which will contain the CatSay module and a method definition for the cat_say method. But how can Tapioca generate this file automatically? Let's take a deeper look at the process and try to understand what Tapioca is actually doing during gem RBI generation.
00:06:20.520 In the Tapioca codebase, the process of generating RBI is called the 'pipeline.' This pipeline is made up of several parts. First up is the queue; the queue will contain parts of the gem's code that need to be processed into RBI, and we'll talk more about what that means shortly.
00:06:45.600 Items in the queue will be processed on a first-in, first-out basis by the pipeline. The pipeline is responsible for incrementally generating the RBI file and adding new items to the queue. Finally, Tapioca will create an internal representation of the RBI file for the gem. This is where the RBI code will be kept in memory until Tapioca is ready to write the file to disk.
00:07:09.419 Let’s examine these components one by one and understand how they work. First up is the queue. Earlier, I mentioned that items in the queue represent parts of the gem's code. Broadly speaking, these items correspond to constants—Ruby classes and modules defined in the gem. In our CatSay example, we only have one constant so far, which is the CatSay module. However, in most gems, there will be more than one constant. To populate the queue, Tapioca actually uses Sorbet under the hood.
00:07:45.480 As we've already discussed, Sorbet is capable of statically analyzing a Ruby codebase and providing information about it. By running Sorbet on the gem's codebase and passing the symbol-table-json print option, Tapioca can acquire a list of all constants statically defined in the gem's code in the format of a JSON object. Tapioca can then parse that JSON object, identify all the constants, and add them to the queue.
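The idea of walking a symbol table to collect constants can be sketched as follows. The JSON shape here is a simplified, hypothetical stand-in (the real Sorbet output has more fields and a different layout), and `constant_names` is a made-up helper name.

```ruby
require "json"

# Hypothetical, simplified symbol table; NOT the exact format Sorbet emits.
SYMBOL_TABLE = <<~JSON
  {
    "name": "<root>",
    "children": [
      { "name": "CatSay", "kind": "module", "children": [
        { "name": "Art", "kind": "module", "children": [] }
      ] },
      { "name": "cat_say", "kind": "method", "children": [] }
    ]
  }
JSON

# Recursively collects fully qualified names of classes and modules.
def constant_names(node, namespace = nil)
  node.fetch("children", []).flat_map do |child|
    next [] unless %w[class module].include?(child["kind"])

    full_name = [namespace, child["name"]].compact.join("::")
    [full_name, *constant_names(child, full_name)]
  end
end

queue = constant_names(JSON.parse(SYMBOL_TABLE))
puts queue.inspect
```

Methods are skipped here because only constants go into the queue; the listeners discover methods later, at runtime.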
00:08:11.280 Before we move on, I want to be transparent that the process I described is an oversimplification. There's actually more to this than I'm discussing right now, but since that’s not the main purpose of this talk, I’ll leave it at this level. If you ever want to discuss it further, please feel free to reach out through any of my social media links on my website.
00:08:35.880 Now that the queue is populated with constants, we can look at how those constants are processed in the pipeline and used to build the RBI file.
00:08:48.540 The pipeline is composed of a series of smaller components called listeners. Each listener has a specific responsibility when it comes to generating RBI to write to the RBI file. For example, there is one listener called the method listener. This listener will find all methods defined on the constant and then generate corresponding code to write to the RBI file, informing Sorbet about those methods.
00:09:06.899 Each listener takes a turn processing the constant. Not all constants will be relevant to all listeners, in which case nothing happens, and the constant is passed along to the next listener until it reaches the end of the list. The entire process repeats itself until the queue runs out of constants. At that point, the RBI file is considered complete and is written to disk.
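The queue-and-listeners flow can be sketched in a few lines. This is a toy model under stated assumptions: the `Pipeline` class, the `on_constant` interface, and both listener implementations are illustrative names, not Tapioca's actual classes.

```ruby
# Emits `include` lines for modules mixed into the constant.
class MixinListener
  def on_constant(constant, rbi)
    inherited = constant.is_a?(Class) ? constant.superclass.included_modules : []
    (constant.included_modules - inherited).each do |mod|
      rbi << "  include ::#{mod.name}"
    end
  end
end

# Emits empty method definitions (handles only simple named parameters).
class MethodListener
  def on_constant(constant, rbi)
    constant.instance_methods(false).sort.each do |name|
      params = constant.instance_method(name).parameters.map { |_type, param| param }
      rbi << "  def #{name}(#{params.join(', ')}); end"
    end
  end
end

class Pipeline
  def initialize(constants, listeners)
    @queue = constants.dup # processed first-in, first-out
    @listeners = listeners
    @rbi = []              # in-memory representation of the RBI file
  end

  def compile
    until @queue.empty?
      constant = @queue.shift
      keyword = constant.is_a?(Class) ? "class" : "module"
      @rbi << "#{keyword} #{constant.name}"
      # Each listener takes a turn; irrelevant constants produce no output.
      @listeners.each { |listener| listener.on_constant(constant, @rbi) }
      @rbi << "end"
    end
    @rbi.join("\n")
  end
end

module Purr; end

module CatSay
  include Purr
  def cat_say(message); end
end

rbi = Pipeline.new([CatSay, Purr], [MixinListener.new, MethodListener.new]).compile
puts rbi
```

Keeping each concern in its own listener means new kinds of RBI output can be added without touching the loop itself.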
00:09:44.880 Now, this may seem straightforward, but as you can imagine, I've abstracted away some complexity of the system. Now that we understand how Tapioca's gem RBI generation pipeline works, let’s learn about some of the complexities that can arise during this process.
00:10:03.240 Earlier, I mentioned that the queue will be populated with constants from the gem. However, I neglected to mention a really important point: not all constants used inside a gem are originally defined in that gem. In fact, most gems have dependencies in the form of other gems. Let’s return to my CatSay gem example. Let's say that while developing the gem, I discover another gem called ASCII art, whose sole purpose is to create ASCII art strings.
00:10:35.940 I require that gem inside my own gem and then use a method from a module defined in the ASCII art gem. The next time Tapioca uses Sorbet to discover constants in the gem and populate the queue, it’s going to discover the ASCII art constant and add it to the queue along with the constants defined in my gem.
00:10:56.160 If we try to process this constant through the pipeline in the same way as we process others, it could cause problems in our resulting RBI file. If we look at the RBI file for the CatSay gem, it would not only contain information about the CatSay module and the code inside it, but it would also include information about the ASCII art module. While this isn’t harmful, it’s essential to remember that Tapioca will generate a separate RBI file for the ASCII art gem.
00:11:38.880 If we open that file instead, it will also have information about the ASCII art module. Having two sources of truth for the ASCII art RBI could cause Sorbet errors and confusion for developers. To address this, Tapioca implements a check in the RBI generation pipeline.
00:12:02.400 Before any constants in the queue are processed, they are checked to see if they are defined in the gem currently being processed. Tapioca does this using the const_source_location method, calling it on the Object class with the stringified name of the constant being processed. This method returns the file name and line number where the constant was originally defined. Tapioca then checks whether that path is located within the gem that the pipeline is currently processing.
00:12:22.140 If the constant is defined in the gem being processed, Tapioca will send it through the pipeline; otherwise, the constant is discarded. To summarize what we just discussed, we took a deep dive into how Tapioca generates gem RBIs by passing the gem's constants through a pipeline so they can be processed by listeners that are responsible for incrementally generating the RBI.
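A minimal sketch of such a check, assuming a hypothetical helper named `defined_in_gem?` and a `gem_root` directory argument. Only `const_source_location` is a real Ruby method here; the rest is illustrative.

```ruby
module CatSay; end

# Returns true when the constant was first defined in a file under gem_root.
def defined_in_gem?(constant_name, gem_root)
  # Returns [file, line]; it is empty for constants defined in C (e.g. String)
  # and nil for unknown constants, so `file` ends up nil in both cases.
  file, _line = Object.const_source_location(constant_name)
  return false if file.nil?

  # File.join(dir, "") appends a trailing separator so that
  # "/gems/cat_say" does not match "/gems/cat_say_extras".
  File.expand_path(file).start_with?(File.join(File.expand_path(gem_root), ""))
end

puts defined_in_gem?("CatSay", "/some/other/gem")
puts defined_in_gem?("String", Dir.pwd)
```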
00:12:46.800 We also covered how the pipeline discards constants that were not originally defined in the gem being processed. Now we're finally set up to explore the main topic of this talk, which is generating RBIs for dynamic mix-ins.
00:13:17.520 When I joined Shopify in the spring of 2022, this was one of the few cases that Tapioca couldn't quite handle yet. Let’s revisit my CatSay gem once again. I've decided that it isn't enough for my cat_say method to be encapsulated inside the CatSay module; I want my method to be exposed at the Ruby root level. To accomplish this, I include my CatSay module in Ruby's Object class. This is an example of what I'm calling a dynamic mix-in.
00:13:46.500 For the remainder of this talk, I'll use the phrase 'dynamic mix-in' to refer to any calls to the prepend, include, or extend methods that occur outside of a class or module definition, as I just showed in the CatSay gem. Because I included the CatSay module in Ruby's Object class, my gem can now call the cat_say method from the root level, allowing users of the gem to print adorable cats wherever they wish.
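In code, the dynamic mix-in described here is a single line outside any class or module body. The method body below is a simplified stand-in for the gem's real implementation.

```ruby
module CatSay
  def cat_say(message)
    puts "< #{message} >  =^.^="
  end
end

# The dynamic mix-in: include is called outside a class or module definition.
Object.include(CatSay)

# Every object, including the top-level `main` object, now responds to cat_say.
cat_say("Hello, world")
```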
00:14:12.300 However, using the dynamic mix-in will present a problem in any project that uses Sorbet. If we go back to the RBI file for the CatSay gem, we'll see that it won't contain any information about the Object class; the Object class was not defined within the CatSay gem, which means that the Tapioca pipeline will discard the Object constant before any RBI can be generated for it.
00:14:43.500 This becomes problematic if we try to use the CatSay gem in our application. Let's say we want to write a method in our application called greet_as_cat that prints a cat saying 'Hello, world.' If we try to call the cat_say method on the root scope, Sorbet will complain, stating 'method cat_say does not exist on my class.' This is a case where Tapioca could cause Sorbet to create a type checking error on a valid piece of code, and we needed to devise a solution.
00:15:04.500 Previously, we talked about the Tapioca gem and its RBI generation pipeline, mentioning a step that checks if each constant was originally defined in the gem. My team and I pondered whether we could perform a similar check for mix-ins. What if we could determine whether a dynamic mix-in occurred in the gem that is currently being processed?
00:15:44.100 If so, we can generate RBI for it even if it involves a constant that was not originally defined in the gem itself. Of course, Ruby doesn’t have a convenient method for accessing this information, so to implement a check like this, we need to track some information about dynamic mix-ins in our gem.
00:16:14.400 For every mix-in, we need to know what constant is receiving the mix-in, which constant is being mixed in, what type of mix-in is being performed (whether it's prepend, include, or extend), and the location of the mix-in. It's important to remember that because mix-ins are dynamic, statically analyzing the codebase will not suffice; we need to load the gem to collect this information at runtime.
00:16:43.320 To achieve this, we introduced a new singleton object in Tapioca called the mix-in tracker. The mix-in tracker keeps a reference to a hash called mixins_to_constants. This hash will store all the information we previously discussed. The hash is populated by a method called register, which takes three arguments: the constant receiving the mix-in, the constant being mixed in, and the type of mix-in.
00:17:09.960 In the method body, we find the mix-in location, and we will return to how we do that shortly, and then we use this information to populate the mixins_to_constants hash. Notably, to populate the hash with information, Tapioca actually employs a dynamic mix-in of its own.
00:17:30.960 Further down the file, in the mix-in tracker, Tapioca reopens the Module class and prepends an anonymous module to it. Inside this anonymous module, it implements an override for several methods, including a method called append_features. Within this method override, Tapioca calls the register method from the mix-in tracker, which populates the mix-in tracker with information.
00:17:50.760 To this method, it passes the constant receiving the mix-in, self (the module being mixed in), and the mix-in type, which, in this case, is include. You may wonder why we didn't override the include, prepend, and extend methods directly. The answer lies in the fact that overriding these methods is a common pattern, which can pose issues for Tapioca.
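The tracker and its hooks can be sketched as below. This is a simplified illustration, not Tapioca's source: the `MixinTracker` module name, the `Registration` struct, and the `constants_with_mixin` lookup are assumptions, while `register`, `mixins_to_constants`, and the overridden hook methods follow the description in the talk.

```ruby
module MixinTracker
  Registration = Struct.new(:constant, :type, :locations)

  # Maps each mixed-in module to the constants that received it.
  @mixins_to_constants = Hash.new { |hash, mixin| hash[mixin] = [] }.compare_by_identity

  class << self
    def register(constant, mixin, type)
      # Backtrace starts at the hook override that called register.
      locations = caller_locations || []
      @mixins_to_constants[mixin] << Registration.new(constant, type, locations)
    end

    def constants_with_mixin(mixin)
      @mixins_to_constants.fetch(mixin, [])
    end
  end
end

class Module
  # Prepending an anonymous module lets the overrides call super
  # to reach Ruby's original implementations.
  prepend(Module.new do
    def append_features(constant)
      MixinTracker.register(constant, self, :include)
      super
    end

    def prepend_features(constant)
      MixinTracker.register(constant, self, :prepend)
      super
    end

    def extend_object(object)
      MixinTracker.register(object, self, :extend)
      super
    end
  end)
end

module Greeter; end
class Person; end

Person.include(Greeter)
puts MixinTracker.constants_with_mixin(Greeter).map { |r| [r.constant, r.type] }.inspect
```

Inside each hook, `self` is the module being mixed in and the argument is the receiver, which is why the arguments to `register` appear in that order.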
00:18:15.420 Imagine if we decided to override the include, prepend, and extend methods in Tapioca, and a developer using Tapioca also employs another gem that performs the same overrides. Depending on the load order of the gems, it may happen that the other gem gets placed earlier in the ancestor chain, which leads to its methods being called first. If the other gem's developers forget to call super in their override implementation, this means that Tapioca's override will never be called, and information about dynamic mix-ins will never be added to the mix-in tracker.
00:18:55.380 Tapioca's method override would be skipped entirely, resulting in incorrect RBIs being generated, which could cause hard-to-debug errors in Sorbet. By overriding more internal methods, we cannot guarantee that this will never happen, but we lessen our chances in multiple ways. First, it’s less likely that other gems override these methods since they're not part of Ruby's public API. Second, even if other gems override include, prepend, and extend, as long as their implementations ultimately call the internal methods, Tapioca's mix-in tracker will still be populated, and Tapioca will continue to function.
00:19:20.520 To make this more concrete, let's revisit our example from the CatSay gem. In order to generate the RBI, Tapioca would load the CatSay gem, which triggers the line where we call Object.include and pass the CatSay module. This call to include will invoke the append_features method, which we have overridden in Tapioca. Within that method, we will call Tapioca's register method and pass in the following arguments: we pass in the Object class as the constant receiving the mix-in, the CatSay module as the constant being mixed in, and finally, we specify the mix-in type as an include.
00:19:54.480 Once these values are sent to the register method, we need to find the location of the mix-in. At a high level, we can do this using the caller_locations method defined on Ruby's Kernel module.
00:20:21.120 caller_locations is a method that generates a backtrace as an array of Thread::Backtrace::Location objects. This array will display the path for every piece of code that was executed to reach the method that performs the dynamic mix-in.
00:20:38.460 You can see here that our backtrace starts with the most recent file run, which is Tapioca's mix-in tracker, where we've implemented the override for the append_features method. From there, it tracks back to the previous call, which occurred in the CatSay gem where we called include to mix in the CatSay module. This list continues down the stack trace.
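To see what caller_locations returns, here is a small standalone sketch. The method names `innermost` and `outer` are arbitrary; the printed paths and line numbers depend on where the code is run.

```ruby
def innermost
  # Frames that led to this call, most recent caller first.
  caller_locations
end

def outer
  innermost
end

locations = outer
locations.each do |location|
  puts "#{location.path}:#{location.lineno} in #{location.label}"
end
```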
00:20:56.700 Now, after we have this array of locations, we need to ascertain which of these represents the true location of the dynamic mix-in. One potential approach would be to check the location of the mix-in method call; in this case, the include method is being run from the CatSay gem, which means we could attribute the dynamic mix-in to the CatSay gem.
00:21:35.760 Unfortunately, this is not always that straightforward. Let’s examine a more complicated example: the dynamic mix-in within Action Controller’s helper method pattern. For instance, within a Rails app, we could create a controller and write a useful method there. The class ActionController::Base includes a method called helper_method.
00:22:02.040 Using this method, we can mark methods in our Rails controllers as helpers, meaning they will be available in both the view and the controller. Behind the scenes, helper_method dynamically creates a new module called HelperMethods under the namespace of the controller in our Rails application, which dynamically includes ActionController::Base's helper methods into this new module.
00:22:27.840 This dynamic include occurs within the Rails codebase, specifically within Action Pack, but it modifies code that exists within our application namespace, not within the Rails namespace. Therefore, applying the approach we discussed earlier to find the mix-in location will indicate that the mix-in occurred inside the Action Pack gem.
00:23:01.320 In this case, any RBI for the mix-in would be associated with Action Pack, but this doesn’t logically make sense because the actual functionality changes occur in our Rails application.
00:23:27.540 To work around this, we needed to devise a new approach. My colleague, Ufuk Kayserilioglu, came up with a clever solution. If we carefully examine the array of locations generated by the caller_locations method, we will see that one of the locations is labeled '<top (required)>'. This label marks the location where other code, such as another gem, is required into the gem currently being processed by Tapioca.
00:23:53.640 By using this location, we can determine which gem required the other one and therefore identify which gem is the one actually responsible for the dynamic mix-in. This method even works for Action Controller's helper method.
00:24:23.640 In Tapioca, this behavior is encapsulated in a method called resolve_loc, which takes the result of caller_locations and finds which of them has the '<top (required)>' label, returning that as the specific location of the mix-in.
00:25:04.800 With the issue of determining the mix-in location resolved, our mix-in tracker is now complete. The mix-in tracker can now take a constant, search the mixins_to_constants table, and return any other constants that use that first constant as a mix-in. Additionally, the mix-in tracker knows what kind of mix-in took place, alongside the location of the mix-in.
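The lookup can be sketched as follows. This is an approximation of the idea rather than Tapioca's implementation: `resolve_mixin_location` is a made-up name, and the fake backtrace with its paths exists only so the example runs on its own. The function also accepts real Thread::Backtrace::Location objects, which respond to the same two methods.

```ruby
TOP_REQUIRED_LABEL = "<top (required)>"

# Returns the path of the first frame that was loaded via require,
# or an empty string if there is none.
def resolve_mixin_location(locations)
  return "" if locations.nil?

  required = locations.find { |location| location.label == TOP_REQUIRED_LABEL }
  required&.absolute_path || ""
end

# Hypothetical backtrace for the helper_method example, most recent frame first.
FakeLocation = Struct.new(:label, :absolute_path)
backtrace = [
  FakeLocation.new("append_features", "/gems/tapioca/mixin_tracker.rb"),
  FakeLocation.new("helper_method", "/gems/actionpack/helpers.rb"),
  FakeLocation.new("<class:PostsController>", "/app/controllers/posts_controller.rb"),
  FakeLocation.new("<top (required)>", "/app/controllers/posts_controller.rb"),
]

puts resolve_mixin_location(backtrace)
```

Even though the include call itself sits in the Action Pack frame, the resolved location points at the application file, which matches where the behavior change belongs.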
00:25:28.620 So, let's see how this is utilized in the Tapioca gem's RBI generation pipeline.
00:25:46.080 The first step towards completing our implementation was to create a new listener called the foreign constants listener. This listener will use the mix-in tracker to determine whether we should process a foreign constant, otherwise known as a constant that wasn't originally defined in the gem.
00:26:10.560 The job of the listener is to take a constant from the pipeline and then consult the mix-in tracker to assess whether this constant is involved in any dynamic mix-ins. The mix-in tracker will find all dynamic mix-ins associated with the constant and return them to the listener before allowing the constant to continue down the pipeline.
00:26:38.760 From there, the listener will check if the mix-in occurred within the gem. If so, Tapioca will allow the constant found by the mix-in tracker to bypass the checks, and it will generate RBI for it, even though the constant wasn’t originally defined within the gem.
00:26:56.460 This process will also cause most listeners to skip over the constant entirely, except for one listener called the mix-in listener, which will generate RBI for the dynamic mix-in that occurs with that constant.
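The decision the foreign constants listener makes can be sketched like this. The `TrackedMixin` struct, the `mixins_from_gem` helper, and the paths are hypothetical; mixed-in modules are represented as symbols to keep the example self-contained.

```ruby
TrackedMixin = Struct.new(:constant, :mixin, :type, :location)

# Returns the tracked mix-ins on `constant` that happened inside gem_root.
def mixins_from_gem(constant, tracked_mixins, gem_root)
  prefix = File.join(gem_root, "") # trailing separator avoids partial-name matches
  tracked_mixins.select do |tracked|
    tracked.constant == constant && tracked.location.start_with?(prefix)
  end
end

tracked = [
  TrackedMixin.new(Object, :CatSay, :include, "/gems/cat_say/lib/cat_say.rb"),
  TrackedMixin.new(Object, :OtherHelpers, :include, "/gems/other_gem/lib/other.rb"),
]

# While compiling cat_say, only the CatSay include on Object is relevant.
relevant = mixins_from_gem(Object, tracked, "/gems/cat_say")
puts relevant.map(&:mixin).inspect
```

If the result is empty, the foreign constant is discarded as before; otherwise it continues down the pipeline so the mix-in listener can emit the include.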
00:27:18.000 Once this process is complete, the CatSay RBI will have information about the dynamic mix-in that occurs on the Object class. This will allow us to call the cat_say method from the root scope anywhere in our application without Sorbet throwing type checking errors. Thus, we've successfully used Tapioca to generate gem RBIs for dynamic mix-ins.
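The resulting RBI for the CatSay gem would then look roughly like the sketch below. The exact output Tapioca produces may differ in detail (for example, in the other mix-ins it lists for Object); the important part is that Object now appears with the include.

```ruby
# typed: true

module CatSay
  def cat_say(message); end
end

# Object is a foreign constant, but the include happened inside the gem,
# so the mix-in is recorded here.
class Object < ::BasicObject
  include ::CatSay
end
```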
00:27:39.840 That was a lot of information! So, let’s review what we just discussed. Tapioca generates RBIs for dynamic mix-ins by keeping track of where mix-ins happen in a singleton object called the mix-in tracker. The mix-in tracker is populated by overriding append_features, prepend_features, and extend_object to register their information. Once the mix-in tracker has information about every mix-in, Tapioca uses that information in the pipeline to generate RBI properly for the mix-ins that occur within the gem.
00:28:17.180 We're almost done with this talk, but before I leave you, I wanted to step back and explain why you should consider this work important, even if you personally don’t use Sorbet or Tapioca.
00:28:40.680 Ultimately, Tapioca is built on Ruby APIs that help us introspect and understand other Ruby code. None of this work would be possible without Ruby's powerful ability to understand and modify other Ruby code at runtime, and that's really exciting! Because of this, the work we do on Tapioca helps us discover the limits of using Ruby for these purposes.
00:29:05.040 A fantastic illustration of this is finding the location of a dynamic mix-in. Currently, Tapioca parses the results of caller_locations to find the '<top (required)>' label within the backtrace. What if there were a method on the Thread::Backtrace::Location object that indicated whether a certain location was required? This would eliminate the need for brittle string parsing.
00:29:20.220 In turn, this could simplify understanding Ruby code through the call stack. That is ultimately why I find working on Tapioca so rewarding—it aids the entire community in discovering ways to make Ruby a more powerful language.
00:29:41.760 With this knowledge, we can develop better tools that help us understand our code and streamline our workflows with tools like Sorbet and Tapioca.
00:30:06.000 Before concluding this talk, I want to acknowledge a few things. First, I would like to thank all my exceptional teammates on the Ruby developer experience team at Shopify for helping me deliver my best work every day and for providing feedback on this presentation.
00:30:26.880 I would especially like to thank Alexandre Terrasa and Ufuk Kayserilioglu for their work and mentorship throughout this project. Several of my teammates are giving talks during this conference, and I highly encourage you to check them out if you see them on the schedule.
00:30:44.760 Finally, I want to mention something that is very close to my heart, which is wnb.rb. I co-founded wnb.rb in 2021, and it is now the largest community of women and non-binary Ruby developers in the world, with almost a thousand members globally.
00:31:06.120 If you identify as a woman or a non-binary individual, I encourage you to join us. You can find more information on our website at wnb-rb.dev. Thank you so much for attending my talk today at RubyKaigi. I'll leave you with a summary of what we discussed.
00:31:30.480 Enjoy the rest of the conference, and I hope to have an opportunity to meet you in person in the future. Bye-bye!