00:00:06.879
Good morning, Blue Ridge Ruby! How did everyone enjoy the first day of the conference?
00:00:15.359
Yesterday, I also had a great time, and I'm so excited to be here kicking off day two with you! This morning, I'm going to talk to you about something called type checking.
00:00:23.400
Now, if you've never heard of type checking, that's okay. We're going to start from the beginning and make sure everyone's on the same page going into this talk.
00:00:29.080
If you have heard of type checking, though, you may be wondering how this talk could possibly be relevant to you. Not a lot of Rubyists use type checking in their day-to-day work. Maybe you're someone who thinks they will never use type checking in Ruby. Maybe you're secretly DHH wearing sunglasses and a fake mustache, sneaking into regional Ruby conferences. Look, I don't know; that's your business. I don't care. But I am here to tell you that this talk is for you, no matter who you are.
00:01:01.800
And to do that, I'm going to let you in on a little secret. First, get out of here, David! The secret is that this talk is not actually about type checking. This is actually a talk about something called reflection. Reflection is just a fancy word for using Ruby to understand other Ruby code.
00:01:15.880
Using reflection, we can build really cool tools that help developers save time, improve their workflows, and find more joy in coding.
00:01:36.680
My name is Emily Samp. I am a software developer at Shopify, where I work with tools like Sorbet and Tapioca to improve the Ruby developer experience.
00:01:42.600
You can find all my internet links on my website, emilys.dev, including all of the slides for this talk that I'll upload later today. There's also a dog in the back, and so if I get distracted, that's why.
00:02:01.479
My biggest hope for you today is that, yes, you do learn some things about type checking. But more importantly, I hope you leave this talk feeling like you too could use Ruby reflection to build something really, really cool. Does that sound good? Okay, let's get started!
00:02:20.000
To talk about how I use reflection in my day-to-day work, I first have to explain something called Sorbet. Sorbet is a gradual type checker for Ruby, and it was developed at Stripe.
00:02:31.560
We've been using it at Shopify since 2019. When I say Sorbet is a type checker, what I mean is that it can take all of our code, and instead of having to run it to tell us when we've made a mistake, it will analyze it statically before it ever gets run and tell us if we are using Ruby in a valid way.
00:02:42.360
This can really improve our workflows, and I'll show you what I mean.
00:02:52.879
So let's say we make a new Ruby project, and in that project we create a service called GreetService. GreetService is just going to be a module that implements various types of greetings. The first greeting we're going to put in this service is called basic_greet.
00:03:04.519
It takes a name and prints out hello with that name and an exclamation point. One way we can use Sorbet to help type check this code is by adding a method signature to the method. We add a sig block that comes from Sorbet, and we tell it there's going to be one parameter in the method. It's called name, and it's a String.
00:03:17.799
The method then has a void return type, which means we don't really care what the return type is; we're just printing out a message. Now, because we wrote this method signature, Sorbet can help us avoid doing silly things with this method.
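A minimal sketch of what the method and its signature might look like, reconstructed from the description above. The greeting text and the choice of a singleton method are assumptions, and the `rescue LoadError` stand-in is not part of Sorbet; it only lets the sketch run when the `sorbet-runtime` gem is not installed.

```ruby
# typed: true

begin
  require "sorbet-runtime" # provides T::Sig when the gem is installed
rescue LoadError
  # Stand-in so the sketch still runs without the gem: `sig` becomes a no-op.
  module T
    module Sig
      def sig(*); end
    end
  end
end

module GreetService
  extend T::Sig

  # One parameter, `name`, which must be a String; the return value is unused.
  sig { params(name: String).void }
  def self.basic_greet(name)
    puts "Hello, #{name}!"
  end
end

GreetService.basic_greet("Blue Ridge Ruby") # prints "Hello, Blue Ridge Ruby!"

# A static check would be expected to reject the following call before the
# program ever runs, because nil is a NilClass rather than a String:
# GreetService.basic_greet(nil)
```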
00:03:35.840
For example, let's imagine in the future we try to call this method and instead of a string, we pass in nil. Sorbet is going to raise a type checking error, and it's going to say that it expected a String but we're giving it NilClass, meaning we told it the argument was going to be a string and then we gave it something else instead.
00:03:54.439
So rather than having to run the tests for our code or having to push to CI, Sorbet can tell us before we do any of that that we've made a mistake.
00:04:12.280
It can even do this in more subtle ways. Let's say further down the line we want to refactor this method. We want to add a check to see if the name argument is going to be nil. In Ruby, we do this all the time; we're always worried that things are going to be nil.
00:04:23.479
But Sorbet is going to stop us from doing this too. It's going to say that our code is unreachable. Why? Because if the argument is a string—which Sorbet will enforce—then it can never be nil. Therefore, what we've written is code that will never run, and Sorbet stops us from cluttering up our applications and ultimately confusing ourselves and making our code harder to reason about down the line.
00:04:40.000
So that's really useful. But if you've ever worked on a Ruby application, you'll know that it's usually not this simple. One way that applications are a bit more complicated is that we usually have dependencies in the form of gems. So let's see how Sorbet can handle an application with a gem in it.
00:05:02.479
For my Greet Service, I'm going to create a new gem called CatSay. CatSay is similar to CowSay, where it will print out an ASCII art picture of a cat saying whatever message you pass into it. As we can all agree, I think this is a very important development in the Ruby language.
00:05:21.319
The code for the CatSay gem might look something like this: we have a module called CatSay, and inside we define a method called say that takes a message and prints out that message along with an ASCII art picture of a cat.
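As a rough sketch of the gem described here: the module and method names come from the talk, while the cat drawing, the output layout, and the `Greeter` class used to try it out are made up for illustration.

```ruby
module CatSay
  # Illustrative ASCII art; the real slide's cat is not in the transcript.
  CAT = <<~'ART'
     /\_/\
    ( o.o )
     > ^ <
  ART

  # Prints the message followed by the cat.
  def say(message)
    puts message
    puts CAT
  end
end

# Hypothetical caller: mix the module in to get access to `say`.
class Greeter
  include CatSay
end

Greeter.new.say("Hello!")
```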
00:05:40.800
If we go to use our CatSay gem in the GreetService, we require the gem at the top of the file, then we define a new greeting called cat_greet. In that method, we call the say method from our gem.
00:05:57.639
However, we're going to encounter a problem if we're also using Sorbet. Sorbet is going to raise an error here, saying that it can't find the module CatSay. But why? This is totally valid Ruby code. I even checked that I required the gem at the top of the file, and we all know that we have forgotten to do that many times.
00:06:16.000
So the reason that Sorbet doesn't know about it is that Sorbet can only analyze the code that it finds within the bounds of our application. Because the gem is defined outside of our application, there's no way for Sorbet to know about it right out of the box.
00:06:34.599
So we developers have to tell Sorbet what code is available in our gem, and we do this using something called an RBI or Ruby Interface file.
00:06:54.080
What we do is create a set of nested folders, sorbet/rbi/gems, and inside, we create a catsay.rbi file. Inside this file is nothing fancy; it's just Ruby code, and we redefine the module CatSay along with the method defined in that module.
00:07:12.240
We don't have to put in any of the method body; we're just telling Sorbet, "Hey, in this gem, we have a module and a method available." Sorbet knows to check all of the files in this folder path, and so it can check what code is available in all of the gems that we're using in the project.
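A hand-written RBI file along the lines described above could be as small as this sketch. The file path follows the talk; the exact contents of the slide are an assumption.

```ruby
# sorbet/rbi/gems/catsay.rbi
# Declares what the gem makes available. Method bodies are left empty
# because only the names and parameters matter to the type checker.
module CatSay
  def say(message); end
end
```

Because an RBI file is ordinary Ruby syntax, it can be loaded and inspected like any other Ruby file.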
00:07:23.039
Problem solved! Except not really, because if you've ever written a Ruby project, you'll know that you don't just use one gem; you use many gems—sometimes many, many gems—and those gems also use gems.
00:07:34.479
If you want to use Sorbet, that means you have to write an RBI file for every single one of those gems, and I don't know about you, but I do not have time for that!
00:07:53.759
So this is where reflection comes in, and in this case, the reflection is going to be packaged up in a gem called Tapioca. Tapioca is a gem that generates RBI files for Sorbet. It was developed by my team at Shopify, and last year it became the official companion gem to Sorbet, making it easier to adopt and maintain Sorbet in Ruby projects.
00:08:06.039
Tapioca does a lot of cool stuff and has a lot of awesome features, but I don't have time to talk about all of that today, even if I would like to. Instead, we're going to focus on just one thing: we're going to talk about the fact that Tapioca uses reflection to dynamically generate RBI files for gems.
00:08:25.800
Let me show you what I mean. To generate an RBI file for our CatSay gem, we could go to the command line and run bin/tapioca gem catsay. There's going to be a bunch of output, but at the end, Tapioca is going to tell us it compiled the RBI for the CatSay gem. That just means it generated the RBI file on its own.
00:08:37.719
If we go back to that same file path we talked about earlier, you'll see that there's the RBI file that we wrote, but this time we didn't have to write it. Tapioca wrote it for us! It saved us time.
00:08:56.480
I'm going to spend the rest of the talk discussing how Tapioca achieves this and how it uses Ruby reflection along the way. So we're going to cover three things. First, we'll talk about the architecture of Tapioca. We'll talk about the fundamentals of Ruby reflection that Tapioca uses to generate RBI files, and we're also going to talk about how Tapioca combines some of those fundamentals to create new reflection techniques of its own.
00:09:14.760
So let's start with the architecture of Tapioca. Tapioca generates gem RBIs in a process that we call the pipeline. The pipeline is made up of a few parts. First up is the queue. The queue holds parts of the gem's code that need to be turned into RBI, and I'll explain more about what that means in a second.
00:09:36.000
Then we have the pipeline. The pipeline is going to take items from the queue and actually create the RBI out of them, and Tapioca also keeps a version of the RBI file in memory that it adds to incrementally until the file is ready and it's written to disk.
00:09:52.800
So we're going to examine these components one by one to understand how they work, starting with the queue. As I said, items in the queue are going to represent parts of the gem's code. To be specific about it, they're going to represent constants, meaning Ruby classes and modules. In the case of the CatSay gem, we only have one constant so far—the CatSay module—but there will be more.
00:10:16.360
Tapioca populates these constants into the queue using a couple of steps. First, what it does is load all of the files in the gem. For example, in the CatSay gem, it's going to load this file that we saw earlier. When it loads the file, it runs the top-level Ruby code and populates Ruby's memory space with information about the code that it finds.
00:10:36.720
When it loads this line, the CatSay module, Ruby's memory is going to be populated with an object representing the module CatSay. This object is going to contain information about the module, such as what methods it defines—like the CatSay method—and we're going to come back to that in a second.
00:11:05.679
The second step is that it's going to find all of the constants in the gem in order to populate them on the queue. So to do this, Tapioca is going to use our friend Sorbet. Remember, we said Sorbet is capable of statically analyzing our code so it doesn't have to run our code to know things about it.
00:11:26.760
We're going to use this technique in order to find all of the constants in the gem really quickly. We can run Sorbet using the option --print=symbol-table-json; that's going to give us a JSON object with a list of all of the constants defined in the gem. Then, Tapioca can parse this object, find the constants, and put them on the queue.
00:11:51.120
In the process, it's going to do something called constantizing. Constantizing just means that it takes a symbol or a string representing the constant and finds the corresponding Ruby object in memory.
00:12:10.440
So remember, we populated Ruby memory in the first step; we found the strings in the second step; we can combine them using constantizing. In Tapioca, the process of constantizing looks something like this: it uses Ruby's const_get method. This is a method that takes a string that's the name of a constant and finds the corresponding Ruby object in memory.
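To make constantizing concrete, here is a small sketch built on `Object.const_get`. The helper names `constantize` and `safe_constantize` are chosen for illustration and are not claimed to be Tapioca's own.

```ruby
# Stands in for the gem code that was loaded in the first step.
module CatSay
  def say(message); end
end

# Turn the name of a constant into the object Ruby holds in memory.
# Raises NameError when no such constant has been loaded.
def constantize(name)
  Object.const_get(name)
end

# Variant that returns nil instead of raising for unknown names.
def safe_constantize(name)
  constantize(name)
rescue NameError
  nil
end

constantize("CatSay")      # => CatSay
safe_constantize("DogSay") # => nil
```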
00:12:30.320
If you're a Rails developer, you'll probably be very familiar with this, because Rails implements a very similar method in pretty much the exact same way. This is also our first example of Ruby reflection.
00:12:55.440
In the next section of this talk, we are going to discover some tools that Tapioca uses in order to generate the gem's RBI, using Ruby reflection. This is not meant to be exhaustive in any way because Ruby has a lot of awesome reflection techniques that we would spend all day talking about; it's just meant to give you a taste of what is possible if you want to write code that examines other Ruby code.
00:13:10.880
So back to our queue. It's populated with constants, and those constants are going to be passed to the pipeline. Now, the pipeline is actually composed of a series of components that are called listeners. Each listener has a specific responsibility.
00:13:32.720
The constants are going to be passed to every single one of the listeners, and the listeners are going to take turns generating snippets of the RBI file that are written to the larger file and kind of pulled together to create the entire file that we use to type check our code.
00:13:53.160
The constant is going to be passed to every listener until the pipeline is complete. Now this is a super vague description, and so let's dig into what these listeners do by examining one of them in depth.
00:14:11.320
This is going to be the methods listener. The methods listener takes a constant from the pipeline and then generates a part of the RBI file that corresponds to the methods defined on that constant. So how does the listener know what methods are defined on the constant? Well, it uses reflection!
00:14:25.920
I'm sure that's very shocking to you. If we look at this snippet of code, we first have to get the name of the method from the constant, and the way Tapioca does this is by calling the Ruby method public_instance_methods. I'm going to say method a bunch on this slide, so sorry if that word loses meaning for you after a couple of minutes.
00:14:40.240
The public_instance_methods method returns a list of the names of all of the methods defined on the constant. Once we have the names, we can then use them to get the arguments for those methods. We start out by calling the instance_method method and then passing in the name of the method that we discovered in the previous step.
00:15:05.440
This gives us an object that represents the method from our Ruby code. It's literally a Ruby object that represents Ruby code—that is so cool! On that object, you can call the parameters method, which will give you a list of the names of all the parameters.
00:15:19.199
Then, using all of this, Tapioca can create this line of code that it then writes to the RBI file. We've also just discovered three more fundamental reflection techniques!
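The three calls just described can be combined in a few lines. This is a sketch rather than Tapioca's listener: `method_stubs` is a hypothetical helper, and passing `false` to `public_instance_methods` (to skip inherited methods) is an assumption. Note that `parameters` returns pairs of kind and name, so the sketch keeps only the names.

```ruby
module CatSay
  def say(message)
    puts message
  end
end

# Builds one RBI-style stub line per public method defined on the constant.
def method_stubs(constant)
  constant.public_instance_methods(false).sort.map do |method_name|
    method = constant.instance_method(method_name)                   # an UnboundMethod
    parameter_names = method.parameters.map { |_kind, name| name }   # [[:req, :message]] -> [:message]
    "def #{method_name}(#{parameter_names.join(', ')}); end"
  end
end

method_stubs(CatSay) # => ["def say(message); end"]
```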
00:15:36.040
Now, these are not the only ones Tapioca uses; there are actually 14 total listeners in the Tapioca gem's RBI generation pipeline. Each of these listeners uses different reflection techniques to generate different parts of the RBI code.
00:15:58.680
I think it's so cool that Ruby is so powerful and gives us so many tools that we are able to build something like Tapioca, but I'll stop fangirling. Let's go back to the pipeline.
00:16:18.120
This whole process is going to repeat itself for every single constant in the pipeline. They're going to take turns being passed to every single one of the listeners. The RBI file is going to be generated incrementally until the queue is empty, and then the RBI file can be written to disk.
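The queue, the listeners, and the in-memory RBI can be pictured with a toy pipeline like the one below. Everything here is illustrative: the `Meow` module, the two lambda listeners, and `generate_rbi` are invented names, and the sketch only handles modules, not classes.

```ruby
module Meow
  def purr(times); end
end

module CatSay
  include Meow

  def say(message); end
end

# Listener 1: one `include` line per module mixed into the constant.
MIXINS_LISTENER = lambda do |constant|
  constant.included_modules.map { |mixin| "  include #{mixin.name}" }
end

# Listener 2: one stub per public method defined directly on the constant.
METHODS_LISTENER = lambda do |constant|
  constant.public_instance_methods(false).sort.map do |method_name|
    names = constant.instance_method(method_name).parameters.map { |_kind, name| name }
    "  def #{method_name}(#{names.join(', ')}); end"
  end
end

# Takes constants off the queue one at a time, hands each to every listener,
# and grows the RBI text incrementally until the queue is empty.
def generate_rbi(constants, listeners)
  queue = constants.dup
  rbi_lines = []
  until queue.empty?
    constant = queue.shift
    rbi_lines << "module #{constant.name}"
    listeners.each { |listener| rbi_lines.concat(listener.call(constant)) }
    rbi_lines << "end"
  end
  rbi_lines.join("\n") + "\n"
end

puts generate_rbi([Meow, CatSay], [MIXINS_LISTENER, METHODS_LISTENER])
```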
00:16:37.320
I made this slide really long on purpose so I can take a drink of water.
00:16:44.360
Cool! So another thing to keep in mind here is that this isn't just happening once; Tapioca is going to generate an RBI file for every gem in a project with Sorbet.
00:17:01.480
You can think of it as Tapioca running multiple pipelines in parallel in order to generate multiple RBI files as quickly as it possibly can.
00:17:17.200
This is going to be important in a second as we dive into some of the things that make Tapioca gem RBI generation more complex.
00:17:32.880
If we go back to the queue, earlier I said that the queue is populated with constants from the gem, but we agreed at the beginning of the talk that most Ruby projects use dependencies in the form of gems, and that gems use other gems as dependencies as well.
00:17:52.720
A lot of the constants that you find in a gem are not going to have been originally from that gem. So let's look at an example of this. If we go back to CatSay, we can imagine that I found a different gem called the ASCIIArt gem, and I want to use it in my CatSay gem.
00:18:09.600
So in the gem, I call ASCIIArt.cat. When Tapioca goes to process my gem again, it's going to discover the existence of this ASCIIArt constant.
00:18:24.960
It's going to take this constant and it's going to try to add it to the queue along with the CatSay constant. If we try to process this constant through the pipeline, it's going to create a problem.
00:18:37.880
The problem is that in the resulting RBI file, we're not only going to have information about the CatSay gem or the CatSay constant, but we're also going to have information about ASCIIArt on its own. This is fine, but remember, Tapioca is not just generating RBI for CatSay; it's generating RBI for ASCIIArt as well.
00:18:56.559
In that file, it's also going to have information about the ASCIIArt module. Now we have two sources of truth for the ASCIIArt RBI. That means if we go and update one but not the other, it can create very hard-to-debug Sorbet errors, not to mention confusing all of the other developers on our team, which is not a great experience.
00:19:11.799
So, Tapioca actually has to handle this case. Before any of the constants are processed in the pipeline, there's actually a check to see whether the constant is originally defined in the gem.
00:19:27.520
Tapioca implements this check using a method called const_source_location. This method basically returns the file name and line number where a constant was originally defined.
00:19:45.880
Tapioca can then check that file name to see if it belongs to the gem that's currently being processed. If the constant was originally defined in the gem, then that's great; it can be processed in the pipeline. But if not, then Tapioca will discard it from the pipeline entirely.
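A sketch of such a check, assuming Ruby 2.7 or later (where `const_source_location` is available). The helper `defined_in_gem?` and the way the gem's root directory is chosen here are illustrative assumptions, not Tapioca's implementation.

```ruby
module CatSay
  def say(message); end
end

# True when the constant's defining file lives under gem_root.
def defined_in_gem?(constant_name, gem_root)
  file, _line = Object.const_source_location(constant_name)
  return false if file.nil? # unknown constant, or a constant defined in C code

  File.expand_path(file).start_with?(File.join(gem_root, ""))
end

file, line = Object.const_source_location("CatSay")
gem_root = File.dirname(File.expand_path(file)) # pretend this directory is the gem

defined_in_gem?("CatSay", gem_root) # => true
defined_in_gem?("String", gem_root) # => false, String comes from Ruby itself
```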
00:19:58.080
This rounds out our last set of Ruby reflection techniques. As we talked about, Tapioca only generates RBI that is relevant to a particular gem. Once again, this isn't an exhaustive list; this is just a few of the ways that Tapioca uses reflection.
00:20:14.240
However, sometimes Ruby's available reflection techniques are not enough, and we have run into a few of these instances in Tapioca.
00:20:33.920
The rest of this talk is going to be about how my teammates and I had to start combining some of these techniques in order to implement crucial features in the Tapioca gem.
00:20:52.000
I'd like to call this section extreme Ruby reflection.
00:21:09.200
To understand my story, we're going to go back to the CatSay gem one more time.
00:21:18.640
Let's say that I want to change the functionality of the gem. Rather than having to include the CatSay module, I want to be able to call the catsay method from anywhere in my Ruby code.
00:21:37.440
In Ruby, this is actually pretty easy to do. We can just add a line to our gem that says Object.include CatSay. What we're doing here is taking all of the methods from the CatSay module and making them available on Ruby's Object class.
00:21:53.440
Almost every class in Ruby is a descendant of Object, and so that means we can call the methods from CatSay on any object, which is basically anywhere in Ruby code.
00:22:04.040
If you've seen an include, historically, it's probably looked more like this: usually in Ruby when we call include or prepend or extend, we do it inside of a class or module definition.
00:22:19.400
But because this is Ruby, we don't have to do that! We can do includes and prepends and extends anywhere we want, and I'd like to call that a dynamic mix-in.
00:22:34.000
This means calling the include method or prepend or extend, basically mixing some code into other code outside of the original class or module definition.
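The difference between the two styles can be seen side by side in this sketch. The `Greeter` class is hypothetical, and `say` returns a string here instead of printing so that the result is easy to inspect.

```ruby
module CatSay
  def say(message)
    "#{message} =^.^="
  end
end

# Conventional mix-in: `include` appears inside the class definition.
class Greeter
  include CatSay
end

# Dynamic mix-in: `include` is called on Object from outside its definition,
# so every object picks up the methods of CatSay.
Object.include(CatSay)

Greeter.new.say("Hello") # => "Hello =^.^="
42.say("Hello")          # => "Hello =^.^="
```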
00:22:47.720
This is a really common pattern in Ruby. A lot of gem maintainers do this because it makes it really easy to use their gems instead of having to do all the gem setup ourselves. The methods from the gem become available on existing Ruby objects, and so we can just plug and play.
00:23:05.680
Rails does this a lot, and it's cool that Ruby allows us to do this! It creates a really good quality of life for us developers.
00:23:20.320
That means that Tapioca, unfortunately, has to support it, and that's going to be tricky!
00:23:36.400
I'll show you why. If we go back to the CatSay RBI file, in an ideal world, we'd want Tapioca to generate something like this: we'd want Tapioca to tell Sorbet that all of the methods from the CatSay module are available in the Object class through this include.
00:23:53.200
This would allow us to use methods from the CatSay gem without having to include the module first, so we can delete that line that included the module.
00:24:12.600
However, it doesn't work out this way. As we remember, the Tapioca pipeline has a check. If we try to generate a part of the RBI file for the Object class, Tapioca is going to check and say that Object was not originally defined in the CatSay gem. Object is from Ruby, and we don't want to process it.
00:24:29.360
So the constant would be discarded by the pipeline, and this crucial part of the RBI file would not be generated. Sorbet would raise an error saying our CatSay method is not available even though it is.
00:24:47.720
This is a case where Tapioca would be causing Sorbet to raise incorrect type-checking errors, and that's really not great.
00:25:00.560
So we had to figure out a way to fix it. We knew that the issue was this check, and so we had to figure out is there a way we could make an exception? Could we see whether a constant has a dynamic mix-in happen within the gem, and if so, bypass the check entirely?
00:25:14.480
To build this feature, we knew that Tapioca would need to know a couple of things. First, it would need to know which constants have dynamic mix-ins (and yes, I know I'm showing it; we need to know that).
00:25:31.920
Then, we need to know where in the gem those mix-ins happen to determine whether they happen in the gem or somewhere else outside.
00:25:44.040
In order to accomplish this, we'll go back to our idea of populating Ruby memory. We're going to create a new object called the mixin tracker, and you can think of this as a kind of database that keeps track of all of the mix-ins that happen within a gem, like our mix-in between Object and CatSay.
00:26:04.480
The mixin tracker would contain references to the constants involved in the mix-in, and so if we had this object in memory, it would be easy for us to check during the pipeline process whether a constant has a mix-in happen within the gem.
00:26:26.440
So, in order to accomplish this, Tapioca actually implements a dynamic mix-in of its own! First, it reopens Ruby's Module class and prepends a dynamic prepend module to it.
00:26:45.520
Inside of that prepend, it overrides the append_features method, which is an internal method that the include method calls. You can think of this as us adding new functionality to the include method.
00:27:02.560
Inside of that method, we take the mixin tracker, that object that we saw in memory, and we register information about the mix-in to it. So we are populating Ruby memory with information about every single mix-in that happens while the gem is loaded.
00:27:26.560
In the mixin tracker itself, there is a hash called mixin_to_constants, which keeps track of all this information as well as a register method, which is the method we saw called on the previous slide that actually stores this information.
00:27:42.000
It's also in charge of finding the location of a dynamic mix-in, and to do this, it relies on Ruby's caller_locations method. This method returns a backtrace from anywhere you are in Ruby code.
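The tracking idea can be sketched as follows. The names `MixinTracker`, `TrackIncludes`, and the shape of the stored entries are assumptions made for illustration; only `mixin_to_constants`, `register`, `append_features`, and `caller_locations` come from the talk.

```ruby
# A tiny in-memory "database" of mix-ins, keyed by the module being mixed in.
module MixinTracker
  @mixin_to_constants = Hash.new { |hash, mixin| hash[mixin] = [] }

  class << self
    attr_reader :mixin_to_constants

    # Remember which constant received the mixin and the backtrace at that point.
    def register(constant, mixin, locations)
      @mixin_to_constants[mixin] << { constant: constant, locations: locations }
    end
  end
end

# Prepended onto Module so that every later `include` passes through it.
module TrackIncludes
  private

  # `self` is the module being included; `base` is the constant receiving it.
  def append_features(base)
    MixinTracker.register(base, self, caller_locations)
    super
  end
end

Module.prepend(TrackIncludes)

module CatSay
  def say(message); end
end

Object.include(CatSay) # the dynamic mix-in from the talk's example

MixinTracker.mixin_to_constants[CatSay].map { |entry| entry[:constant] } # => [Object]
```

Only mix-ins that happen after `Module.prepend(TrackIncludes)` are recorded, which is why a tracker like this has to be installed before the gem's files are loaded.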
00:27:59.840
So in our CatSay example, the backtrace might look a little bit like this. Our goal when taking this backtrace was to figure out which of these lines can we attribute the dynamic mix-in to, where can we say the dynamic mix-in happened?
00:28:19.600
Ruby did not have a way to do this for us, and so we ended up figuring out a new way on our own.
00:28:37.880
You would think that the best line to attribute a mix-in to is the line where the mix-in actually happened, where the include was called. That would make a lot of sense.
00:29:01.520
However, this is Ruby, and so that does not make a lot of sense. It turns out in Ruby you can have one gem that calls a method from another gem, and that second gem calls include. Should you attribute the include to the first gem or the second gem?
00:29:16.280
I'm not going to go into all the minutiae of this problem, and if you want to talk about it, feel free to come get me after. But basically, my coworker, Ufuk Kayserilioglu, came up with a new technique.
00:29:39.880
He said that we should find the part of the backtrace that has this <top (required)> label. This marks the line of code where a file is loaded such that it eventually leads to the dynamic include being called.
00:29:55.640
This is the correct place to attribute the dynamic mix-in to. What's cool about this is not the exact details of the situation, but that one of my teammates invented a new technique for Ruby reflection, and that allowed us to actually implement this functionality in Tapioca.
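To show what searching a backtrace for that label can look like, here is a small self-contained sketch. The `Probe` module, the `required_frame` helper, and the temporary `fake_gem.rb` file are all invented for the demonstration; Tapioca's real logic handles more cases than this.

```ruby
require "tmpdir"

# Hypothetical helper: pick the frame where a file was being loaded.
def required_frame(locations)
  locations.find { |location| location.label == "<top (required)>" }
end

# Hypothetical probe that records the backtrace of whoever calls it.
module Probe
  class << self
    attr_reader :locations

    def capture
      @locations = caller_locations
    end
  end
end

# Write a one-line "gem" file and require it, so that Probe.capture is
# called from the top level of a required file.
Dir.mktmpdir do |dir|
  path = File.join(dir, "fake_gem.rb")
  File.write(path, "Probe.capture\n")
  require path
end

frame = required_frame(Probe.locations)
puts "#{File.basename(frame.path)}:#{frame.lineno}"
```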
00:30:13.440
Now that we have this mix-in tracker object, we can see how it's used in the pipeline. We create a new listener, and it's called the Foreign Constants Listener.
00:30:30.160
This listener uses the mix-in tracker. It takes a constant from the pipeline, and remember, the mix-in tracker is sort of a database. So it searches and finds any objects or constants that are involved in a dynamic mix-in with the first constant.
00:30:46.480
It'll even tell us whether or not that dynamic mix-in happened within the current gem, and it will put that constant back on the pipeline or back on the queue.
00:31:01.200
Because the constant has a dynamic mix-in that happened within the gem, Tapioca knows it can skip the check, and so the constant can be processed by the pipeline.
00:31:23.440
This is going to generate the RBI we want—the RBI with the Object class and the include of CatSay. This is going to allow us to use the CatSay method without first having to include the CatSay module!
00:31:35.770
Sorbet is not going to complain about it because Tapioca generated the correct RBI.
00:31:41.470
So originally, I called this section extreme Ruby reflection, but really all we just did was combine reflection techniques to create new techniques that served our purposes.
00:31:44.539
An even cooler thing is that we can actually contribute these new techniques back to Ruby in the form of either upstreaming code or just contributing the ideas back to the Ruby maintainer team.
00:32:01.680
We've done this a few times in Tapioca. I'm almost done with my talk, but before I wrap up, I'd like to give a couple of shout-outs.
00:32:19.680
First, I'd like to talk about something that's very near and dear to my heart, which is wnb.rb. wnb.rb is the largest community of women and non-binary Ruby developers in the world, with almost a thousand members globally.
00:32:35.680
You can find out more about us on our website, wnb-rb.dev. We're also hosting a casual lunch today here at the conference, so if you identify as a woman or non-binary person, please join us.
00:32:49.680
We'll be leaving from the venue here around 12:05 and walking over to S&W Market, which is about a 10-minute walk away.
00:33:02.720
I swear I didn't plan this with Jeremy's gratitude thing, but I'd also like to give a huge shout-out to Jeremy and Mark and all the other organizers of this conference. Organizing a conference is an immense undertaking, and it's a true love letter to the Ruby community.
00:33:21.480
They've done a spectacular job, and we're so lucky to be here benefiting from all their hard work. Can we give them a round of applause?
00:33:39.000
So, to briefly recap what we talked about: first, we learned about the architecture of the Tapioca gem's RBI generation pipeline.
00:33:54.880
Then we learned some of the fundamentals of Ruby reflection and how they're applied in Tapioca, and finally, we learned that we can combine some of these techniques to invent new reflection techniques of our own.
00:34:11.320
I'm a nerd, so I'd like to consider this a kind of extreme sport, like skiing or snowboarding or giving conference talks while seven months pregnant.
00:34:28.680
But in the end, if you take anything away from this talk, I want it to be this: with Ruby, you can build really, really cool tools that help other developers find joy in coding and create things that matter.
00:34:40.160
Like organizing conferences! Building tooling is a kind of love letter to Ruby; it ensures the longevity of our community as well as the continuous growth and improvement of the Ruby language.
00:34:57.320
I hope that this talk has inspired you to want to do some of this kind of work and has given you some of the tools you'll need to get started. Thank you!