00:00:00.000
Free! What do you think about shapes? I love shapes—barbecue ones or pizza ones.
00:00:05.279
The topping is so good on shapes, not the crackers, you know.
00:00:11.880
I'm talking about object shapes. Object what? Yeah, I have no idea what object shapes are.
00:00:18.359
They're a Ruby 3.2 thing. Mmm, Ruby 3.2 sounds tasty! I love me an object shape.
00:00:24.300
With the help of object shapes, we can increase cache hits in instance variable lookups, decrease runtime checks, and improve JIT performance.
00:00:29.760
That's where object shapes come in. In the next talk, we'll learn how they work, why we implemented them for Ruby 3.2, and some interesting implementation details.
00:00:34.980
Jemma Issroff works on Shopify's Ruby infrastructure team, and in 2022 she implemented object shapes in CRuby alongside Aaron Patterson.
00:00:42.899
She's also a co-founder of wnb.rb, a women and non-binary Ruby community, and the co-host of the Ruby on Rails podcast.
00:00:49.680
[Applause, laughter] But, Michael, you mentioned the Ruby on Rails podcast. Do they ever talk about pairing?
00:01:03.500
I'm sure she'll have us on to talk about pairing, but she told me she did a lot of pairing with Aaron Patterson while working on object shapes.
00:01:12.900
Her favorite part of pairing is seeing everyone else's different workflows and the little tools and tricks that they use.
00:01:19.860
Let's take a moment to thank our speaker sponsors. Gemma was brought here today by Shopify. Huge thanks to Shopify for helping with flights, accommodation, and for bringing Gemma all the way here to share her knowledge with us from New York City.
00:01:30.720
I'm going to steal your line: Jemma Issroff, implementing object shapes for CRuby!
00:01:51.360
Thank you!
00:02:06.840
Oh, and now we're on! I hit the off button accidentally. Thank you for that intro!
00:02:12.420
We would love to have you on the podcast, so let us know when.
00:02:23.400
So, Christmas each year is usually not a big deal for me; I don't celebrate Christmas. So, besides offices and shops being closed, it passes just like any other ordinary day.
00:02:41.459
However, as many of us in this room might know, we usually get a new Ruby version on Christmas Day. In 2022, Ruby 3.2 was released on Christmas Day, which made Christmas a really special day for me.
00:02:55.140
It was the first time I had a feature shipping in a Ruby release, and it's now live. So I did have a little something under the tree this past year, which was quite exciting.
00:03:06.420
Today, I'm going to talk to you about what I worked on in 2022. As Michael and Selena just said, it's called object shapes. We’re going to discuss implementing object shapes in CRuby.
00:03:20.040
I'm Jemma, and you can refer to my pronouns. If we haven't met yet, please come introduce yourself. I would really love to meet you.
00:03:26.159
As they also mentioned, I work at Shopify on the Ruby infrastructure team, so this was just part of my day job. There's been a lot of talk about open source at this conference and how to engage with it.
00:03:40.440
This is another way to do it: you can contribute at your workplace.
00:03:47.400
I am also a co-organizer of wnb.rb, a virtual community for women and non-binary individuals in Ruby. We're up to about 800 members now. If you're a woman or a non-binary person and you haven't heard of us, please come find me. I would love to tell you about it.
00:04:03.180
I called this talk 'Implementing Object Shapes in CRuby' because that is the nitty-gritty of what we are going to discuss.
00:04:10.500
But like Michael and Selena were saying, you might not know what that is, what it means, or why it's important to you. I could have also titled this talk 'How Instance Variables Work in Ruby.'
00:04:22.139
If you've written Ruby, you've used instance variables, so this is clearly relevant to what we all do.
00:04:27.479
The hope for this talk is that it will be accessible, whether you have no background in Ruby internals or a lot of background.
00:04:33.600
I will tease some points throughout the talk. Please come find me afterward, and I would love to go into further detail on any of those.
00:04:39.600
We will answer three main questions throughout the course of this talk.
00:04:45.960
First, we'll discuss how instance variables worked in Ruby 3.1 and what was going on before object shapes.
00:04:54.180
Then, we’ll talk about what object shapes are and what this technique means. Lastly, we’ll cover why we implemented them.
00:05:05.400
So, our first question is: how did instance variables work?
00:05:11.880
We’re going to start with a simple example here, nothing fancy going on: a class with an attr_accessor.
00:05:18.720
In the initialize method, we set the title and the author of a post.
00:05:24.539
I know I picked very creative title and author names for this example.
00:05:31.020
When we have a new instance of a post, have we ever stopped to think about what’s happening behind the scenes?
00:05:39.360
How is Ruby actually working with our instance variables? How is it going to store them?
00:05:44.639
We all know as programmers that we can look at this and see key-value pairs. The keys are the instance variable names, and the values are A and B.
00:05:51.240
We can put that in a hash or hash map.
00:05:56.580
So let's create an instance variable map where the key will be the name of the instance variable and the value will be the value of the instance variable.
00:06:03.840
As we set our instance variables, we can put them in the map.
00:06:10.320
For instance, the instance variable title will have the value A, and the instance variable author will have the value B.
00:06:16.680
We can keep doing this for any further instance variables, which is good.
00:06:23.100
However, what happens if we have a second post?
00:06:28.500
Now, instead of having just one instance variable map, we need to create a separate instance variable map for the second post as well. This means we will have duplicated keys.
00:06:42.000
If we just have two instances of the post class, that’s okay, but if we have hundreds, thousands, or millions of instances, that’s a lot of space taken up with duplicated keys.
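To make the duplication concrete, here is a minimal Ruby sketch of this naive layout. It is a toy model, not CRuby's C code; the `NaiveObject` class and its `set_ivar`/`get_ivar` methods are hypothetical names. Each object keeps its own hash from instance variable name to value, so every instance stores its own copy of the keys.

```ruby
# Hypothetical toy model: each object owns a Hash of ivar name => value.
class NaiveObject
  attr_reader :table

  def initialize
    @table = {} # one map per instance, so the keys are stored per instance too
  end

  def set_ivar(name, value)
    @table[name] = value
  end

  def get_ivar(name)
    @table[name]
  end
end

first_post = NaiveObject.new
first_post.set_ivar(:@title, "A")
first_post.set_ivar(:@author, "B")

second_post = NaiveObject.new
second_post.set_ivar(:@title, "C")
second_post.set_ivar(:@author, "D")

# Same keys, stored twice; a million posts would store them a million times.
puts first_post.table.keys == second_post.table.keys # prints "true"
```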
00:06:46.020
One solution can be to use an array for each instance.
00:06:54.240
We can store the values of the instance variables for one instance in an array.
00:07:00.300
The class itself will have a map linking instance variable names to indices in the array.
00:07:05.400
By doing this, each instance will only need to store values in an array instead of a map.
00:07:11.880
With this change, when we set the title, we check whether @title is already known to our Post class.
00:07:17.639
If it’s not, we add it to the next available index—zero in this case—and put the value in that index.
00:07:24.840
We do the same for the author, using the next available index, which is one. This is how instance variables worked in Ruby 3.1.
00:07:31.259
The keys now appear only once instead of being duplicated.
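A rough Ruby sketch of that layout follows, assuming hypothetical `ToyClass` and `ToyObject` names standing in for CRuby's internal structures. The class owns one name-to-index map; each instance owns only an array of values.

```ruby
# Hypothetical toy model of the Ruby 3.1 layout described above.
class ToyClass
  attr_reader :ivar_index

  def initialize
    @ivar_index = {} # ivar name => slot index, shared by every instance
  end

  # Known ivar: return its index. New ivar: give it the next available index.
  def index_for(ivar)
    @ivar_index[ivar] ||= @ivar_index.size
  end
end

class ToyObject
  attr_reader :values

  def initialize(klass)
    @klass = klass
    @values = [] # values only; the keys live on the class
  end

  def set_ivar(ivar, value)
    @values[@klass.index_for(ivar)] = value
  end

  def get_ivar(ivar)
    index = @klass.ivar_index[ivar]
    index && @values[index]
  end
end

post_class = ToyClass.new
first_post = ToyObject.new(post_class)
first_post.set_ivar(:@title, "A")  # @title gets index 0
first_post.set_ivar(:@author, "B") # @author gets index 1

second_post = ToyObject.new(post_class)
second_post.set_ivar(:@title, "C") # reuses index 0, nothing new on the class
```

Every post now carries only a small array, and the names live once on the class.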
00:07:39.360
However, the hash lookup itself isn’t actually cheap. Reading instance variables is one of the most common operations we do in Ruby.
00:07:46.020
So even a hash lookup is something we want to optimize.
00:07:53.100
To further optimize, we cache instance variable lookups using the class as the cache key.
00:08:00.300
So when we set a title to be A on this instruction, we’ll cache the index of title for the Post class as zero.
00:08:06.760
The same goes for the index of author as one.
00:08:12.840
That works well, and this is how Ruby 3.1 operates.
00:08:20.520
However, the issue arises when we have class inheritance.
00:08:27.360
For something like ActiveRecord, which we frequently use, we see this kind of case all the time.
00:08:34.920
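When we have a new front page post class that inherits from Post, we encounter cache misses: the same @title instruction in Post#initialize now runs for objects of two different classes, and since the class is the cache key, every switch between them is a miss.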
00:08:41.820
This makes it somewhat slower for ActiveRecord than it needs to be.
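Here is a toy Ruby sketch of a single-entry cache keyed by class, using hypothetical `KeyedClass` and `ClassKeyedCache` names rather than CRuby's real inline cache structures. It mimics the `@title = title` instruction in `Post#initialize` running for a mix of post and front page post objects.

```ruby
# Hypothetical toy: a one-entry inline cache keyed by class, as in Ruby 3.1.
class KeyedClass
  attr_reader :name, :ivar_index

  def initialize(name)
    @name = name
    @ivar_index = {} # ivar name => slot index, owned by this class
  end

  def index_for(ivar)
    @ivar_index[ivar] ||= @ivar_index.size
  end
end

# One cache per `@title = title` instruction, remembering a single class and index.
class ClassKeyedCache
  attr_reader :hits, :misses

  def initialize(ivar)
    @ivar = ivar
    @cached_class = nil
    @cached_index = nil
    @hits = 0
    @misses = 0
  end

  def index_for(klass)
    if klass.equal?(@cached_class)
      @hits += 1
    else
      @misses += 1 # a different class: do the slow lookup and overwrite the cache
      @cached_class = klass
      @cached_index = klass.index_for(@ivar)
    end
    @cached_index
  end
end

post = KeyedClass.new("Post")
front_page_post = KeyedClass.new("FrontPagePost") # a subclass is still a different cache key

title_cache = ClassKeyedCache.new(:@title) # the `@title = title` line in Post#initialize
[post, post, front_page_post, post, front_page_post].each { |klass| title_cache.index_for(klass) }
puts "#{title_cache.hits} hit, #{title_cache.misses} misses" # prints "1 hit, 4 misses"
```

Interleaving the two classes makes the cache thrash even though both objects end up storing @title at the same kind of slot.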
00:08:53.880
Now, let’s talk about what object shapes are.
00:08:58.980
Just to clarify, we’re not talking about barbecue references here.
00:09:06.930
When we talk about object shapes, we really mean Ruby objects.
00:09:13.740
Object shapes represent the properties of a Ruby object.
00:09:19.860
When we say shapes, it’s in quotes. We mean that every object has a shape.
00:09:27.720
This technique is used in other VMs as well. The shape represents the properties of an object.
00:09:34.380
Some properties encoded with the shape include instance variables, frozen status, capacity, and size pool.
00:09:40.380
If you don’t know what all that means, that’s okay—we'll get into it.
00:09:46.920
Looking back to our Post class, let’s see what a shape might look like for the first post.
00:09:54.540
Again, not a literal circle—it's just a symbol to represent the shape.
00:10:01.740
For the Post class, the instance variables we have are title and author.
00:10:07.140
The shape will include the names of those instance variables.
00:10:14.400
Additionally, the shape will have its own ID to uniquely identify it.
00:10:20.400
Now, if we look at a second Post instance, which also sets a title and an author, it will have a shape too.
00:10:26.220
Both of these shapes have the same properties.
00:10:30.960
This means they are actually the same shape, which mirrors the shared index map we discussed for Ruby 3.1.
00:10:37.620
The shape can transition as we add new properties, like instance variables.
00:10:44.460
To look at this more concretely, every object starts at what we call the root shape.
00:10:51.000
This root shape is basically an empty shape with ID 0, meaning there’s nothing in it.
00:10:56.160
When we call first_post = Post.new, we start at that root shape.
00:11:03.600
As we set the title, we transition to a new shape whose instance variables contain just the title.
00:11:10.560
The new shape gets an ID of its own.
00:11:17.040
When we set the author, we will transition to another shape combining both title and author.
00:11:23.520
This newly created shape will also have its own unique ID.
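A simplified Ruby sketch of these transitions follows. It is a toy, not CRuby's C implementation: the `Shape` class and its methods are hypothetical, and here IDs are simply handed out in creation order, so the root shape created first gets ID 0. Each shape records the instance variable it added, its parent, and how many instance variables it holds, which gives the slot index.

```ruby
# Hypothetical toy shape tree, not CRuby's C implementation.
class Shape
  attr_reader :id, :ivar, :parent, :ivar_count

  @all = [] # every shape created so far; a shape's id is its position here

  class << self
    attr_reader :all

    # The root shape: no instance variables, created first, so its id is 0.
    def root
      @root ||= new(nil, nil)
    end
  end

  def initialize(ivar, parent)
    @ivar = ivar
    @parent = parent
    @ivar_count = parent ? parent.ivar_count + 1 : 0
    @edges = {} # ivar name => child shape
    @id = Shape.all.size
    Shape.all << self
  end

  # Adding `ivar` moves to the child shape, creating it only the first time.
  def transition(ivar)
    @edges[ivar] ||= Shape.new(ivar, self)
  end

  # Walk up toward the root to find the slot index where `ivar` is stored.
  def index_of(ivar)
    shape = self
    while shape.parent
      return shape.ivar_count - 1 if shape.ivar == ivar
      shape = shape.parent
    end
    nil
  end
end

root = Shape.root                                     # first_post = Post.new
title_shape = root.transition(:@title)                # @title = "A"
title_author_shape = title_shape.transition(:@author) # @author = "B"
```

A second post that runs the same initialize takes the same edges, so `root.transition(:@title)` returns the existing title shape instead of creating a new one.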
00:11:30.120
If we had a different Post class that set only the title in initialize and set the author later, a post that had only been given a title would carry a different shape from one that has both.
00:11:36.000
So two instances of the same class can end up with different shapes, which is a critical case to keep in mind.
00:11:42.240
Now, let’s look at one more example of transitioning shapes.
00:11:46.920
If we create an Image class, which has both a title and an image URL, we start again at the root shape.
00:11:53.280
Then we transition through title, reusing the same title shape our posts use, followed by a new transition for image URL.
00:12:00.360
As you may have noticed, these shapes form a tree structure, illustrating our object shapes.
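A compact Ruby sketch of that tree, assuming hypothetical `ShapeNode` and `ShapeTree` names, shows the Post and Image objects sharing the title shape and only branching after it. Nothing in the tree refers to a class; only the instance variable names and their order matter.

```ruby
# Hypothetical compact shape tree: each node maps an ivar name to a child node.
ShapeNode = Struct.new(:id, :ivar, :children)

class ShapeTree
  attr_reader :root

  def initialize
    @next_id = 0
    @root = make_node(nil) # id 0, no instance variables
  end

  # Start at the root and take one transition per ivar, in the order they're set.
  # Existing edges are reused, so objects of any class share common prefixes.
  def shape_for(ivars)
    ivars.reduce(@root) do |shape, ivar|
      shape.children[ivar] ||= make_node(ivar)
    end
  end

  private

  def make_node(ivar)
    node = ShapeNode.new(@next_id, ivar, {})
    @next_id += 1
    node
  end
end

tree = ShapeTree.new
post_shape = tree.shape_for([:@title, :@author])     # Post#initialize
image_shape = tree.shape_for([:@title, :@image_url]) # Image#initialize

# The Post and Image objects share the title shape; they only diverge after it.
title_shape = tree.root.children[:@title]
puts title_shape.children.keys.inspect # prints [:@author, :@image_url]
```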
00:12:07.620
We've now addressed two of our questions, and we have one left: why implement object shapes in CRuby?
00:12:13.800
We've seen how instance variables previously functioned and what object shapes are.
00:12:20.940
Now, let’s explore why this change was necessary.
00:12:26.760
We'll focus on three main points.
00:12:33.240
First, we'll talk about cache hits, then code complexity, and lastly how this interacts with the JIT.
00:12:39.600
With object shapes, we were able to change the caching mechanism for instance variable reads and writes.
00:12:46.560
In Ruby 3.1, we encountered cache misses frequently with class inheritance.
00:12:54.420
In Ruby 3.2, with the implementation of object shapes, we avoid those cache misses.
00:13:01.800
Instead of our previous instance variable names map, we now have our shape tree.
00:13:07.459
What we actually store on the shapes is the instance variable index, which helps us streamline the access.
00:13:13.500
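So when we set title, we look up the instance variable's index on the shape and cache it, keyed by the object's shape rather than its class.
00:13:21.240
Because a front page post sets the same instance variables in the same order as a post, it has the same shape, and that is where those increased cache hits come from.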
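Here is a toy Ruby sketch of a set-ivar cache keyed by shape, with hypothetical `ShapeRecord`, `ShapeRegistry`, `ShapedObject`, and `SetIvarCache` names. It loosely follows the idea of remembering a source shape and its destination, but it is not CRuby's actual cache layout, and it only models adding a new instance variable. Running the same two instructions for interleaved Post and FrontPagePost objects now misses only once per instruction.

```ruby
# Hypothetical toy: a set-ivar inline cache keyed by shape instead of class.
ShapeRecord = Struct.new(:id, :ivar, :count, :children)

class ShapeRegistry
  attr_reader :root

  def initialize
    @next_id = 0
    @root = make_shape(nil, 0) # id 0, no instance variables
  end

  # Reuse the child shape for `ivar` if it exists, otherwise create it.
  def transition(shape, ivar)
    shape.children[ivar] ||= make_shape(ivar, shape.count + 1)
  end

  private

  def make_shape(ivar, count)
    shape = ShapeRecord.new(@next_id, ivar, count, {})
    @next_id += 1
    shape
  end
end

# The class name is carried along but the cache never looks at it.
ShapedObject = Struct.new(:class_name, :shape, :values)

# One cache per `@ivar = value` instruction: source shape id => destination shape + index.
class SetIvarCache
  attr_reader :hits, :misses

  def initialize(registry, ivar)
    @registry = registry
    @ivar = ivar
    @source_id = nil
    @hits = 0
    @misses = 0
  end

  def set(obj, value)
    if obj.shape.id == @source_id
      @hits += 1
    else
      @misses += 1 # slow path: find the transition and refill the cache
      @source_id = obj.shape.id
      @dest = @registry.transition(obj.shape, @ivar)
      @index = @dest.count - 1
    end
    obj.values[@index] = value
    obj.shape = @dest
  end
end

registry = ShapeRegistry.new
title_cache = SetIvarCache.new(registry, :@title)   # `@title = title` in Post#initialize
author_cache = SetIvarCache.new(registry, :@author) # `@author = author` in Post#initialize

%w[Post FrontPagePost Post FrontPagePost].each do |class_name|
  obj = ShapedObject.new(class_name, registry.root, [])
  title_cache.set(obj, "A")
  author_cache.set(obj, "B")
end

puts "title: #{title_cache.hits} hits, #{title_cache.misses} miss" # prints "title: 3 hits, 1 miss"
```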
00:13:27.540
I have micro-benchmarks to prove it.
00:13:34.620
Within CRuby, these benchmarks show that we’ve sped up instance variable sets significantly.
00:13:42.240
A value greater than one indicates that shapes are faster, while a value less than one means Ruby 3.1 is faster.
00:13:50.460
What we’re focusing on here is the benchmark for instance variable sets, with and without shapes.
00:13:58.740
In the subclass benchmark specifically, the object shapes implementation performs better.
00:14:04.920
Optimizing these micro-benchmarks is important, as we start seeing Ruby 3.2 in the wild.
00:14:13.140
So Ruby 3.2 significantly speeds up instance variable gets and sets, along with the other changes that come with it.
00:14:20.160
Aside from cache hits, another main reason to implement object shapes in CRuby is to decrease code complexity.
00:14:28.740
We achieved this by reducing the number of frozen checks in instance variable sets.
00:14:36.600
This is because the frozen status is now included within the shape structure.
00:14:42.720
Now, we only have to check the frozen status on the slow path.
00:14:49.320
If the object's shape matches the cached shape, we know we have already set an instance variable on an object with that shape, so the object can't be frozen and we can skip that check.
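The following toy Ruby sketch, with hypothetical `ShapeInfo`, `SimpleObject`, and `CachedSetter` names, assumes that freezing moves an object to a distinct frozen shape and that the instance variable being written already exists, so the write doesn't change the shape. The frozen check then runs only on a cache miss.

```ruby
# Hypothetical toy: frozen status is a property of the shape.
ShapeInfo = Struct.new(:id, :frozen)

UNFROZEN_SHAPE = ShapeInfo.new(1, false)
FROZEN_SHAPE = ShapeInfo.new(2, true) # freezing moves an object to a different shape

SimpleObject = Struct.new(:shape, :values)

class CachedSetter
  attr_reader :frozen_checks

  def initialize
    @cached_shape_id = nil
    @frozen_checks = 0
  end

  def set(obj, index, value)
    unless obj.shape.id == @cached_shape_id
      # Slow path: the only place we look at frozen status.
      @frozen_checks += 1
      raise FrozenError, "can't modify frozen object" if obj.shape.frozen
      @cached_shape_id = obj.shape.id
    end
    # Fast path: a frozen object can never carry the cached shape, so no check here.
    obj.values[index] = value
  end
end

setter = CachedSetter.new
post = SimpleObject.new(UNFROZEN_SHAPE, [nil])
3.times { |i| setter.set(post, 0, i) } # one slow-path check, then two cache hits

frozen_post = SimpleObject.new(FROZEN_SHAPE, [nil])
raised = begin
  setter.set(frozen_post, 0, "x")
  false
rescue FrozenError
  true
end
```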
00:14:56.640
The other way we decrease complexity is by removing undef sets in object allocations.
00:15:03.600
When we call Object.new, Ruby will allocate space for instance variables directly.
00:15:09.180
Previously, in Ruby 3.1, Ruby would fill that space with undef values at allocation time.
00:15:16.740
With object shapes, the shape already records which instance variables have been set, so we consult the shape, walking up the tree if needed, instead of checking slots for undef.
00:15:25.860
Thus, we no longer need to perform those unnecessary undef sets, which improves object allocation performance.
00:15:32.820
The last reason to implement object shapes is that they benefit just-in-time compilers.
00:15:40.440
Because the shape carries clear data about instance variables, namely capacity and size pool, the JIT can generate faster code.
00:15:49.500
Understanding the difference between embedded and extended objects plays a significant role in this.
00:15:56.220
The capacity on an object shape indicates how many instance variables fit in the space currently allocated for them, while the size pool records which slot size the object was allocated in.
00:16:04.440
This allows us to optimize memory access according to the capacity.
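A hypothetical Ruby sketch of the embedded versus extended distinction follows. The `EmbeddableObject` class is made up, and an Array merely stands in for slots inside the object itself. Values live inline until the count exceeds the capacity, and then they move to a separate buffer. Roughly speaking, a JIT that knows the shape, and therefore the capacity, can tell ahead of time which of the two places to read from.

```ruby
# Hypothetical sketch of embedded vs. extended instance variable storage.
class EmbeddableObject
  def initialize(capacity)
    @capacity = capacity          # in CRuby this depends on the object's size pool
    @inline = Array.new(capacity) # stands in for slots inside the object itself
    @extended = nil               # separate buffer, allocated only when we outgrow @inline
  end

  def embedded?
    @extended.nil?
  end

  def set(index, value)
    if embedded? && index >= @capacity
      @extended = @inline.dup # move existing values out to a separate, growable buffer
    end
    (embedded? ? @inline : @extended)[index] = value
  end

  def get(index)
    (embedded? ? @inline : @extended)[index]
  end
end

obj = EmbeddableObject.new(3)
obj.set(0, "title")
obj.set(1, "author")
puts obj.embedded? # prints "true": both values fit in the inline slots
obj.set(3, "image_url")
puts obj.embedded? # prints "false": the fourth value forced an extended buffer
```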
00:16:11.400
The code generated by the JIT takes advantage of these properties, making access to instance variables faster.
00:16:17.880
We can see significant performance improvements when looking at a benchmark for instance variable access.
00:16:23.820
Before object shapes, YJIT was already very optimized.
00:16:30.020
After object shapes were implemented, we achieved even greater performance metrics.
00:16:39.780
To summarize, we answered three main questions.
00:16:48.300
How instance variables worked before, what object shapes are, and why we implemented them in CRuby.
00:16:56.640
I usually expect two questions after these talks, so let me address them preemptively.
00:17:03.180
The first question often revolves around whether we should change specific code.
00:17:11.460
For instance, someone might show me a snippet that sets instance variables in a messy or inconsistent order.
00:17:17.640
They might ask if they should refactor their code to better fit the shape tree.
00:17:23.740
The answer is no, don’t change it just for object shapes.
00:17:30.720
Make your code clearer and easier for your teammates to understand.
00:17:39.180
Some of you may have already been using Ruby 3.2 without even realizing it.
00:17:45.500
If your code is confusing, you should definitely change it to make it better for other users.
00:17:53.380
The second question I often receive is how can I see object shapes in my program?
00:18:02.160
To answer that, we’ve made a small API addition: ObjectSpace.dump_shapes, in the objspace library, which provides a JSON representation of all of your object shapes.
00:18:12.360
With it, you can visualize them and construct the tree as needed.
00:18:21.200
I appreciate your time and thank you for the opportunity to speak here!
00:18:27.480
[Applause, music]