Implementing Object Shapes for CRuby
Summarized using AI

by Jemma Issroff

The video titled "Implementing Object Shapes for CRuby" features a talk by Jemma Issroff at RubyConf AU 2023 on Object Shapes, a feature introduced in Ruby 3.2. The technique changes how Ruby stores and looks up instance variables, increasing cache hits for instance variable reads and writes, reducing runtime checks, and improving JIT performance.

Key Points Discussed:

- Introduction to Object Shapes:

- Object shapes are part of Ruby 3.2, designed to enhance the performance of instance variable lookups, reduce runtime checks, and improve Just-In-Time (JIT) compilation.

- Background on Instance Variables in Ruby 3.1:

- Prior to Ruby 3.2, each class kept a table mapping instance variable names to indices, and each object stored only an array of values. Lookups were cached using the object's class as the cache key.

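- Storing a separate name-to-value map in every object would duplicate the keys across instances and waste memory. The per-class table avoided that, but the class-keyed cache missed frequently when subclasses were involved.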

- Transition to Object Shapes:

- Object shapes introduce a new way of describing Ruby objects: each shape records properties such as the object's instance variables, and shapes are linked into a tree as instance variables are added.

- Each shape has a unique ID, which is used as the cache key for instance variable lookups, so objects with the same shape share cache entries regardless of their class.

- Implementation and Benefits of Object Shapes:

- Object shapes significantly increase cache hits during instance variable reads and writes, because objects that set the same instance variables in the same order share a shape, even when they are instances of a class and its subclass.

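- Because frozen status is encoded in the shape, the fast path for instance variable writes can skip the frozen check, and object allocation no longer needs to fill instance variable slots with undef values, which reduces code complexity.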

- Notably, JIT compilers such as YJIT can use the properties recorded in shapes, such as capacity, to generate faster instance variable access.

- Micro-Benchmarking and Results:

- Micro-benchmarks comparing Ruby 3.2 with Ruby 3.1 showed measurable improvements, most notably for instance variable sets in subclasses.

Important Conclusions:

- Jemma emphasizes that developers do not need to refactor existing code for compatibility with object shapes; rather, the focus should be on maintaining clear and understandable code.

- Additionally, a new API, ObjectSpace.dump_shapes, outputs a JSON representation of all object shapes, which developers can use to reconstruct and visualize the shape tree.

- The main takeaway is that object shapes represent a significant advancement in Ruby's performance capabilities, particularly for object-oriented programming.

Overall, Jemma Issroff's talk offers valuable insights into Ruby internals and the performance improvements in Ruby 3.2, making it worthwhile viewing for those interested in Ruby's development and performance optimization.

00:00:00.000 Free! What do you think about shapes? I love shapes—barbecue ones or pizza ones.
00:00:05.279 The topping is so good on shapes, not the crackers, you know.
00:00:11.880 I'm talking about object shapes. Object what? Yeah, I have no idea what object shapes are.
00:00:18.359 They're a Ruby 3.2 thing. Mmm, Ruby 3.2 sounds tasty! I love me an object shape.
00:00:24.300 With the help of object shapes, we can increase cache hits in instance variable lookups, decrease runtime checks, and improve JIT performance.
00:00:29.760 That's where object shapes come in. In the next talk, we'll learn how they work, why we implemented them for Ruby 3.2, and some interesting implementation details.
00:00:34.980 Jemma Issroff works on Shopify's Ruby infrastructure team, and in 2022, she implemented object shapes in CRuby alongside Aaron Patterson.
00:00:42.899 She's also a co-founder of wnb.rb, a women and non-binary Ruby community, and the co-host of the Ruby on Rails podcast.
00:00:49.680 [Applause, laughter] But, Michael, you mentioned the Ruby on Rails podcast. Do they ever talk about pairing?
00:01:03.500 I'm sure she'll have us on to talk about pairing, but she told me she did a lot of pairing with Aaron Patterson while working on object shapes.
00:01:12.900 Her favorite part of pairing is seeing everyone else's different workflows and the little tools and tricks that they use.
00:01:19.860 Let's take a moment to thank our speaker sponsors. Jemma was brought here today by Shopify. Huge thanks to Shopify for helping with flights and accommodation, and for bringing Jemma all the way here from New York City to share her knowledge with us.
00:01:30.720 I'm going to steal your line: Jemma Issroff, implementing object shapes for CRuby!
00:01:51.360 Thank you!
00:02:06.840 Oh, and now we're on! I hit the off button accidentally. Thank you for that intro!
00:02:12.420 We would love to have you on the podcast, so let us know when.
00:02:23.400 So, Christmas each year is usually not a big deal for me; I don't celebrate Christmas. So, besides offices and shops being closed, it passes just like any other ordinary day.
00:02:41.459 However, as many of us in this room might know, we get a new Ruby version usually on Christmas Day. In 2022, Ruby 3.2 was released on Christmas Day, which made Christmas actually a really special day for me.
00:02:55.140 It was the first time I had a feature in Ruby that was shipping and is now live. So I did have a little something under the tree this past year, which was quite exciting.
00:03:06.420 Today, I'm going to talk to you about what I worked on in 2022. As Michael and Selena just said, it's called object shapes. We’re going to discuss implementing object shapes in CRuby.
00:03:20.040 I'm Jemma, and you can refer to my pronouns. If we haven't met yet, please come introduce yourself. I would really love to meet you.
00:03:26.159 As they also mentioned, I work at Shopify on the Ruby infrastructure team, so this was just part of my day job. There's been a lot of talk about open source at this conference and how to engage with it.
00:03:40.440 This is another way to do it: you can contribute at your workplace.
00:03:47.400 I am also a co-organizer of wnb.rb, a virtual community for women and non-binary individuals in Ruby. We're up to about 800 members now. If you're a woman or a non-binary person and you haven't heard of us, please come find me. I would love to tell you about it.
00:04:03.180 I called this talk 'Implementing Object Shapes in CRuby' because that is the nitty-gritty of what we are going to discuss.
00:04:10.500 But like Michael and Selena were saying, you might not know what that is, what it means, or why it's important to you. I could have also titled this talk 'How Instance Variables Work in Ruby.'
00:04:22.139 I think if we have all written Ruby, we have all used instance variables, and it’s clearly relevant to what we do.
00:04:27.479 The hope for this talk is that it will be accessible, whether you have no background in Ruby internals or a lot of background.
00:04:33.600 I will tease some points throughout the talk. Please come find me afterward, and I would love to go into further detail on any of those.
00:04:39.600 We will answer three main questions throughout the course of this talk.
00:04:45.960 First, we'll discuss how instance variables worked in Ruby 3.1 and what was going on before object shapes.
00:04:54.180 Then, we’ll talk about what object shapes are and what this technique means. Lastly, we’ll cover why we implemented them.
00:05:05.400 So, our first question is: how did instance variables work?
00:05:11.880 We're going to start with a simple example here, nothing fancy: a class with an attr_accessor.
00:05:18.720 In the initialize method, we set the title and the author of a post.
00:05:24.539 I know I picked very creative title and author names for this example.
00:05:31.020 When we have a new instance of a post, have we ever stopped to think about what’s happening behind the scenes?
00:05:39.360 How is Ruby actually working with our instance variables? How is it going to store them?
00:05:44.639 We all know as programmers that we can look at this and see that these are key-value pairs: the keys are the instance variable names, and the values are A and B.
00:05:51.240 We can put that in a hash or hash map.
00:05:56.580 So let's create an instance variable map where the key will be the name of the instance variable and the value will be the value of the instance variable.
00:06:03.840 As we set our instance variables, we can put them in the map.
00:06:10.320 For instance, the instance variable title will have the value A, and the instance variable author will have the value B.
00:06:16.680 We can keep doing this for any further instance variables, which is good.
00:06:23.100 However, what happens if we have a second post?
00:06:28.500 Now, instead of having just one instance variable map, we need to create a separate instance variable map for the second post as well. This means we will have duplicated keys.
00:06:42.000 If we just have two instances of the post class, that’s okay, but if we have hundreds, thousands, or millions of instances, that’s a lot of space taken up with duplicated keys.
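To make the duplication concrete, here is a toy model in plain Ruby of the per-object map described above. `NaiveObject`, `ivar_set`, and `ivar_get` are hypothetical names for illustration; CRuby does not store instance variables this way, which is the point being made.

```ruby
# Toy model, not CRuby's C implementation: every object keeps its own
# map from instance variable name to value.
class NaiveObject
  def initialize
    @ivars = {} # per-object map: name => value
  end

  def ivar_set(name, value)
    @ivars[name] = value
  end

  def ivar_get(name)
    @ivars[name]
  end

  # The keys this particular object is storing.
  def keys
    @ivars.keys
  end
end

posts = Array.new(3) do
  post = NaiveObject.new
  post.ivar_set(:@title, "A")
  post.ivar_set(:@author, "B")
  post
end

# Three objects, two instance variables each: the same two keys are stored three times.
total_keys = posts.sum { |post| post.keys.size }
puts total_keys # prints 6
```

With millions of objects, that per-object copy of every key is the memory cost the next step removes.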
00:06:46.020 One solution can be to use an array for each instance.
00:06:54.240 We can store the values of the instance variables for one instance in an array.
00:07:00.300 The class itself will have a map linking instance variable names to indices in the array.
00:07:05.400 By doing this, each instance will only need to store values in an array instead of a map.
00:07:11.880 With this change, when we set a title, we check if the title is known in our post class.
00:07:17.639 If it’s not, we add it to the next available index—zero in this case—and put the value in that index.
00:07:24.840 We do the same for the author, using the next available index, which is one. This is how instance variables worked in Ruby 3.1.
00:07:31.259 We only have the keys appear once instead of duplicating them.
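A sketch of the Ruby 3.1 layout just described, again as a toy model in Ruby rather than CRuby's C code. `ToyClass`, `ToyObject`, and `index_for` are hypothetical names; the idea shown is only that the name-to-index table lives on the class while each object holds a plain array of values.

```ruby
# Toy model of the Ruby 3.1 layout: the class owns a name => index table,
# and each object stores only an array of values.
class ToyClass
  attr_reader :iv_index_table

  def initialize
    @iv_index_table = {} # shared by all instances of this class
  end

  # Returns the index for +ivar+, assigning the next free index on first use.
  def index_for(ivar)
    @iv_index_table[ivar] ||= @iv_index_table.size
  end
end

class ToyObject
  attr_reader :values

  def initialize(klass)
    @klass = klass
    @values = [] # only values live on the object
  end

  def ivar_set(ivar, value)
    @values[@klass.index_for(ivar)] = value
  end

  def ivar_get(ivar)
    index = @klass.iv_index_table[ivar]
    index && @values[index]
  end
end

post_class = ToyClass.new

first_post = ToyObject.new(post_class)
first_post.ivar_set(:@title, "A")   # @title gets index 0
first_post.ivar_set(:@author, "B")  # @author gets index 1

second_post = ToyObject.new(post_class)
second_post.ivar_set(:@title, "C")  # reuses index 0, no new key stored
second_post.ivar_set(:@author, "D") # reuses index 1

p second_post.values # prints ["C", "D"]
```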
00:07:39.360 However, the hash lookup itself isn’t actually cheap. Reading instance variables is one of the most common operations we do in Ruby.
00:07:46.020 So even something like a hash lookup is something we want to optimize.
00:07:53.100 To further optimize, we cache instance variable lookups using the class as the cache key.
00:08:00.300 So when we set a title to be A on this instruction, we’ll cache the index of title for the Post class as zero.
00:08:06.760 The same goes for the index of author as one.
00:08:12.840 That works well, and this is how Ruby 3.1 operates.
00:08:20.520 However, the issue arises when we have class inheritance.
00:08:27.360 For something like ActiveRecord, which we frequently use, we see this kind of case all the time.
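00:08:34.920 When we have a new FrontPagePost class that inherits from Post, we encounter cache misses: the cache is keyed on the class, so when the same instruction in initialize runs for both Post and FrontPagePost objects, the cached class keeps changing and lookups miss.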
00:08:41.820 This makes ActiveRecord, where models routinely inherit from a base class, somewhat slower than it needs to be.
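The cache-miss problem can be simulated with a single-entry cache keyed by class. This is a simplified, hypothetical model (`ClassKeyedCache` is not a CRuby name), assuming one cached entry per instruction and a separate index table per class; it shows why alternating between Post and FrontPagePost objects never hits.

```ruby
# Toy model of a class-keyed inline cache for one `@title = ...` instruction.
# The cache holds a single entry: the last class seen and its ivar index.
class ClassKeyedCache
  attr_reader :hits, :misses

  def initialize
    @cached_class = nil
    @cached_index = nil
    @hits = 0
    @misses = 0
  end

  # +index_tables+ maps each class name to its own name => index table.
  def lookup(klass, ivar, index_tables)
    if klass == @cached_class
      @hits += 1
    else
      @misses += 1 # slow path: consult the class's table, then refill the cache
      @cached_class = klass
      @cached_index = index_tables.fetch(klass).fetch(ivar)
    end
    @cached_index
  end
end

# In this model each class keeps its own table, even though the layouts match.
index_tables = {
  "Post" => { :@title => 0, :@author => 1 },
  "FrontPagePost" => { :@title => 0, :@author => 1 }
}

cache = ClassKeyedCache.new
# Post#initialize runs for a mix of Post and FrontPagePost instances.
%w[Post FrontPagePost Post FrontPagePost].each do |klass|
  cache.lookup(klass, :@title, index_tables)
end

puts "hits=#{cache.hits} misses=#{cache.misses}" # prints "hits=0 misses=4"
```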
00:08:53.880 Now, let’s talk about what object shapes are.
00:08:58.980 Just to clarify, we’re not talking about barbecue references here.
00:09:06.930 When we talk about object shapes, we really mean Ruby objects.
00:09:13.740 Object shapes represent the properties of a Ruby object.
00:09:19.860 When we say shapes, it’s in quotes. We mean that every object has a shape.
00:09:27.720 This technique is used in other VMs as well. The shape represents the properties of an object.
00:09:34.380 Some properties encoded in the shape include instance variables, frozen status, capacity, and size pool.
00:09:40.380 If you don’t know what all that means, that’s okay—we'll get into it.
00:09:46.920 Looking back to our Post class, let’s see what a shape might look like for the first post.
00:09:54.540 Again, not a literal circle—it's just a symbol to represent the shape.
00:10:01.740 For the Post class, the instance variables we have are title and author.
00:10:07.140 The shape will include the names of those instance variables.
00:10:14.400 Additionally, the shape will have its own ID to uniquely identify it.
00:10:20.400 Now, if we look at another instance of the User class with names and logins, this User instance will also have a shape.
00:10:26.220 This shows us that both of these shapes have the same properties.
00:10:30.960 This means they are actually the same shape, which correlates to what we discussed with Ruby 3.1.
00:10:37.620 The shape can transition as we add new properties, like instance variables.
00:10:44.460 To look at this more concretely, every object starts at what we call the root shape.
00:10:51.000 This root shape is basically an empty shape with ID 0, meaning there’s nothing in it.
00:10:56.160 When we call first_post = Post.new, we start at that root shape.
00:11:03.600 As we set the title, we transition to a new shape whose instance variables contain just the title.
00:11:10.560 The new shape will have its own ID.
00:11:17.040 When we set the author, we will transition to another shape combining both title and author.
00:11:23.520 This newly created shape will also have its own unique ID.
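The transitions above can be sketched as a small shape tree in Ruby. `Shape`, `ShapeTree`, and `transition` are hypothetical names, and handing out IDs in creation order is an assumption of this sketch rather than a statement about CRuby's exact numbering.

```ruby
# Toy model of a shape tree; IDs are assigned in creation order here.
class Shape
  attr_reader :id, :ivars, :children

  def initialize(id, ivars)
    @id = id
    @ivars = ivars.freeze # ordered list of instance variable names
    @children = {}        # edge: ivar name => child shape
  end
end

class ShapeTree
  attr_reader :root

  def initialize
    @next_id = 0
    @root = new_shape([]) # empty root shape, ID 0
  end

  # Follow the edge for +ivar+ from +shape+, creating the child shape if needed.
  def transition(shape, ivar)
    shape.children[ivar] ||= new_shape(shape.ivars + [ivar])
  end

  private

  def new_shape(ivars)
    shape = Shape.new(@next_id, ivars)
    @next_id += 1
    shape
  end
end

tree = ShapeTree.new
shape = tree.root                        # first_post = Post.new starts at the root
shape = tree.transition(shape, :@title)  # @title = "A"
shape = tree.transition(shape, :@author) # @author = "B"
puts shape.id                            # prints 2
p shape.ivars                            # prints [:@title, :@author]

# A second post setting the same ivars in the same order reuses the same shape.
second = tree.transition(tree.transition(tree.root, :@title), :@author)
puts second.equal?(shape)                # prints true
```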
00:11:30.120 If we had a different version of Post that set only the title in the initialize method and set the author later, the first post could end up with a different shape.
00:11:36.000 The second post could end up with a different shape as well. This is an important case to keep in mind.
00:11:42.240 Now, let’s look at one more example of transitioning shapes.
00:11:46.920 If we create an Image class, which has both a title and an image URL, we start again at the root shape.
00:11:53.280 Then we make a transition through title, followed by another transition for image URL.
00:12:00.360 As you may have noticed, these shapes form a tree: Image reuses the existing title shape and only branches off when it sets the image URL. This is our shape tree.
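A compact toy version of the tree with a second branch for Image. `ShapeNode`, `SharedShapeTree`, and `shape_for` are hypothetical names; the sketch shows that Post and Image share the root and the title shape before branching.

```ruby
# Compact toy shape tree: each node has an ID, its ivars, and outgoing edges.
ShapeNode = Struct.new(:id, :ivars, :children)

class SharedShapeTree
  attr_reader :root

  def initialize
    @shapes = []
    @root = add_shape([])
  end

  # Walk from the root, following (or creating) one edge per instance variable.
  def shape_for(ivar_names)
    ivar_names.reduce(@root) do |shape, ivar|
      shape.children[ivar] ||= add_shape(shape.ivars + [ivar])
    end
  end

  def size
    @shapes.size
  end

  private

  def add_shape(ivars)
    shape = ShapeNode.new(@shapes.size, ivars, {})
    @shapes << shape
    shape
  end
end

tree = SharedShapeTree.new
post_shape  = tree.shape_for([:@title, :@author])
image_shape = tree.shape_for([:@title, :@image_url])

# Both paths go through the same @title shape, then branch.
title_shape = tree.root.children[:@title]
puts title_shape.children.keys.inspect # prints [:@author, :@image_url]
puts tree.size                         # prints 4 (root, title, author, image_url)
```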
00:12:07.620 We've now addressed two of our questions, and we have one left: why implement object shapes in CRuby?
00:12:13.800 We've seen how instance variables previously functioned and what object shapes are.
00:12:20.940 Now, let’s explore why this change was necessary.
00:12:26.760 We'll focus on three main points.
00:12:33.240 First, we'll talk about cache hits; then we'll discuss code complexity; and lastly, how this interacts with the JIT.
00:12:39.600 With object shapes, we were able to change the caching mechanism for instance variable reads and writes.
00:12:46.560 In Ruby 3.1, we encountered cache misses frequently with class inheritance.
00:12:54.420 In Ruby 3.2, with the implementation of object shapes, we avoid those cache misses.
00:13:01.800 Instead of our previous instance variable names map, we now have our shape tree.
00:13:07.459 What we actually store on the shapes is the instance variable index, which helps us streamline the access.
00:13:13.500 So when we set title, we look up the instance variable's index on the shape and cache it, keyed by the shape ID.
00:13:21.240 This allows us to gain those increased cache hits.
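The new caching scheme can be modeled the same way as before, except that the cache key is the object's shape ID rather than its class. Everything here (`TitleSetCache`, `TRANSITIONS`, `SLOT_INDEX`, `ShapedObj`) is a hypothetical toy model; it only illustrates why Post and FrontPagePost objects now hit the same cache entry.

```ruby
# Global shape transitions: [from_shape_id, ivar] => to_shape_id.
TRANSITIONS = { [0, :@title] => 1, [1, :@author] => 2 }.freeze
# Slot index of the ivar added by each shape.
SLOT_INDEX = { 1 => 0, 2 => 1 }.freeze

ShapedObj = Struct.new(:klass, :shape_id, :values)

# Single-entry inline cache for one ivar-set instruction, keyed by the
# object's shape ID before the write. The class is never consulted.
class TitleSetCache
  attr_reader :hits, :misses

  def initialize(ivar)
    @ivar = ivar
    @hits = 0
    @misses = 0
    @entry = nil # [source_shape_id, destination_shape_id, slot_index]
  end

  def set(obj, value)
    if @entry && @entry[0] == obj.shape_id
      @hits += 1
    else
      @misses += 1 # slow path: look up the transition, then fill the cache
      dest = TRANSITIONS.fetch([obj.shape_id, @ivar])
      @entry = [obj.shape_id, dest, SLOT_INDEX.fetch(dest)]
    end
    _source, dest, index = @entry
    obj.values[index] = value
    obj.shape_id = dest
  end
end

title_set = TitleSetCache.new(:@title)
%w[Post FrontPagePost Post FrontPagePost].each do |klass|
  obj = ShapedObj.new(klass, 0, []) # every new object starts at the root shape (ID 0)
  title_set.set(obj, "A")
end

puts "hits=#{title_set.hits} misses=#{title_set.misses}" # prints "hits=3 misses=1"
```

Compared with the class-keyed model, only the first write misses, because every object arrives at this instruction with the same shape.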
00:13:27.540 I have micro-benchmarks to prove it.
00:13:34.620 Within CRuby, these benchmarks show that we've significantly sped up instance variable sets.
00:13:42.240 A ratio greater than one indicates that the shapes version is faster, while less than one means Ruby 3.1 is faster.
00:13:50.460 What we're focusing on here are the instance variable set benchmarks, comparing Ruby with and without shapes.
00:13:58.740 In the specific benchmark for subclass methods, we see optimized performance with object shape implementations.
00:14:04.920 Optimizing these micro-benchmarks is important, as we start seeing Ruby 3.2 in the wild.
00:14:13.140 So, Ruby 3.2 significantly speeds up instance variable gets and sets, along with the other changes that come with it.
00:14:20.160 Aside from cache hits, another main reason to implement object shapes in CRuby is to decrease code complexity.
00:14:28.740 We achieved this by reducing the number of frozen checks in instance variable sets.
00:14:36.600 This is because the frozen status is now included within the shape structure.
00:14:42.720 Now, we only have to check the frozen status on the slow path.
00:14:49.320 If we hit the cache for the shape, meaning we have already transitioned from that shape before, we can skip that check: freezing an object moves it to a different shape, so a frozen object can never match the cached shape.
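The frozen-check reasoning can be sketched as follows, assuming, as in this toy model, that freezing moves an object to a separate frozen shape. `FrozenAwareSetCache`, `FreezableObj`, and the shape IDs are hypothetical.

```ruby
# Toy model: a frozen object has its own shape ID, so matching the cached
# (non-frozen) source shape already proves the object is not frozen.
FreezableObj = Struct.new(:shape_id, :values)

FROZEN_VARIANT = { 0 => 100, 1 => 101 }.freeze   # shape ID => its frozen shape ID
TITLE_TRANSITIONS = { [0, :@title] => 1 }.freeze # [from shape, ivar] => to shape

class FrozenAwareSetCache
  attr_reader :frozen_checks

  def initialize(ivar)
    @ivar = ivar
    @entry = nil # [source_shape_id, destination_shape_id]
    @frozen_checks = 0
  end

  def set(obj, value)
    unless @entry && @entry[0] == obj.shape_id
      # Slow path: this is the only place the frozen check happens.
      @frozen_checks += 1
      raise FrozenError, "can't modify frozen object" if FROZEN_VARIANT.value?(obj.shape_id)

      @entry = [obj.shape_id, TITLE_TRANSITIONS.fetch([obj.shape_id, @ivar])]
    end
    # Reached either on a cache hit (no frozen check) or after the slow path.
    obj.values << value
    obj.shape_id = @entry[1]
  end
end

cache = FrozenAwareSetCache.new(:@title)
3.times { cache.set(FreezableObj.new(0, []), "A") }
puts cache.frozen_checks # prints 1

frozen_obj = FreezableObj.new(FROZEN_VARIANT.fetch(0), []) # frozen before @title was set
begin
  cache.set(frozen_obj, "A")
rescue FrozenError => e
  puts e.message # prints "can't modify frozen object"
end
```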
00:14:56.640 The other way we decrease complexity is by removing undef sets in object allocations.
00:15:03.600 When we call Object.new, Ruby allocates space for instance variables directly.
00:15:09.180 Previously in Ruby 3.1, Ruby would fill them in with undef values at that moment.
00:15:16.740 But with object shapes, the shape records which instance variables have been set, so Ruby never has to read a slot that was not written.
00:15:25.860 Thus, we no longer need to perform those unnecessary undef sets, which improves object allocation performance.
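A toy comparison of the two allocation strategies. `UNDEF` stands in for CRuby's internal undefined marker, and `CAPACITY`, `allocate_with_undef_fill`, and `allocate_with_shapes` are hypothetical names; the sketch counts sentinel writes at allocation and shows that reads go through the shape.

```ruby
UNDEF = Object.new.freeze # stand-in for CRuby's internal "undefined" marker
CAPACITY = 3              # hypothetical number of instance variable slots

# Ruby 3.1 style: write the sentinel into every slot at allocation time.
# Returns the slots and how many writes that took.
def allocate_with_undef_fill
  slots = Array.new(CAPACITY)
  writes = 0
  CAPACITY.times do |i|
    slots[i] = UNDEF
    writes += 1
  end
  [slots, writes]
end

# Shapes style: no per-slot writes; a new object starts at the root shape,
# which lists no instance variables. (Ruby's Array.new still nils its storage;
# that stands in for CRuby leaving the memory untouched.)
def allocate_with_shapes
  [Array.new(CAPACITY), 0]
end

# Reads consult the shape first, so only slots the shape lists are ever read.
def ivar_get(shape_ivars, slots, ivar)
  index = shape_ivars.index(ivar)
  index && slots[index]
end

_old_slots, undef_writes = allocate_with_undef_fill
slots, shape_writes = allocate_with_shapes

slots[0] = "A"          # @title = "A"
shape_ivars = [:@title] # the object's shape now lists only @title

puts undef_writes # prints 3
puts shape_writes # prints 0
p ivar_get(shape_ivars, slots, :@author) # prints nil, since @author is not in the shape
```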
00:15:32.820 The last reason to implement object shapes is that they benefit just-in-time compilers.
00:15:40.440 By having clear data about instance variables, such as the object's capacity and size pool, the JIT can generate faster code.
00:15:49.500 Understanding the difference between embedded and extended objects plays a significant role in this.
00:15:56.220 The capacity on an object shape indicates how many instance variables can be stored in the object, while the size pool indicates which slot size the object was allocated from.
00:16:04.440 This allows us to optimize memory access according to the capacity.
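Embedded versus extended storage can be illustrated with a toy object that stores instance variables inline until it runs out of room. The capacity value and the `StorageObject` class are hypothetical; in CRuby the capacity comes from the object's size pool, and this sketch does not model how YJIT actually emits code.

```ruby
# Toy model of embedded vs. extended instance variable storage.
class StorageObject
  attr_reader :capacity, :embedded, :extended

  def initialize(capacity)
    @capacity = capacity # how many ivars fit inline
    @embedded = []       # slots stored inside the object itself
    @extended = nil      # separate buffer once the inline slots run out
  end

  def embedded?
    @extended.nil?
  end

  def push_ivar(value)
    if embedded? && @embedded.size < @capacity
      @embedded << value
    else
      # The first overflow moves every existing value to an extended buffer.
      @extended ||= @embedded.dup.tap { @embedded.clear }
      @extended << value
    end
  end

  # Where slot +index+ lives; generated code that knows this can load it directly.
  def location(index)
    embedded? ? [:embedded, index] : [:extended, index]
  end
end

obj = StorageObject.new(3)
obj.push_ivar("A")
obj.push_ivar("B")
before = obj.location(1)
p before          # prints [:embedded, 1]

obj.push_ivar("C")
obj.push_ivar("D") # exceeds the capacity of 3
after = obj.location(1)
p after           # prints [:extended, 1]
p obj.extended    # prints ["A", "B", "C", "D"]
```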
00:16:11.400 The code generated by the JIT takes advantage of these properties, making access to instance variables faster.
00:16:17.880 We can see significant performance improvements when looking at a benchmark for instance variable access.
00:16:23.820 Before object shapes, YJIT was already very optimized.
00:16:30.020 After object shapes were implemented, we achieved even greater performance metrics.
00:16:39.780 To summarize, we answered three main questions.
00:16:48.300 How instance variables worked before, what object shapes are, and why we implemented them in CRuby.
00:16:56.640 I usually expect two questions after these talks, so let me address them preemptively.
00:17:03.180 The first question often revolves around whether we should change specific code.
00:17:11.460 For instance, someone might show me a snippet with a convoluted code structure.
00:17:17.640 They might ask if they should refactor their code to better fit the shape tree.
00:17:23.740 The answer is no, don’t change it just for object shapes.
00:17:30.720 Make your code clearer and easier for your teammates to understand.
00:17:39.180 Some of you may have already been using Ruby 3.2 without even realizing it.
00:17:45.500 If your code is confusing, you should definitely change it to make it better for other users.
00:17:53.380 The second question I often receive is how can I see object shapes in my program?
00:18:02.160 To answer that, we've made a small API addition: ObjectSpace.dump_shapes, which provides a JSON representation of all of your object shapes.
00:18:12.360 With it, you can visualize them and construct the tree as needed.
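A minimal usage sketch, assuming Ruby 3.2 or later, where `ObjectSpace.dump_shapes` is available from the `objspace` library. The `output: :string` keyword is assumed to behave as it does for `ObjectSpace.dump_all`, and the exact JSON fields are not relied on here; the guard keeps the script runnable on older Rubies.

```ruby
require "objspace"

# A class whose instances will create @title and @author shapes.
class Post
  def initialize
    @title = "A"
    @author = "B"
  end
end
Post.new

dump = nil
if ObjectSpace.respond_to?(:dump_shapes)
  # Assumed: output: :string returns the dump instead of writing a temp file.
  dump = ObjectSpace.dump_shapes(output: :string)
  shape_lines = dump.lines # one JSON record per line
  puts "#{shape_lines.size} shapes dumped"
  puts shape_lines.grep(/@author/).first
else
  puts "ObjectSpace.dump_shapes is not available in Ruby #{RUBY_VERSION}"
end
```

Each line can then be parsed with `JSON.parse` to rebuild and draw the tree.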
00:18:21.200 I appreciate your time and thank you for the opportunity to speak here!
00:18:27.480 [Applause, music]
Explore all talks recorded at RubyConf AU 2023