Don't @ me! Faster Instance Variables with Object Shapes

by Aaron Patterson

In his RubyConf 2022 talk 'Don't @ me! Faster Instance Variables with Object Shapes', Aaron Patterson explores how Ruby implements instance variables and how 'Object Shapes' make accessing them faster. He begins with a light-hearted anecdote about struggling with the hotel's elevator buttons before diving into the technical details of instance variables, also called ivars or IVs.

Key points discussed include:
- Overview of Instance Variables: Aaron explains how instance variables work in Ruby, starting with Ruby 1.8 and earlier, which stored them in hash tables. Each instance variable is stored as a key-value pair in a hash associated with the object.
- Evolution to Virtual Machines: He contrasts this with the virtual machine introduced in Ruby 1.9, which compiles code into bytecode and executes it faster than the earlier tree-walking interpreter. He illustrates the difference in execution models by showing how the virtual machine evaluates a method's bytecode by pushing and popping values on a stack.
- Exploration of Performance: Aaron highlights the limitations of hash tables, which are slower and use more memory than arrays. This sets the stage for storing ivars in arrays instead.
- Introducing Object Shapes: The central concept of the talk is 'Object Shapes', a technique for speeding up access to instance variables. Aaron explains that this work ships with the upcoming Ruby 3.2 release and works together with YJIT, Ruby's Just-In-Time (JIT) compiler, to optimize ivar access.
- Collaborative Efforts: Throughout the presentation, Aaron acknowledges the contributions of his colleagues at Shopify and the Ruby core team, emphasizing the collective effort in improving Ruby’s infrastructure.
- Takeaway Message: The overarching theme of the talk is to empower developers with deeper knowledge about instance variables and the strategies available to improve application performance, ultimately enhancing their proficiency in Ruby development.

By the end of the presentation, participants gain insight into the intricacies of Ruby instance variable management and are encouraged to leverage these optimizations for better performance in their own projects.

00:00:00 Ready for takeoff.

00:00:16 Oh my goodness! I should get at least six seconds back on this clock. That's not fair! They started the clock before they switched the slides over, and I have to use all of these minutes. Thank you.

00:00:29 Thank you.

00:00:34 Oh, alright. Um, did anybody else have a hard time with these elevator buttons? When I got to the hotel, I just went to the elevator and thought, 'I don't understand where the buttons are.' One of the doors was open, so I just went in and used that.

00:00:46 Then I had to leave to meet people, and when I went to the elevator again, I thought, 'I don’t know how to use these.' It took forever for me to figure out that those buttons were actually part of the normal operation. I thought the emergency sign was the button.

00:01:05 I don’t know why I’m going on about this; I just don’t have time. So, this talk is titled: "Don't @ Me! Faster Instance Variables with Object Shapes."

00:01:11 Oh, hold on a sec, let me shut off the notifications here. Okay, sorry! Yeah, I'm very excited to be in Houston.

00:01:21 I've never been to Houston before, so I'm really happy to be here. The food is great; this is a lot of fun.

00:01:39 My name is Aaron Patterson. I usually put about 15 minutes of stand-up at the beginning of my presentations, but I just don’t have time for that here, so I cut all of it. I’m really sorry.

00:01:51 I’m part of the Ruby core team, and I’m also on the Rails core team. I go by 'tenderlove' everywhere online, so you can find me on all social media under that handle, except on LinkedIn where I use my more professional name, which is also tenderlove.

00:02:07 I work for a mom-and-pop e-commerce website called Shopify. I’m on the Ruby infrastructure team at Shopify, where we're working on various projects to improve the performance of Ruby as well as the quality of life of developers.

00:02:22 Our customers are essentially the development teams at Shopify, so we're making Ruby and Rails better so that they can do their jobs more quickly and with fewer resources. We are working on projects like YJIT, GC improvements, the variable-width allocation project, and other infrastructure improvements.

00:02:41 Today, I want to talk to you about instance variables and how they work. I was going to call this talk 'Instance Variables TMI' because I am going to share way too much information about instance variables. But instead of just presenting pure facts, I want to derive the way that instance variables work.

00:02:59 Hopefully, we can implement them together, and you’ll be able to come away with a deeper understanding of how they work and why we make different decisions regarding performance and optimization.

00:03:15 I’m also going to be talking about object shapes, which is a technique that we use for speeding up access to instance variables, as well as other things. My team at work has been working on this project, and it will ship with the Ruby 3.2 release.

00:03:36 I’m also going to discuss how all of these elements work together with YJIT to make instance variable access extremely fast. I'm going to cover all of this in 30 minutes.

00:03:53 First off, I want to thank everyone on the Ruby infrastructure team, especially the YJIT team. I’ve been working very closely with Jemma on this project, and I also want to thank Maxime for her guidance.

00:04:00 Additionally, a shout-out to John Hawthorn at GitHub, who has also been helping with this project. A lot of people have been working together on this.

00:04:20 So let’s discuss how instance variables work. This is a joke for all the people from Seattle.

00:04:30 Just a note here: I’ll refer to them as instance variables, ivars, or IVs. Those all mean the same thing; I just need to shorten it sometimes because I only have 26 minutes left.

00:04:44 Let’s talk about implementing instance variables. Let’s say we have a very simple class like this with a few instance variables on it. If we were implementing a language, how might we store this data? I think a simple approach would be to store your instance variables in a hash table associated with the instance.

00:05:02 For example, say we have an instance of Hello with a hash table associated with it. The key in the hash table is the name of the instance variable, and the value is the value of that instance variable.

00:05:26 It's easy to imagine writing this code. When you write an instance variable, it writes to the hash table; when you read one, it reads from the hash table. All of this is simple to implement if we understand how hash tables work.
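Here is a minimal sketch of the hash-table approach described above. It is written in Ruby as a toy object model, not as Ruby's actual C implementation. The class name HashObject and its ivar_get/ivar_set methods are hypothetical.

```ruby
# Hypothetical toy object model: every instance carries its own Hash of ivars.
class HashObject
  def initialize
    @ivars = {} # ivar name (Symbol) => value
  end

  # Writing an ivar stores a key/value pair in the hash.
  def ivar_set(name, value)
    @ivars[name] = value
  end

  # Reading an ivar is a hash lookup; unset ivars read as nil, as in Ruby.
  def ivar_get(name)
    @ivars[name]
  end
end

obj = HashObject.new
obj.ivar_set(:@foo, 1)
obj.ivar_set(:@bar, 2)
obj.ivar_get(:@foo) # => 1
obj.ivar_get(:@baz) # => nil
```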

00:05:40 In fact, this is how instance variables were implemented in Ruby 1.8 and earlier. Those versions used a tree-walking interpreter: we would take your code, turn it into a tree, and then walk each node in that tree to evaluate it.

00:06:03 Let's walk through an example. We have a very simple method called foo that returns @foo + @bar. To evaluate the plus, we first have to evaluate its children.

00:06:30 Evaluating @foo does a hash lookup to get its value, and evaluating @bar does another hash lookup to get its value.

00:06:38 Once we have those values, they get returned up the tree, and the plus node can add the two values together and return the result to the caller. Then Ruby 1.9 came along and introduced a virtual machine.
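The tree walk described above can be sketched with two hypothetical node types, IvarNode and PlusNode. The sketch assumes the method on the slide is def foo; @foo + @bar; end with @foo = 1 and @bar = 2, the values pushed in the VM walkthrough. It illustrates the evaluation order and is not Ruby 1.8's actual evaluator.

```ruby
# Hypothetical AST node types for a tiny tree-walking evaluator.
IvarNode = Struct.new(:name)         # leaf: reads an instance variable
PlusNode = Struct.new(:left, :right) # adds two child expressions

# Evaluate a node against `ivars`, the receiver's hash of instance variables.
def evaluate(node, ivars)
  case node
  when IvarNode
    ivars[node.name] # one hash lookup per ivar read
  when PlusNode
    # Both children must be evaluated before the plus node can run.
    evaluate(node.left, ivars) + evaluate(node.right, ivars)
  else
    raise ArgumentError, "unknown node: #{node.inspect}"
  end
end

# Tree for the body of `def foo; @foo + @bar; end`
tree = PlusNode.new(IvarNode.new(:@foo), IvarNode.new(:@bar))
ivars = { :@foo => 1, :@bar => 2 }
evaluate(tree, ivars) # => 3
```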

00:07:02 The virtual machine compiles all of your code into bytecode and executes that bytecode. I won’t get into the compilation process, as we don’t have much time.

00:07:18 But let’s walk through how the virtual machine executes this method.

00:07:27 The compiler converts the foo method into bytecode, and the virtual machine walks through those instructions one at a time, executing each one while manipulating a stack.

00:07:41 The first instruction gets the instance variable @foo and pushes its value, 1, onto the stack. The next one gets @bar and pushes 2 onto the stack. When we execute plus, it pops those two values off the stack and pushes the result.
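A toy stack machine can replay the same method as bytecode. YARV uses the opcode names getinstancevariable, opt_plus, and leave for this kind of method, but the operands here are simplified and the run function is hypothetical.

```ruby
# Hypothetical bytecode for `def foo; @foo + @bar; end`.
# Each instruction is [opcode, operand]; names loosely follow YARV.
BYTECODE = [
  [:getinstancevariable, :@foo],
  [:getinstancevariable, :@bar],
  [:opt_plus, nil],
  [:leave, nil]
].freeze

# Execute instructions one at a time against a value stack.
def run(bytecode, ivars)
  stack = []
  bytecode.each do |op, operand|
    case op
    when :getinstancevariable
      stack.push(ivars[operand]) # push the ivar's value
    when :opt_plus
      right = stack.pop          # values come off in reverse push order
      left = stack.pop
      stack.push(left + right)   # push the sum back on
    when :leave
      return stack.pop           # hand the top of the stack to the caller
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
end

run(BYTECODE, { :@foo => 1, :@bar => 2 }) # => 3
```

To compare against CRuby's real listing, RubyVM::InstructionSequence.compile("@foo + @bar").disasm prints the actual instructions.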

00:08:05 How might we implement the instruction that gets an instance variable? It could be simple. We take the name, which comes from the instruction, and first look up self.

00:08:18 Self is stored in the current frame. From self we get its hash table of instance variables, look up the value by name in that hash table, and push the value onto the stack.
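The instruction handler described above might look like this in a toy VM. Frame, ToyObject, and ToyVM are hypothetical stand-ins for the VM's control frame, the receiver object, and the interpreter. CRuby's real handler is written in C.

```ruby
# Hypothetical VM structures: a frame records `self`, and each object
# exposes its instance variables as a Hash.
Frame = Struct.new(:self_obj)
ToyObject = Struct.new(:ivar_table)

class ToyVM
  attr_reader :stack

  def initialize(frame)
    @frame = frame
    @stack = []
  end

  # Handler for getinstancevariable; `name` is the instruction's operand.
  def getinstancevariable(name)
    obj = @frame.self_obj    # 1. self lives in the current frame
    table = obj.ivar_table   # 2. fetch self's hash of instance variables
    @stack.push(table[name]) # 3. look the value up by name and push it
  end
end

table = { :@foo => 1 }
vm = ToyVM.new(Frame.new(ToyObject.new(table)))
vm.getinstancevariable(:@foo)
vm.stack # => [1]
```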

00:08:31 It’s easy to see how we could transition from a tree-walking interpreter to a virtual machine implementation. However, the problem with this implementation is that hashes are relatively slow compared to arrays.

00:08:48 I don’t want to say hashes are slow, but they aren't as fast as arrays. Hashes also use more memory than arrays, because a hash table has to store every key alongside its value, while an array only stores the values.

00:09:03 So could we use an array instead of a hash? Yes, we could do that. Imagine a simple class again with a couple of instance variables.

00:09:24 When we allocate a new instance of Hello, that instance has to point to its class. We can then ask the class, 'Do you have an index for this instance variable?'
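One way to sketch the array-based idea is to let the class own a table that maps ivar names to indexes, while each instance keeps only an Array of values. ToyClass, ToyInstance, index_for, and index_for! are hypothetical names. This is a simplified model, not CRuby's actual data structures.

```ruby
# Hypothetical sketch: the class owns a name => index table shared by all
# of its instances.
class ToyClass
  def initialize
    @ivar_index = {} # ivar name (Symbol) => slot in an instance's Array
  end

  # "Do you have an index for this instance variable?" (nil if not)
  def index_for(name)
    @ivar_index[name]
  end

  # Assign the next free slot the first time a name is written.
  def index_for!(name)
    @ivar_index[name] ||= @ivar_index.size
  end
end

class ToyInstance
  def initialize(klass)
    @klass = klass # every instance points at its class
    @values = []   # ivar values, addressed by index
  end

  def ivar_set(name, value)
    @values[@klass.index_for!(name)] = value
  end

  def ivar_get(name)
    index = @klass.index_for(name)
    index && @values[index] # unknown name or unwritten slot reads as nil
  end
end

hello = ToyClass.new
a = ToyInstance.new(hello)
a.ivar_set(:@foo, 1) # @foo gets index 0
a.ivar_set(:@bar, 2) # @bar gets index 1
b = ToyInstance.new(hello)
b.ivar_set(:@bar, 3) # reuses index 1, so b's Array is [nil, 3]
a.ivar_get(:@bar) # => 2
b.ivar_get(:@foo) # => nil
```

The name-to-index table is stored once per class instead of once per object, so each additional instance only pays for its array of values.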

See Slides on speakerdeck.com