RubyConf 2022

Don't @ me! Faster Instance Variables with Object Shapes

Instance variables are a popular feature of the Ruby programming language, and many people enjoy using them. They are so popular that the Ruby core team has done lots of work to speed them up. But we can do even better to speed them up by using a technique called "Object Shapes". In this presentation we'll learn about what object shapes are, how they are implemented, and how they can be used to speed up getting and setting instance variables. We'll make sure to square up Ruby instance variable implementation details so that you can become a more well rounded developer!

00:00:00.000 ready for takeoff
00:00:16.920 oh my goodness can't I get at least like six seconds back on
00:00:23.760 this clock that's not fair they started the clock before they switched the slides up here and I gotta use all of
00:00:29.760 these minutes thank you thank you
00:00:34.820 oh all right um did anybody else like did anybody else have a hard time with these
00:00:40.500 elevator buttons like when I got to when I got to the hotel I just I went to the elevator I'm
00:00:46.860 like I don't understand where the buttons are but one of the one of the doors is open I'll just go in and use that so I did that and then and then I
00:00:53.760 had to leave and meet meet people and I go to the elevator and I'm like uh I'm not I'm not I don't know how to use
00:01:00.360 these it took forever for me to figure out that these buttons are actually like I thought this was the emergency sign
00:01:05.400 which it is but I thought the buttons were like part of it um anyway
00:01:11.159 I don't know why I'm going on about this I just I don't have time uh so this this
00:01:16.320 talk is titled don't at me uh faster instance variables with object shapes oh
00:01:21.360 hold on a sec let me I gotta shut off the notifications here
00:01:26.580 okay okay sorry yeah I'm very excited to be in Houston I've
00:01:33.240 never been to Houston before so I'm really I'm really happy to be here the food is great this is a lot of fun
00:01:39.240 um my name is Aaron Patterson I usually put about like 15 minutes worth of stand up at the beginning of my presentations
00:01:45.600 but I just don't have time for that here so I cut all of those I'm really sorry um
00:01:51.240 I'm part of the Ruby core team I'm also on the Rails core team uh I go
00:01:57.659 by tenderlove everywhere online so you can find me on all the social media with that handle uh
00:02:04.500 except for on LinkedIn I use my more professional name there which is also tenderlove uh
00:02:11.340 I work for a mom and pop e-commerce website called Shopify
00:02:16.680 um I'm on the Ruby infrastructure team at Shopify our team is working on like different projects to improve the
00:02:22.500 performance of Ruby as well as the quality of life of developers uh working working for Shopify and also in the Ruby
00:02:29.040 community at large uh I would say like our team's customers are essentially the development teams at Shopify so we're
00:02:35.340 making Ruby and rails better so that they can get their jobs done more quickly and with fewer resources we're
00:02:41.879 working on different projects like uh YJIT, GC improvements like the variable
00:02:47.220 width allocation project and other infrastructure improvements as well so I'm here today to talk to you about
00:02:52.980 instance variables and how how they work um I was going to talk like call this
00:02:59.220 talk instance variables TMI because I am going to tell you way too much information about instance variables but
00:03:05.879 instead of just being like a pure fact-based mission here what I want to
00:03:10.980 do is I'm I want to derive the way that instance variables work so that hopefully we're going to implement them
00:03:16.620 together and hopefully you'll be able to come away with a deeper understanding of how how they work and why we make
00:03:22.019 different decisions with regards to performance and optimization so I'm also going to be talking about
00:03:27.959 object shapes which is a technique that we use for speeding up access of instance variables as well as other things this
00:03:34.620 project has been ongoing at work my team's been working on it and it's going to be shipping along with the Ruby 3.2
00:03:39.959 release and I'm also going to be talking about how all of these things work together with YJIT in order to
00:03:47.519 make instance variable access extremely fast and I'm going to do all of this in
00:03:53.700 30 minutes first off I want to say thanks to everyone on the Ruby infrastructure team
00:04:00.720 the YJIT team especially I've been working very closely with Jemma on this project and I also want to thank Maxime for her
00:04:07.799 guidance on the project as well and I also want to give a shout out to John Hawthorn at GitHub he's been helping on this project as
00:04:14.040 well so there's been a lot of people working on this together so let's talk about how Ivars work uh this is a joke
00:04:21.600 for all the people from Seattle yes um very local joke all right so
00:04:30.000 just a kind of a note here I'm going to refer to them as instance variables also Ivars and also IVs those all mean the
00:04:37.680 same thing I just need to shorten it sometimes because I only have 26 minutes left so let's talk about implementing
00:04:44.340 instance variables let's say we have a very simple class like this with a couple instance variables on it if we
00:04:50.220 were implementing a language like how might we how might we store this data like how would we store it I think a
00:04:56.460 really a really simple way to accomplish this task would be to just store your instance variables in a hash table on
00:05:02.460 the instance so for example we'd have our instance of hello here and we'd say all right we got a hash table associated
00:05:08.639 with the instance of hello the key to the hash table is going to be the name of the instance variable and the value
00:05:13.860 the value in the hash table will be the the value of the instance variable uh and we can imagine writing this writing
00:05:20.820 this code is pretty easy we could imagine implementing it something like this where we just have that hash table
00:05:26.699 associated with all of our instances when you write something it writes the hash table when you read something it
00:05:32.580 reads from the hash table Etc like all these seems pretty easy to implement if we understand how hash tables work in
00:05:39.600 fact this is how instance variables were implemented in Ruby 1.8 and earlier that's how
00:05:46.259 they worked uh Ruby 1.8 and earlier was implemented via a tree walking interpreter and the way the tree walking
00:05:53.039 interpreter works is we would take we would take your code and turn it into a tree and then we would walk each node in
00:05:59.820 the tree and evaluate those nodes in the tree so here we have a very simple example this tree is representing the
00:06:05.940 code inside of the method Foo so we have Foo plus bar and the way it would work is we would
00:06:11.880 say okay we're gonna We want to evaluate that method or that plus node there but we can't evaluate it yet because we have
00:06:18.060 to evaluate its children so we evaluate foo foo does a hash lookup to get that
00:06:23.400 value one out and then bar also does a hash lookup to get the value 2 out and then we return those values back up the
00:06:30.300 tree they get returned up plus is able to execute add those two together and then return to the caller now 1.9 came
00:06:38.699 along Ruby 1.9 came along and introduced a virtual machine and what the virtual
00:06:44.460 machine did is it would compile all of your code into a byte code and execute execute that byte code I'm not going to
00:06:51.360 go into the compilation process because not much time uh but let's walk through
00:06:56.460 how the virtual machine might execute this so the virtual machine first it's going to take or the compiler is going
00:07:02.520 to take that Foo method and convert it into byte code so we'll have some byte code like this and it's going to walk
00:07:08.280 through each of those instructions one at a time and just execute them and as it's executing those instructions it's
00:07:13.860 going to manipulate a stack so we also have a stack right here so you imagine all right we're just going to Loop
00:07:19.139 through each of these execute them and then manipulate the stack the first thing we'll do is we'll say hey get Ivar here we'll push one onto the stack uh
00:07:26.880 get Ivar for bar we'll push two onto the stack and then when we execute the plus method plus is going to pop those
00:07:33.479 two values off the stack and then push the return value onto the stack so if we were implementing this virtual
00:07:40.740 machine we can imagine how one might implement the get Ivar instruction it would be pretty simple just maybe a
00:07:47.699 method like this where we say hey I'm going to take the name the name is going to come from the instruction it's a parameter from the instruction and the
00:07:54.539 first thing I'm going to do is I'm going to look up self like what is the value what is the object that we're operating on and in this case self is going to be
00:08:01.380 stored on the current in the current frame it's the current thing that we're operating on then we're going to say hey
00:08:06.780 I want to get the the hash table of instance variables and then all I need to do is I just need to look up that
00:08:14.099 that value by name from the hash table and then just push that value onto the stack so it's pretty easy to imagine how
00:08:20.099 we could go from the tree walking interpreter into the into the virtual machine implementation
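
The hash-table approach and the stack-based get Ivar instruction described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `ToyObject`, `ToyFrame`, and the `toy_*` methods are hypothetical names invented here, not CRuby internals, and the real virtual machine is written in C.

```ruby
# Hypothetical sketch; ToyObject, ToyFrame, and the toy_* methods are
# illustrative names, not CRuby internals.
class ToyObject
  attr_reader :ivar_table

  def initialize
    @ivar_table = {} # one hash per instance: ivar name => value
  end
end

class ToyFrame
  attr_reader :self_object, :stack

  def initialize(self_object)
    @self_object = self_object # the object the current method runs on
    @stack = []
  end
end

# getivar: find self on the frame, look the name up, push the value.
def toy_getivar(frame, name)
  frame.stack.push(frame.self_object.ivar_table[name])
end

# plus: pop both operands, push their sum.
def toy_plus(frame)
  right = frame.stack.pop
  left = frame.stack.pop
  frame.stack.push(left + right)
end

# Runs the "bytecode" for a method body equivalent to `@foo + @bar`.
def toy_run_foo(object)
  frame = ToyFrame.new(object)
  toy_getivar(frame, :@foo)
  toy_getivar(frame, :@bar)
  toy_plus(frame)
  frame.stack.pop
end

hello = ToyObject.new
hello.ivar_table[:@foo] = 1
hello.ivar_table[:@bar] = 2
puts toy_run_foo(hello) # prints 3
```

Every read in this sketch pays for a hash lookup, which is the cost the rest of the talk works to remove.
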
00:08:25.740 now the problem with this implementation is that hashes are slow when compared to arrays I don't want to like hashes
00:08:33.120 aren't that slow but if you compare it to an array yes it's slow um hashes use a lot of memory when
00:08:38.399 compared to arrays because we have to the hash data structure uses a lot more room than an array would use so could we
00:08:45.180 use an array instead of a hash the answer is yes we could do that but let's imagine like how how might we Implement
00:08:51.360 something like that so here we have our simple class again hello with a couple instance variables
00:08:57.480 on it and when we what we can do is we can say all right well when we allocate this new instance of hello that instance
00:09:03.779 has to point at a class so it's going to point at the hello class what we can do is we can say hey class um do you have
00:09:11.820 an index for this instance variable so we'll go ahead up here we'll say hey I want to set the instance variable Foo do
00:09:18.540 you have an index for it and at first the class is like no I don't have an index for it I will make you one so it
00:09:24.540 inserts Foo into this hash table with an index of zero so we know that Foo maps to the index zero then we go ahead and
00:09:31.440 we set the value 1 at the zeroth index in the array that's stored on the
00:09:36.660 instance so our class is keeping a map of names to indexes indices so we'll do
00:09:43.320 the same thing for bar we'll set bar we'll say hey do you have an index for bar it doesn't so it creates a new one and then we store the value 2 in the in
00:09:50.399 that one element index uh so when we do this a second time so
00:09:56.100 let's say we perform this on a second instance we're still running the program um our new instance will just say hey I
00:10:03.000 want to set Foo and bar those already exist in that hash table so we don't need to like we don't need to create new
00:10:09.360 indices it just uses those and sets those in the array now you might be looking at this and
00:10:15.240 thinking well you know you said oh hey a hash table uses memory like what's the deal we still have a hash table here that is true yes but
00:10:23.760 we're able to amortize the cost of this hash table across multiple instances of hello so we can say well
00:10:30.060 now we're able instead of having a hash table per instance now we have one that's associated with the class and all
00:10:36.420 of these instances are able to take advantage of the fact that that is stored on the class instead of the instance now
00:10:43.080 going from this I want to talk about object layout a little bit because this is going to be important when we talk
00:10:48.300 about the jit compiler the objects are laid out we'll say that objects are 40 bytes wide some objects are wider in the
00:10:55.980 new version of Ruby 3.2 but we're going to consider 40 byte objects and uh that
00:11:01.740 allows us to set three instance variables inside the object itself so we
00:11:06.959 can store three instance variables in line in the object so in this particular case we're going to store the values 1 and 2 inside that array we'll treat the
00:11:13.920 object itself as the array now the other thing I want to point out
00:11:19.079 before we get to talking about jit compilation is that the place where we store instance variables is different
00:11:25.440 depending on the type of object that we're dealing with so on the left here we just have a normal plain old Ruby
00:11:31.200 object and we set instance variables on it on the right we have a subclass of array where we store instance variables
00:11:36.959 the the storage location of those instance variables changes depending on the type so we care about the type
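
The class-level name-to-index table described above, with values kept in a per-instance array, might look roughly like this. It is a sketch only: `ToyClass` and `ToyInstance` are hypothetical names, and it ignores the embedded versus extended storage and the type-dependent storage location just mentioned.

```ruby
# Hypothetical sketch of a per-class name-to-index table; the class and
# method names are illustrative, not CRuby's.
class ToyClass
  attr_reader :iv_index_table

  def initialize
    @iv_index_table = {} # ivar name => slot index, shared by all instances
  end

  # Return the existing index for `name`, or hand out the next free one.
  def index_for(name)
    @iv_index_table[name] ||= @iv_index_table.size
  end
end

class ToyInstance
  attr_reader :klass, :slots

  def initialize(klass)
    @klass = klass
    @slots = [] # values only; the names live on the class
  end

  def set_ivar(name, value)
    @slots[@klass.index_for(name)] = value
  end

  def get_ivar(name)
    index = @klass.iv_index_table[name]
    index && @slots[index] # nil when the class has never seen this name
  end
end

hello_class = ToyClass.new
first = ToyInstance.new(hello_class)
first.set_ivar(:@foo, 1)
first.set_ivar(:@bar, 2)

second = ToyInstance.new(hello_class) # reuses the indexes created above
second.set_ivar(:@foo, 10)
```

The single hash on `hello_class` is paid for once, while each instance carries only a plain array.
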
00:11:45.660 um so let's revisit the instruction implementation now we have to take into account when we're looking up the
00:11:50.820 instance variable hey we got to look at a class like give me the class for the self so first off we have to ask for the
00:11:56.880 class then we ask the class for the index and then we're able to look up the instance variable based on the index and
00:12:03.000 remember we have an if statement there to check whether we're looking it up on an object or versus some other thing but
00:12:09.660 unfortunately you might notice oh geez we're still doing a hash lookup here and you said hashes are slow so we need
00:12:15.540 to eliminate that and this is where we get into inline caches where we should be able to cache that hash lookup and
00:12:22.019 we can do it inside of what is called an inline cache these are simply cache objects that are stored in line with
00:12:28.200 your byte code so we'll say here we're going to add a new parameter inside of the byte code which is this magic cache
00:12:34.560 object right there is the magic cache object and what we'll do is we'll cache
00:12:39.600 the index inside of that object so we'll say hey if we don't have an
00:12:44.700 index cached already please go look up the index and then store it inside of the cache then below here we just
00:12:51.720 use the cached index so if there is one cached we'll use it otherwise we go look it up what this means is that there's
00:12:57.360 usually no hash lookups the first time we execute it yes we have to do it but subsequent times we don't have to do
00:13:02.639 that hash lookup so we're able to eliminate it however there's a problem with this uh remember that our name to
00:13:09.060 index mapping is per class so when we create the hello class we've mapped Foo and bar to zero and one and we have an
00:13:15.660 instance there as well but when we execute the foo method we're going to
00:13:21.240 store the indices 0 and 1 inside of the inline cache for that Foo method to look up look up the instance variable zero
00:13:27.420 and one now the problem is like what about this world class down here World
00:13:32.700 sets an instance variable it is a world-class class
00:13:40.519 sorry I'm going to chew up a few seconds for at least one pun um
00:13:45.660 all right so we have a world-class class world class here it's it inherits from
00:13:51.120 Hello it sets an instance variable before it calls super so you can imagine what the index table looks like on the
00:13:57.300 world class we've set hello we've set oops first in the world class but we've cached the
00:14:03.660 value zero and one in the foo method so what happens when we call the foo method well we're going to look up the wrong
00:14:08.940 values we're going to look up the values for oops and we're going to look up the values for foo rather than foo and bar which is what we expected so this is
00:14:15.660 going to raise an exception and blow up so we can fix this problem it's not a big deal we just use a class as a cache
00:14:22.139 key as well we say all right uh the class needs to match we need to have an index we also need to have an index set
00:14:29.160 and then we can finally use that so our our instruction is getting a little bit more complicated now we have another
00:14:35.220 problem where I keep throwing in problems here let's say we have an empty hello empty world class down at the
00:14:41.820 bottom we have changed nothing we just have a subclass but the problem is here now the
00:14:48.480 class is part of this cache key this Loop down here at the bottom it oscillates between those instances it
00:14:55.019 calls Foo on hello then Foo on world and we keep doing that over and over but since class is the cache key it means we
00:15:01.740 can never hit we always look up hello then look up world and we're always missing the cache there I'm going to
00:15:07.800 throw in one more monkey wrench and that is uh how do we deal with undefined instance variables we all know like you
00:15:13.320 can access an undefined instance variable and it returns the value nil like how do we deal how do we deal with
00:15:18.360 that so imagine we have this first case here where we're calling hello with true
00:15:23.940 true sets all three instance variables so we have one two and three stored in line in the object so those are all
00:15:30.360 those all come from these instances for these sets here but what about the second case uh where
00:15:36.839 we're calling it with false well what happens is when we allocate a new object we fill in all the memory with a magic
00:15:42.420 value and this magic value is Qundef uh so this second instance will end up
00:15:48.000 with a layout like this where we have a one and then Qundef and then the value three because we didn't set that middle
00:15:54.720 instance variable there so you can kind of imagine how the
00:16:00.240 implementation of instance_variable_defined? works should be easy to figure this out we
00:16:05.339 just say well okay let's go look up the index of bar and then we go look up the value for bar and we see that that value
00:16:11.519 is this magic one called Qundef so we're able to tell whether or not an instance variable has been defined we
00:16:17.279 can do that uh and then when we actually return this so we've cached the
00:16:22.800 value 0 and 1 here now the issue is again similarly to the previous problems well we can't return Qundef we have to
00:16:29.279 return the value nil Qundef's not a real value we need to return nil so we can't just return one and then Qundef
00:16:36.240 so where we handle that is inside of our implementation of the get Ivar instruction I know this
00:16:43.139 code is getting smaller and that's on purpose I'm making a point it's getting more and more complicated we have to
00:16:49.380 handle all of these cases we have to check was this thing Qundef if it was we got to return nil otherwise we
00:16:55.500 return the value inside of the array so we keep getting more and more complicated as we're adding these
00:17:01.620 special cases so just to recap the conditionals for reading an instance variable is the index set in
00:17:08.339 the cache do the classes match in the cache is it a T_OBJECT is the IV
00:17:14.160 value equal to Qundef and to drive home how complicated this actually is I want to take a look at jit
00:17:20.100 compilation so the way the jit works from a high level is that when we've executed a
00:17:26.100 method enough times the jit will pause it'll say oh okay uh we're going to stop right here for a second and we're going
00:17:32.700 to iterate over each of the instructions from the virtual machine and we're going to generate corresponding machine code
00:17:38.220 for each of those instructions and then rather than letting the virtual machine execute the instructions it jumps into
00:17:44.580 the machine code and executes that machine code instead so let's take a look at the machine code
00:17:51.120 for reading an instance variable and I'm going to call out each part here the first thing we have to do is we have to
00:17:56.160 check well is this thing an object so we do we have to emit the machine code for that then the next thing we have to do
00:18:02.100 is say well does the class match whatever is in the cache we have to check that as well and then the next
00:18:07.860 thing we have to do is say well okay uh is it an embedded object or an extended object like is it is it are those three
00:18:15.000 Ivars that's stored inside the object itself we also have to check well is it Qundef as well like if it's Qundef
00:18:22.080 we got to return nil and then finally finally down at the bottom those last two instructions are read the Ivar and
00:18:28.620 push it onto the stack so we have 93 bytes of machine code for just reading one instance variable we
00:18:35.340 can actually do much better and that's where object shapes comes into play and no it's not these types of shapes in
00:18:41.100 fact there's really only one shape it is a tree data structure which is this shape so rather than like just
00:18:47.520 explaining it just saying oh it's a tree data structure we're going to try building it so we can kind of see what it looks like the tree data structure is
00:18:54.419 built every time we add an instance variable every time we write to something
00:19:00.740 let's move more quickly uh so when we write the value Foo we start out at a
00:19:07.320 root shape we write the value Foo and that value creates a new Edge in this tree and we cache the values from this
00:19:14.220 new node inside the inline cache so here we say we're writing value foo the outgoing edge foo did not exist so
00:19:20.940 we'll write a new node with the outgoing edge foo and we'll cache that it came from the shape zero and it's going to
00:19:26.820 shape one and that we set the instance variable on the IV index 0. so we cache
00:19:31.980 these three values we do the same thing with bar ah okay yes our cache key our cache key is the shape that we came from
00:19:37.860 it is our originating shape so then we we also cached the destination shape as well as the IV
00:19:44.940 index so the next thing we do we do exactly the same thing with bar except now we're going from shape ID 1 to shape
00:19:50.880 ID 2 and I know this isn't very tree-like but it is a tree it's just a
00:19:55.919 linear very linear tree maybe a tree trunk I don't know all right so the second time we execute
00:20:02.700 this we don't actually have to consult the tree we just take a look at those cache keys we say oh you're currently shape zero well I know shape
00:20:09.840 zero goes to shape one and it sets the IV at index zero so we have a cache hit here we know we make that transition
00:20:15.660 we do the same thing for the second one so we have a cache hit cache hit there so these shapes form a graph and all the
00:20:23.700 objects in this graph start from all objects allocated start from this root shape so they start from the root and
00:20:29.160 they grow from that graph and shapes only change when we're doing writes also the shape ID is the cache key and
00:20:37.320 importantly that means that the class of the object is not the cache key we don't care about the class we only care about
00:20:44.039 the instance variables that were set and the order in which they were set so what's cool about this is that we're
00:20:50.160 able to share caches between subclasses so in this case here before we couldn't
00:20:55.200 get a cache hit in the subclass but now we can because we don't care we don't care about the types anymore we only
00:21:01.620 care about the IVs and the order they were set so we get to share those caches the other thing is that we're able to do
00:21:07.799 cross-type memory amortization so what I mean by that is that the shape tree is shared between all instances so
00:21:15.539 we saw that like we had a we had a an IV index table for a super class and a subclass well here now we just have one
00:21:22.620 shape tree for all classes so we're able to amortize this across multiple multiple types uh we also get cross-type
00:21:30.000 cache hits which I'm going to show here again this is our problem before where we had the hello class and the world
00:21:35.880 class oscillating between the two and we couldn't get cache hits here but in the
00:21:41.400 world of shapes well in fact those two types have exactly the same shape this
00:21:46.559 shape so we're able to get cache hits here where we couldn't before and if we do a very simple micro benchmark on this
00:21:52.620 I know you can't read it I'm going to just write it bigger it is about 2.7 times faster this particular benchmark
00:21:58.919 this is all impacted by being able to do cache hits where we couldn't do them before
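
To make the shape tree and its cache key concrete, here is a rough Ruby sketch. All names (`ToyShape`, `ToyShapeTree`, `ToySetCache`, `ToyShapedObject`) are hypothetical, and the sketch assumes each write site only ever sets a variable that is not yet defined on the object. It is not how CRuby structures its C code; it only mirrors the three cached values from the talk: source shape, destination shape, and IV index.

```ruby
# Hypothetical sketch of a shape tree; names are illustrative, not CRuby's.
class ToyShape
  attr_reader :id, :iv_count, :edges

  def initialize(id, iv_count)
    @id = id
    @iv_count = iv_count # how many ivars an object with this shape holds
    @edges = {}          # ivar name => child ToyShape
  end
end

class ToyShapeTree
  attr_reader :root

  def initialize
    @next_id = 0
    @root = new_shape(0)
  end

  # Follow the outgoing edge for `name`, creating the child on first use.
  def transition(shape, name)
    shape.edges[name] ||= new_shape(shape.iv_count + 1)
  end

  private

  def new_shape(iv_count)
    shape = ToyShape.new(@next_id, iv_count)
    @next_id += 1
    shape
  end
end

class ToyShapedObject
  attr_accessor :shape
  attr_reader :slots

  def initialize(tree)
    @shape = tree.root # every object starts at the root shape
    @slots = []
  end
end

# Inline cache for one write site, keyed on the source shape ID only.
# Assumes the variable is not already set on the object (sketch limitation).
class ToySetCache
  attr_reader :hits, :misses

  def initialize
    @hits = 0
    @misses = 0
    @source_id = nil
  end

  def set(tree, obj, name, value)
    if @source_id == obj.shape.id
      @hits += 1
    else
      @misses += 1
      @source_id = obj.shape.id
      @index = obj.shape.iv_count
      @dest = tree.transition(obj.shape, name)
    end
    obj.slots[@index] = value
    obj.shape = @dest
  end
end
```

Because the cache never looks at the object's class, two objects of different classes that set the same variables in the same order reach the same shape and share cache hits.
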
00:22:04.740 Oh Boy 50 slides left all right so memory usage improvements classes store
00:22:12.059 their names as instance variables so if you do hello.name that class.name that's read as an instance variable John
00:22:18.720 Hawthorn was able to convert classes to use object shapes as well he
00:22:23.940 sent a patch to do this and you can imagine like all classes have a
00:22:29.039 different class which is a meta class so of course we had to duplicate all this information among all classes but since
00:22:35.280 the classes can can use shapes now we know that all these classes have the same shape and we're able to amortize
00:22:41.640 that that cost and he found in his application this is I think this is GitHub github's application they were
00:22:48.360 able to save about 16 megabytes worth of memory just on this this one change so I think that's pretty pretty great
00:22:57.000 um to cover two more we're going to cover two more things and we're going to get back to the jit implementation not
00:23:02.460 all properties I was very careful to say properties I think I was hopefully not all properties of an object are
00:23:09.419 instance variables the frozen state of an object is also a property of
00:23:14.700 an object and we've encoded the frozen state of an object into the object as a shape transition so here's an
00:23:21.120 example this will get a little bit more tree-like so we'll do hello here which was our normal shape we saw before we'll
00:23:26.760 set Foo we'll set bar that's great and then we're going to call this other go
00:23:32.299 transitions go we're going to set this other value here so we'll have we'll have a we start off with shape two there
00:23:39.480 on the first instance we set another instance variable that goes to shape three so that's great now our second our
00:23:47.100 second instance we're going to set those two those two instance variables that'll be a cache hit in the tree and then we
00:23:52.860 freeze it and we just create a new uh a new transition off of the bar
00:24:00.120 transition we say hey we're going to transition you to Frozen now now when we call set you can see here that we
00:24:06.419 actually have a cache miss in this case so we'll try to call set and that's going to raise an exception because you
00:24:11.520 can't set an instance variable on a frozen object so in this case we're we're using shape four and we're trying
00:24:17.760 to we're trying to set an instance variable there but you can see that that will be a cache miss in this case so
00:24:23.039 what's really cool about that is we can say well before we always had to check whether or not an object was frozen
00:24:28.559 before we could set set the set the instance variable but now we really only
00:24:33.659 need to do this frozen status check when we have a cache miss so we don't need to
00:24:38.760 do that check at runtime anymore well we can do it at cache miss time so frozen checks only occur on cache
00:24:45.960 misses when we're using object shapes and I have another benchmark here we're benchmarking the IV write performance
00:24:51.179 again it's too small to read but you can see that it is about 21% faster so IV writes get faster and this isn't taking
00:24:57.780 into account jit at all this is just normal VM execution so since we don't have to check check Frozen status
00:25:03.720 anymore we can set instance variables much faster so let's talk about jit performance now
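
The idea that frozen checks only happen on a cache miss can be sketched as follows. `ToyFrozenShape` and `ToyWriteSite` are hypothetical names, and the shape IDs 2 and 4 simply echo the example above; the sketch only counts how often the slow-path frozen check runs.

```ruby
# Hypothetical sketch: the frozen state is part of the shape, so a write
# site only has to check it on a cache miss. Names are illustrative.
ToyFrozenShape = Struct.new(:id, :frozen)

class ToyWriteSite
  attr_reader :frozen_checks

  def initialize
    @cached_shape_id = nil
    @frozen_checks = 0
  end

  # Returns true when a write is allowed for an object with this shape.
  def write_allowed?(shape)
    # Cache hit: this shape was already seen and is known to be unfrozen.
    return true if shape.id == @cached_shape_id

    # Cache miss: take the slow path, which includes the frozen check.
    @frozen_checks += 1
    raise FrozenError, "can't modify frozen object" if shape.frozen

    @cached_shape_id = shape.id
    true
  end
end

site = ToyWriteSite.new
unfrozen = ToyFrozenShape.new(2, false)
3.times { site.write_allowed?(unfrozen) }
puts site.frozen_checks # prints 1
```

A frozen shape is never stored in the cache, so a frozen object always misses and always reaches the check that raises.
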
00:25:09.240 to understand jit performance a bit better I want to talk about the layout of objects again now the object layouts
00:25:15.960 have all objects in Ruby have two fields that are the same it's the top two Fields there's a Flags field and then a
00:25:24.000 pointer to the class now the rest of the the rest of the slots in the object change depending on the type of object
00:25:30.059 that we're dealing with so a T_OBJECT will store instance variables an array will store array elements etc etc
00:25:36.419 so let's take a look at what the flags field is the flags field is a 64-bit integer on 64-bit platforms and this is
00:25:44.100 kind of the layout of that bit field which you can't read this those numbers are they're not important read in the
00:25:50.580 slides later uh the bottom five bits represent what type of object we're dealing with so that maps to the object
00:25:58.200 type and then we have seven more bits above it which are common to all objects
00:26:04.140 so these these bottom 12 bits are common to all objects in the system for example one of these bits represents whether or
00:26:10.740 not an object ID was seen so we lazily generate object IDs and we have to keep track of which objects have have had
00:26:17.640 object ID called on them so in this case down here we have an Object.new.object_id
00:26:22.740 and an array.object_id both of those whenever object ID is called
00:26:28.620 will flip this one bit right here so we'll say Hey Okay somebody called object ID on that and it has the same
00:26:34.740 meaning across all types now the upper bits the meaning of those upper bits changes depending on the type of object
00:26:41.100 that we're dealing with in this case we're dealing with uh with regular T_OBJECT objects and we have to keep a bit
00:26:47.100 with regard to whether or not the object is extended what that means is well you know we can
00:26:53.159 we were talking about three instance variables but clearly we can store more than three instance variables so how
00:26:58.200 does that work uh when we go to set the fourth instance variable we say well we can't set that like there's not room so
00:27:05.460 what we'll do is we'll allocate a buffer and then we'll store all the instance variables in the buffer and then we'll
00:27:10.620 set a pointer that points at that buffer and we have to be able to differentiate between objects that have a buffer and
00:27:15.960 objects that don't and that's what that particular bit is for so we have to check that bit so when the jit compiler runs oh come on
00:27:24.779 three minutes we got like 30 slides left I'm so sorry we're not
00:27:29.960 okay when the jit compiler runs we pause here and we have to check all the all of
00:27:35.460 the uh fields of the object we have to check like okay what is the type of the object is it embedded or extended is the
00:27:43.020 IV Qundef uh is the class correct we have to check this at compile time and we have to essentially repeat those
00:27:49.440 checks for runtime because we have to make sure that the object we see at runtime matches up with the thing that
00:27:55.799 we saw at compile time if it doesn't match we need to exit the jit so the places where we have to do all those
00:28:01.380 checks are we have to check the headers field we have to check the object type we have to check whether or not it was
00:28:06.600 extended we have to check is it the right class we have to check if the value is Qundef so we have to check all over the place and this is why the
00:28:13.620 machine code for reading in one instance variable is so difficult so long so we
00:28:18.900 can use shape IDs to eliminate those checks we store the shape ID in the upper 32 bits of the 64-bit pointer
00:28:26.840 talk to me later about 32-bit machines if you care we're not going to get into it
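Storing a shape ID in the upper 32 bits of a 64-bit word can be sketched like this. It is a hedged illustration that assumes the lower 32 bits keep holding the existing flag bits; the helper names (`with_shape_id`, `shape_id_of`, `low_bits_of`) are hypothetical and exist only for this example.

```ruby
SHAPE_ID_SHIFT = 32
LOW_MASK = (1 << 32) - 1   # the lower 32 bits (assumed to hold the existing flags)

# Returns a new word with the shape ID placed in the upper 32 bits.
def with_shape_id(word, shape_id)
  (shape_id << SHAPE_ID_SHIFT) | (word & LOW_MASK)
end

# Reads the shape ID back out of the upper 32 bits.
def shape_id_of(word)
  word >> SHAPE_ID_SHIFT
end

# Reads the untouched lower 32 bits.
def low_bits_of(word)
  word & LOW_MASK
end

word = with_shape_id(0x2001, 3)      # some flag bits, shape 3
moved = with_shape_id(word, 4)       # a transition to shape 4 keeps the low bits
```

Because the two halves do not overlap, changing the shape ID never disturbs the lower bits.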
00:28:32.220 um so we know from looking at the previous slides that we don't need to check class
00:28:37.320 anymore we don't care about that shape IDs are independent of class so as long as we check the shape ID we're good so
00:28:43.980 we're just going to eliminate that check right away we don't care about that let's handle undefined variables
00:28:49.860 so we know that when we're building the shape tree for this object we end up with one particular shape like this so
00:28:56.700 we end up with shape three when the instance variable is defined so if we compile if we compile the foo method we
00:29:03.659 associate that compiled method along with shape ID 3 now if we compile the
00:29:08.760 second one we know we end up with a separate shape so we try to transition off of foo to set baz because we skipped
00:29:16.679 bar again we only care about the names of instance variables and the order so we have a different shape here so now the
00:29:23.039 jit compiler can say well I know I'm compiling for shape three inside of shape three the instance variable bar
00:29:29.220 exists I know for sure it exists because I'm looking at shape three now if it compiles this method for shape
00:29:36.240 four it's able to know well the instance variable bar it does not exist it is not set I know it's not set in here because
00:29:42.480 I'm looking at shape four it's not in the tree so we're able to use this shape ID as our cache key so we can say well we
00:29:48.899 don't care about checking whether or not an instance variable is Qundef or not now we can kind of do another trick here
00:29:54.480 to get rid of get rid of this bit field check on extended versus embedded so the
00:30:00.840 way we do that is we say well uh what I'm going to do is I'm going to create a new transition for everything that gets
00:30:06.659 extended so we'll start out with an object like this maybe it only stores two instance variables I want to fit this on a slide
00:30:12.480 so we store those two instance variables and then we say well we can't store any more so we need to allocate an external
00:30:18.539 buffer what we're going to do is we're going to allocate we're going to create a magic like extended shape so we'll
00:30:24.480 create an extended shape here we'll insert the extended shape allocate an
00:30:29.580 external buffer store the instance variables in the external buffer and then finally create our create our shape
00:30:36.000 for that last that last instance variable so we end up with shape four now if we compare this to something
00:30:41.880 that's completely embedded we'll go along and say all right we're going to hit Foo we're going to hit bar and then
00:30:48.600 we don't need to extend this so we create a new shape off of bar for that particular that particular Edge baz so
00:30:55.260 we end up with a different shape depending on whether or not an object has been embedded or extended so we have
00:31:00.899 shape five down here at the bottom now the jit compiler can differentiate between these two types of objects so we
00:31:07.020 know that we we know that in this case shape 5 is associated with an object that's not extended and shape four is
00:31:13.679 associated with an extended object which means we don't really need to do this check anymore we just check the shape ID
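The shape tree from the last few slides can be sketched as a toy Ruby model. This is an assumption-laden illustration, not CRuby's implementation: the `Shape` and `ShapeTree` classes, the `ivar?` helper, and the `:__extended__` marker edge are hypothetical names, and the IDs it produces will not match the numbers on the slides.

```ruby
# Toy shape tree: each shape is reached from its parent by one edge.
class Shape
  attr_reader :id, :parent, :edge

  def initialize(id, parent = nil, edge = nil)
    @id = id
    @parent = parent
    @edge = edge
    @children = {}
  end

  # Reuse the child for this edge if it exists, otherwise create it.
  def transition(edge, tree)
    @children[edge] ||= tree.new_shape(self, edge)
  end

  # True if `name` appears as an edge on the path from the root to this shape.
  def ivar?(name)
    shape = self
    while shape
      return true if shape.edge == name
      shape = shape.parent
    end
    false
  end
end

class ShapeTree
  attr_reader :root

  def initialize
    @next_id = 0
    @root = new_shape(nil, nil)
  end

  def new_shape(parent, edge)
    shape = Shape.new(@next_id, parent, edge)
    @next_id += 1
    shape
  end

  # Follow (or create) one transition per edge, starting at the root.
  def shape_for(*edges)
    edges.reduce(@root) { |shape, edge| shape.transition(edge, self) }
  end
end

tree = ShapeTree.new

# Undefined variables: skipping @bar lands on a different shape.
with_bar    = tree.shape_for(:@foo, :@bar)
without_bar = tree.shape_for(:@foo, :@baz)

# Embedded versus extended: the marker edge (hypothetical name) forces
# a different shape even though the instance variables are the same.
embedded = tree.shape_for(:@foo, :@bar, :@baz)
extended = tree.shape_for(:@foo, :@bar, :__extended__, :@baz)
```

In this model the shape alone answers both questions (is `@bar` set, is the object extended), which is why a single shape ID comparison can stand in for the separate checks.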
00:31:19.500 so we have one more thing to get to tackle which is object types and what we do here is we just say well okay
00:31:27.179 over all right fine we'll be done with this shortly I'm sorry so what we'll do here is we'll say
00:31:33.419 well on on allocation time okay we have we have an issue here like let's say we say hello we fire off hello hello is
00:31:40.200 allocated shape three we do the same thing on array remember we only care about instance variable names and the
00:31:45.840 way in which they're ordered so these two would end up with the same shape ID which is a problem because we know that
00:31:51.299 different types store instance variables differently so the way we deal with this is fairly easy we just say okay well
00:31:58.860 when we allocate a new object type versus an array we say all right uh
00:32:04.860 we're going to start at the root shape but then we're going to immediately transition to a special shape type called T_OBJECT so anytime we allocate a
00:32:12.659 ruby a regular Ruby object we'll immediately do a transition and then we'll base all of our shapes off of that
00:32:18.720 so we go through there like this and we end up with shape ID 4 right there
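A rough Ruby sketch of the type transition, compared with a chain that starts directly at the root shape, followed by the single guard a JIT could then use. The `ShapeTable` class, the `:T_OBJECT` edge name, the hash-based object representation, and `guarded_read` are all hypothetical stand-ins chosen for this example.

```ruby
# Shapes as integer IDs, looked up by [parent_id, edge].
class ShapeTable
  ROOT = 0

  def initialize
    @ids = {}            # [parent_id, edge] => child_id
    @next_id = ROOT + 1
  end

  # Returns the existing child ID for this edge, or assigns a fresh one.
  def transition(parent_id, edge)
    @ids[[parent_id, edge]] ||= begin
      id = @next_id
      @next_id += 1
      id
    end
  end

  # Applies one transition per instance variable name, in order.
  def shape_for(start_id, ivar_names)
    ivar_names.reduce(start_id) { |id, name| transition(id, name) }
  end
end

table = ShapeTable.new

# Regular objects take an immediate transition before any instance variable.
t_object_root = table.transition(ShapeTable::ROOT, :T_OBJECT)
object_shape  = table.shape_for(t_object_root, [:@a, :@b])

# Same names, same order, but starting directly at the root shape.
other_shape = table.shape_for(ShapeTable::ROOT, [:@a, :@b])

# A JIT-style guard: one shape ID comparison, then a direct slot read.
def guarded_read(object, expected_shape_id, index)
  return :side_exit unless object[:shape_id] == expected_shape_id
  object[:ivars][index]
end

hello = { shape_id: object_shape, ivars: [1, 2] }
other = { shape_id: other_shape,  ivars: [1, 2] }
```

Code compiled against `object_shape` reads the slot for `hello` and leaves the fast path for `other`, without inspecting the type separately.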
00:32:23.940 for arrays we don't do that immediate transition so we do all of our transitions based off of the root shape
00:32:29.880 and we end up with a different shape ID depending on the type of object that we're dealing with which means we're
00:32:35.460 able to eliminate this check too all we need to do is check the shape ID in the jit so the only thing that's required is
00:32:42.000 a is a shape ID check and this is able to reduce our jit compiled code down to
00:32:48.360 from the left to what we have on the right here all we have to do in the stuff on the right is say well let's
00:32:53.399 make sure it's the right shape ID if it's the right shape ID then we're just going to read the instance variable and
00:32:58.679 push that onto the stack so Benchmark comparison here we're doing we're measuring the cost of fetching an
00:33:04.500 instance variable inside of the jit this time we're running it with the jit compiler uh this particular Benchmark
00:33:10.500 we're comparing YJIT before object shapes versus YJIT after and we're about
00:33:15.600 45% faster when we use when we use the object shapes technique and I think 45%
00:33:21.299 faster it's not really impressive enough I think so if we compare uh the jit compiler to regular Ruby it's 3.7
00:33:28.980 times faster before shapes after shapes it's now 5.4 times faster to read an
00:33:34.559 instance variable so in the future I'd like to use this technique possibly for reducing the size of our objects to 32
00:33:41.100 bytes uh tldr of this presentation is that object shapes lead us to fewer
00:33:46.500 checks and faster code I don't have time for questions thank you
00:33:54.539 thank you
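The two sets of benchmark figures quoted above are consistent with each other, which a quick calculation shows. This only does arithmetic on the numbers from the talk (3.7 times and 5.4 times faster than regular Ruby); the variable names are made up for the example.

```ruby
speedup_before = 3.7   # JIT versus regular Ruby, before shapes (from the talk)
speedup_after  = 5.4   # JIT versus regular Ruby, after shapes (from the talk)

# Ratio of the two speedups: how much faster the JIT is after shapes
# than it was before shapes.
relative = speedup_after / speedup_before
percent_faster = ((relative - 1) * 100).round
```

The result is roughly 46 percent, in line with the "about 45% faster" figure for the JIT before versus after object shapes.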