00:00:00.000
ready for takeoff
00:00:16.920
oh my goodness can't I get at least like six seconds back on
00:00:23.760
this clock that's not fair they started the clock before they switched the slides up here and I gotta use all of
00:00:29.760
these minutes thank you thank you
00:00:34.820
oh all right um did anybody else like did anybody else have a hard time with these
00:00:40.500
elevator buttons like when I got to when I got to the hotel I just I went to the elevator I'm
00:00:46.860
like I don't understand where the buttons are but one of the one of the doors is open I'll just go in and use that so I did that and then and then I
00:00:53.760
had to leave and meet meet people and I go to the elevator and I'm like uh I'm not I'm not I don't know how to use
00:01:00.360
these it took forever for me to figure out that these buttons are actually like I thought this was the emergency sign
00:01:05.400
which it is but I thought the buttons were like part of it um anyway
00:01:11.159
I don't know why I'm going on about this I just I don't have time uh so this this
00:01:16.320
talk is titled Don't @ Me! Faster Instance Variables with Object Shapes oh
00:01:21.360
hold on a sec let me I gotta shut off the notifications here
00:01:26.580
okay okay sorry yeah I'm very excited to be in Houston I've
00:01:33.240
never been to Houston before so I'm really I'm really happy to be here the food is great this is a lot of fun
00:01:39.240
um my name is Aaron Patterson I usually put about like 15 minutes worth of stand up at the beginning of my presentations
00:01:45.600
but I just don't have time for that here so I cut all of those I'm really sorry um
00:01:51.240
I'm part of the Ruby core team I'm also on the Rails core team uh I go
00:01:57.659
by tenderlove everywhere online so you can find me you can find me on all the social media with with that handle uh
00:02:04.500
except for on LinkedIn I use my more professional name there which is also tenderlove uh
00:02:11.340
I work for a mom and pop e-commerce website called Shopify
00:02:16.680
um I'm on the Ruby infrastructure team at Shopify our team is working on like different projects to improve the
00:02:22.500
performance of Ruby as well as the quality of life of developers uh working working for Shopify and also in the Ruby
00:02:29.040
community at large uh I would say like our team's customers are essentially the development teams at Shopify so we're
00:02:35.340
making Ruby and Rails better so that they can get their jobs done more quickly and with fewer resources we're
00:02:41.879
working on different projects like uh YJIT, GC improvements like the variable
00:02:47.220
width allocation project and other infrastructure improvements as well so I'm here today to talk to you about
00:02:52.980
instance variables and how how they work um I was going to talk like call this
00:02:59.220
talk instance variables TMI because I am going to tell you way too much information about instance variables but
00:03:05.879
instead of just being like a pure fact-based mission here what I want to
00:03:10.980
do is I'm I want to derive the way that instance variables work so that hopefully we're going to implement them
00:03:16.620
together and hopefully you'll be able to come away with a deeper understanding of how how they work and why we make
00:03:22.019
different decisions with regards to performance and optimization so I'm also going to be talking about
00:03:27.959
object shapes which is a technique that we use for speeding up access to instance variables as well as other things this
00:03:34.620
project has been ongoing at work my team's been working on it and it's going to be shipping along with the Ruby 3.2
00:03:39.959
release and I'm also going to be talking about how all of these things work together with YJIT in order to
00:03:47.519
make instance variable access extremely fast and I'm going to do all of this in
00:03:53.700
30 minutes first off I want to say thanks to everyone on the Ruby infrastructure team
00:04:00.720
the YJIT team especially I've been working very closely with Jemma on this project and I also want to thank Maxime for her
00:04:07.799
guidance on the project as well and I also want to give a shout out to John Hawthorn at GitHub he's been helping on this project as
00:04:14.040
well so there's been a lot of people working on this together so let's talk about how Ivars work uh this is a joke
00:04:21.600
for all the people from Seattle yes um very local joke all right so
00:04:30.000
just a kind of a note here I'm going to refer to them as instance variables also Ivars and also IVs those all mean the
00:04:37.680
same thing I just need to shorten it sometimes because I only have 26 minutes left so let's talk about implementing
00:04:44.340
instance variables let's say we have a very simple class like this with a couple instance variables on it if we
00:04:50.220
were implementing a language like how might we how might we store this data like how would we store it I think a
00:04:56.460
really a really simple way to accomplish this task would be to just store your instance variables in a hash table on
00:05:02.460
the instance so for example we'd have our instance of hello here and we'd say all right we got a hash table associated
00:05:08.639
with the instance of hello the key to the hash table is going to be the name of the instance variable and the value
00:05:13.860
the value in the hash table will be the the value of the instance variable uh and we can imagine writing this writing
00:05:20.820
this code is pretty easy we could imagine implementing it something like this where we just have that hash table
00:05:26.699
associated with all of our instances when you write something it writes the hash table when you read something it
00:05:32.580
reads from the hash table etc. all of this seems pretty easy to implement if we understand how hash tables work in
00:05:39.600
fact this is how instance variables were implemented in Ruby 1.8 and earlier that's how
00:05:46.259
they worked uh Ruby 1.8 and earlier was implemented via a tree-walking interpreter and the way the tree-walking
00:05:53.039
interpreter works is we would take we would take your code and turn it into a tree and then we would walk each node in
00:05:59.820
the tree and evaluate those nodes in the tree so here we have a very simple example this tree is representing the
00:06:05.940
code inside of the method Foo so we have Foo plus bar and the way it would work is we would
00:06:11.880
say okay we want to evaluate that plus node there but we can't evaluate it yet because we have
00:06:18.060
to evaluate its children so we evaluate foo foo does a hash lookup to get that
00:06:23.400
value one out and then bar also does a hash lookup to get the value 2 out and then we return those values back up the
00:06:30.300
tree they get returned up plus is able to execute add those two together and then return to the caller now 1.9 came
00:06:38.699
along Ruby 1.9 came along and introduced a virtual machine and what the virtual
00:06:44.460
machine did is it would compile all of your code into a byte code and execute execute that byte code I'm not going to
00:06:51.360
go into the compilation process because not much time uh but let's walk through
00:06:56.460
how the virtual machine might execute this so the virtual machine first it's going to take or the compiler is going
00:07:02.520
to take that Foo method and convert it into byte code so we'll have some byte code like this and it's going to walk
00:07:08.280
through each of those instructions one at a time and just execute them and as it's executing those instructions it's
00:07:13.860
going to manipulate a stack so we also have a stack right here so you imagine all right we're just going to Loop
00:07:19.139
through each of these execute them and then manipulate the stack the first thing we'll do is we'll say hey get Ivar here we'll push one onto the stack uh
00:07:26.880
get ivar for bar we'll push two onto the stack and then when we execute the plus method plus is going to pop those
00:07:33.479
two values off the stack and then push the return value onto the stack so if we were implementing this virtual
00:07:40.740
machine we can imagine how one might implement the get ivar instruction it would be pretty simple just maybe a
00:07:47.699
method like this where we say hey I'm going to take the name the name is going to come from the instruction it's a parameter from the instruction and the
00:07:54.539
first thing I'm going to do is I'm going to look up self like what is the value what is the object that we're operating on and in this case self is going to be
00:08:01.380
stored on the current in the current frame it's the current thing that we're operating on then we're going to say hey
00:08:06.780
I want to get the the hash table of instance variables and then all I need to do is I just need to look up that
00:08:14.099
that value by name from the hash table and then just push that value onto the stack so it's pretty easy to imagine how
00:08:20.099
we could go from the tree walking interpreter into the into the virtual machine implementation
00:08:25.740
now the problem with this implementation is that hashes are slow when compared to arrays I don't want to like hashes
00:08:33.120
aren't that slow but if you compare it to an array yes it's slow um hashes use a lot of memory when
00:08:38.399
compared to arrays because we have to the hash data structure uses a lot more room than an array would use so could we
00:08:45.180
use an array instead of a hash the answer is yes we could do that but let's imagine like how how might we Implement
00:08:51.360
something like that so here we have our simple class again hello with a couple instance variables
00:08:57.480
on it and when we what we can do is we can say all right well when we allocate this new instance of hello that instance
00:09:03.779
has to point at a class so it's going to point at the hello class what we can do is we can say hey class um do you have
00:09:11.820
an index for this instance variable so we'll go ahead up here we'll say hey I want to set the instance variable Foo do
00:09:18.540
you have an index for it and at first the class is like no I don't have an index for it I will make you one so it
00:09:24.540
inserts Foo into this hash table with an index of zero so we know that Foo maps to the index zero then we go ahead and
00:09:31.440
we set the value 1 at the zeroth index in the array that's stored on the
00:09:36.660
instance so our class is keeping a map of names to indexes indices so we'll do
00:09:43.320
the same thing for bar we'll set bar we'll say hey do you have an index for bar it doesn't so it creates a new one and then we store the value 2 in the
00:09:50.399
element at index one uh so when we do this a second time so
00:09:56.100
let's say we perform this on a second instance we're still running the program um our new instance will just say hey I
00:10:03.000
want to set Foo and bar those already exist in that hash table so we don't need to like we don't need to create new
00:10:09.360
indices it just uses those and sets those in the array now you might be looking at this and
00:10:15.240
thinking well you know you said a hash table uses a lot of memory so what's the deal we still have a hash table here that is true yes but we
00:10:23.760
are able to amortize the cost of this hash table across multiple instances of hello so we can say well
00:10:30.060
now we're able instead of having a hash table per instance now we have one that's associated with the class and all
00:10:36.420
of these instances are able to take advantage of the fact that that is stored on the class instead of the instance now
00:10:43.080
going from this I want to talk about object layout a little bit because this is going to be important when we talk
00:10:48.300
about the jit compiler the objects are laid out we'll say that objects are 40 bytes wide some objects are wider in the
00:10:55.980
new version of Ruby 3.2 but we're going to consider 40 byte objects and uh that
00:11:01.740
allows us to set three instance variables inside the object itself so we
00:11:06.959
can store three instance variables in line in the object so in this particular case we're going to store the values 1 and 2 inside that array we'll treat the
00:11:13.920
object itself as the array now the other thing I want to point out
00:11:19.079
before we get to talking about jit compilation is that the place where we store instance variables is different
00:11:25.440
depending on the type of object that we're dealing with so on the left here we just have a normal plain old Ruby
00:11:31.200
object and we set instance variables on it on the right we have a subclass of array where we store instance variables
00:11:36.959
the the storage location of those instance variables changes depending on the type so we care about the type
00:11:45.660
um so let's revisit the instruction implementation now we have to take into account when we're looking up the
00:11:50.820
instance variable hey we got to look at a class like give me the class for the self so first off we have to ask for the
00:11:56.880
class then we ask the class for the index and then we're able to look up the instance variable based on the index and
00:12:03.000
remember we have an if statement there to check whether we're looking it up on an object versus some other thing but
00:12:09.660
unfortunately you might notice oh geez we're still doing a hash look up here and you said hashes are slow so we need
00:12:15.540
to eliminate that and this is where we get into inline caches where we should be able to Cache that hash look up and
00:12:22.019
we can do it inside of what is called an inline cache these are simply cache objects that are stored in line with
00:12:28.200
your byte code so we'll say here we're going to add a new parameter inside of the byte code which is this magic cache
00:12:34.560
object right there is the magic cache object and what we'll do is we'll cache
00:12:39.600
the index inside of that object so we'll say hey if we don't have an index
00:12:44.700
cached already please go look up the index and then store it inside of the cache then below here we just
00:12:51.720
use the cached index so if there is one cached we'll use it otherwise we go look it up what this means is that there's
00:12:57.360
usually no hash lookups the first time we execute it yes we have to do it but subsequent times we don't have to do
00:13:02.639
that hash lookup so we're able to eliminate it however there's a problem with this uh remember that our name to
00:13:09.060
index mapping is per class so when we create the hello class we've mapped Foo and bar to zero and one and we have an
00:13:15.660
instance there as well but when we execute the foo method we're going to
00:13:21.240
store the indices 0 and 1 inside of the inline cache for that Foo method to look up look up the instance variable zero
00:13:27.420
and one now the problem is like what about this world class down here World
00:13:32.700
sets an instance variable it is a world-class class
00:13:40.519
sorry I'm going to chew up a few seconds for at least one pun um
00:13:45.660
all right so we have a world-class class world class here it's it inherits from
00:13:51.120
Hello it sets an instance variable before it calls super so you can imagine what the index table looks like on the
00:13:57.300
world class we've set oops first in the world class but we've cached the
00:14:03.660
value zero and one in the foo method so what happens when we call the foo method well we're going to look up the wrong
00:14:08.940
values we're going to look up the values for oops and we're going to look up the values for foo rather than foo and bar which is what we expected so this is
00:14:15.660
going to raise an exception and blow up so we can fix this problem it's not a big deal we just use the class as a cache
00:14:22.139
key as well we say all right uh the class needs to match we also need to have an index set
00:14:29.160
and then we can finally use that so our our instruction is getting a little bit more complicated now we have another
00:14:35.220
problem where I keep throwing in problems here let's say we have an empty hello empty world class down at the
00:14:41.820
bottom we have changed nothing we just have a subclass but the problem is here now the
00:14:48.480
class is part of this cache key this Loop down here at the bottom it oscillates between those instances it
00:14:55.019
calls Foo on hello then Foo on world and we keep doing that over and over but since class is the cache key it means we
00:15:01.740
can never hit we always look up hello then look up world and we're always missing the cache there I'm going to
00:15:07.800
throw in one more monkey wrench and that is uh how do we deal with undefined instance variables we all know like you
00:15:13.320
can access an undefined instance variable and it Returns the value nil like how do we deal how do we deal with
00:15:18.360
that so imagine we have this first case here where we're calling hello with true
00:15:23.940
true sets all three instance variables so we have one two and three stored in line in the object so those are all
00:15:30.360
those all come from these instances for these sets here but what about the second case uh where
00:15:36.839
we're calling it with false well what happens is when we allocate a new object we fill in all the memory with a magic
00:15:42.420
value and this magic value is Qundef uh so this second instance will end up
00:15:48.000
with a layout like this where we have a one and then Qundef and then the value three because we didn't set that middle
00:15:54.720
we didn't set that middle instance variable there so you can kind of imagine how the
00:16:00.240
implementation of instance_variable_defined? works should be easy to figure this out we
00:16:05.339
just say well okay let's go look up the index of bar and then we go look up the value for bar and we see that that value
00:16:11.519
is this magic one called Qundef so we're able to tell whether or not an instance variable has been defined we
00:16:17.279
can do that uh and then when we when we actually return this so we've cached the
00:16:22.800
value 0 and 1 here now the issue is again similarly to the previous problems well we can't return Qundef we have to
00:16:29.279
return the value nil Qundef's not a real value we need to return nil so we can't just return one and then Qundef
00:16:36.240
so where we handle that is inside of our implementation of the get ivar instruction I know this
00:16:43.139
code is getting smaller and that's on purpose I'm making a point it's getting more and more complicated we have to
00:16:49.380
handle all of these cases we have to check was this thing Qundef if it was we got to return nil otherwise we
00:16:55.500
return our we return the value inside of the array so we keep getting more and more complicated as we're adding these
00:17:01.620
like adding these special cases so just to recap the conditionals for reading an instance variable is the index set in
00:17:08.339
the cache do the classes match in the cache is it a T_OBJECT is the IV
00:17:14.160
value equal to Qundef and to drive home how complicated this actually is I want to take a look at jit
00:17:20.100
compilation so the way the jit works from a high level is that when we've executed a
00:17:26.100
method enough times the jit will pause it'll say oh okay uh we're going to stop right here for a second and we're going
00:17:32.700
to iterate over each of the instructions from the virtual machine and we're going to generate corresponding machine code
00:17:38.220
for each of those instructions and then rather than letting the virtual machine execute the instructions it jumps into
00:17:44.580
the machine code and executes that machine code instead so let's take a look at the machine code
00:17:51.120
for reading an instance variable and I'm going to call out each part here the first thing we have to do is we have to
00:17:56.160
check well is this thing an object so we do we have to emit the machine code for that then the next thing we have to do
00:18:02.100
is say well does the class match whatever is in the cache we have to check that as well and then the next
00:18:07.860
thing we have to do is say well okay uh is it an embedded object or an extended object like is it is it are those three
00:18:15.000
ivars that are stored inside the object itself we also have to check well is it Qundef as well like if it's Qundef
00:18:22.080
we got to return nil and then finally finally down at the bottom those last two instructions are read the Ivar and
00:18:28.620
push it onto the stack so we have 93 bytes of machine code for just reading one instance variable we
00:18:35.340
can actually do much better and that's where object shapes comes into play and no it's not these types of shapes in
00:18:41.100
fact there's really only one shape it is a tree data structure which is this shape so rather than like just
00:18:47.520
explaining it just saying oh it's a tree data structure we're going to try building it so we can kind of see what it looks like the tree data structure is
00:18:54.419
built every time we add an instance variable every time we write to something
00:19:00.740
let's move more quickly uh so when we write the value Foo we start out at a
00:19:07.320
root shape we write the value Foo and that value creates a new Edge in this tree and we cache the values from this
00:19:14.220
new node inside the inline cache so here we say we're writing value foo the outgoing edge foo did not exist so
00:19:20.940
we'll create a new node with the outgoing edge foo and we'll cache that it came from shape zero and it's going to
00:19:26.820
shape one and that we set the instance variable at IV index 0 so we cache
00:19:31.980
these three values we do the same thing with bar ah okay yes our cache key our cache key is the shape that we came from
00:19:37.860
it is our originating shape so then we we also cached the destination shape as well as the IV
00:19:44.940
index so the next thing we do we do exactly the same thing with bar except now we're going from shape ID 1 to shape
00:19:50.880
ID 2 and I know this isn't very tree-like but it is a tree it's just a
00:19:55.919
linear very linear tree maybe a tree trunk I don't know all right so the second time we execute
00:20:02.700
this we don't actually have to consult the tree we just take a look at those cache keys we say oh you're currently shape zero well I know shape
00:20:09.840
zero goes to shape one and it sets the IV at index zero so we have a cache hit here we know we make that transition
00:20:15.660
we do the same thing for the second one so we have a cache hit cache hit there so these shapes form a graph and all the
00:20:23.700
objects in this graph start from all objects allocated start from this root shape so they start from the root and
00:20:29.160
they grow from that graph and shapes only change when we're doing writes also the shape ID is the cache key and
00:20:37.320
importantly that means that the class of the object is not the cache key we don't care about the class we only care about
00:20:44.039
the instance variables that were set and the order in which they were set so what's cool about this is that we're
00:20:50.160
able to share caches between subclasses so in this case here before we couldn't
00:20:55.200
get a cache hit in the subclass but now we can because we don't care we don't care about the types anymore we only
00:21:01.620
care about the IVs and the order they were set so we get to share those caches the other thing is that we're able to do
00:21:07.799
cross-type memory amortization so this this what I mean by that is that shape tree is shared between all instances so
00:21:15.539
we saw that like we had a we had a an IV index table for a super class and a subclass well here now we just have one
00:21:22.620
shape tree for all classes so we're able to amortize this across multiple multiple types uh we also get cross-type
00:21:30.000
cache hits which I'm going to show here again this is our problem from before where we had the hello class and the world
00:21:35.880
class oscillating between the two and we couldn't get cache hits here but in the
00:21:41.400
world of shapes well in fact those two types have exactly the same shape this
00:21:46.559
shape so we're able to get cache hits here where we couldn't before and if we do a very simple micro benchmark on this
00:21:52.620
I know you can't read it I'm going to just write it bigger it is about 2.7 times faster this particular Benchmark
00:21:58.919
this is all impacted by being able to do cache hits where we couldn't do them before
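Editor's sketch of the shape tree and a shape-keyed write cache, under stated assumptions: `Shape`, `ShapeTree`, `WriteCache`, and `set_ivar` are hypothetical names, and each write site is assumed to add a new instance variable (as in an `initialize` method) rather than reassign an existing one. The cache stores the three values mentioned in the talk: the source shape ID, the destination shape, and the IV index.

```ruby
# Hypothetical shape node: edges are keyed by instance variable name.
class Shape
  attr_reader :id, :parent, :name, :edges

  def initialize(id, parent = nil, name = nil)
    @id = id
    @parent = parent
    @name = name
    @edges = {}
  end

  # Number of ivars set on the path from the root; this is also the
  # slot index that the next newly added ivar receives.
  def depth
    parent ? parent.depth + 1 : 0
  end
end

class ShapeTree
  attr_reader :root

  def initialize
    @next_id = 0
    @root = new_shape
  end

  # Follow the outgoing edge for `name`, creating the child on first use.
  def transition(shape, name)
    shape.edges[name] ||= new_shape(shape, name)
  end

  private

  def new_shape(parent = nil, name = nil)
    shape = Shape.new(@next_id, parent, name)
    @next_id += 1
    shape
  end
end

TREE = ShapeTree.new # one tree shared by every class

class ToyObject
  attr_accessor :shape
  attr_reader :slots

  def initialize
    @shape = TREE.root
    @slots = []
  end
end

class Hello < ToyObject; end
class World < Hello; end

# Inline cache for one write site, keyed by the shape we came from.
WriteCache = Struct.new(:from_id, :to_shape, :index, :hits, :misses)

def set_ivar(obj, name, value, cache)
  if cache.from_id == obj.shape.id
    cache.hits += 1
  else
    cache.misses += 1
    cache.index = obj.shape.depth
    cache.to_shape = TREE.transition(obj.shape, name)
    cache.from_id = obj.shape.id
  end
  obj.slots[cache.index] = value
  obj.shape = cache.to_shape
end

foo_cache = WriteCache.new(nil, nil, nil, 0, 0)
bar_cache = WriteCache.new(nil, nil, nil, 0, 0)

# Oscillate between the two classes, as in the problem case from the talk.
objects = [Hello.new, World.new, Hello.new, World.new]
objects.each do |obj|
  set_ivar(obj, :@foo, 1, foo_cache)
  set_ivar(obj, :@bar, 2, bar_cache)
end

puts "foo site: #{foo_cache.hits} hits, #{foo_cache.misses} misses"
puts "bar site: #{bar_cache.hits} hits, #{bar_cache.misses} misses"
```

Because the cache key is the shape ID and not the class, the `Hello` and `World` instances share cache entries in this model; only the very first write at each site misses.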
00:22:04.740
Oh Boy 50 slides left all right so memory usage improvements classes store
00:22:12.059
their names as instance variables so if you do Hello.name that class name that's read as an instance variable John
00:22:18.720
Hawthorn was able to convert classes to use object shapes as well he
00:22:23.940
sent a patch to do this and you can imagine like all classes have a
00:22:29.039
different class which is a meta class so of course we had to duplicate all this information among all classes but since
00:22:35.280
the classes can can use shapes now we know that all these classes have the same shape and we're able to amortize
00:22:41.640
that that cost and he found in his application this is I think this is GitHub github's application they were
00:22:48.360
able to save about 16 megabytes worth of memory just on this this one change so I think that's pretty pretty great
00:22:57.000
um to cover two more we're going to cover two more things and we're going to get back to the jit implementation not
00:23:02.460
all properties I was very careful to say properties I think I was hopefully not all properties of an object are instance
00:23:09.419
instance variables freezing the Frozen state of an object is also a property of
00:23:14.700
an object and we've encoded the Frozen state of an object into into the object as a as a shape transition so here's an
00:23:21.120
example this will get a little bit more tree-like so we'll do hello here which was our normal shape we saw before we'll
00:23:26.760
set Foo we'll set bar that's great and then we're going to call this other go
00:23:32.299
transitions go we're going to set this other value here so we'll have we'll have a we start off with shape two there
00:23:39.480
on the first instance we set another instance variable that goes to shape three so that's great now our second our
00:23:47.100
second instance we're going to set those two those two instance variables that'll be a cache hit in the tree and then we
00:23:52.860
freeze it and we just create a new uh a new transition off of the bar
00:24:00.120
transition we say hey we're going to transition you to Frozen now now when we call set you can see here that we
00:24:06.419
actually have a cache Miss in this case so we'll try to call set and that's going to raise an exception because you
00:24:11.520
can't set an instance variable on a frozen object so in this case we're we're using shape four and we're trying
00:24:17.760
to we're trying to set an instance variable there but you can see that that will be a cache Miss in this case so
00:24:23.039
what's really cool about that is we can say well before we always had to check whether or not an object was frozen
00:24:28.559
before we could set set the set the instance variable but now we really only
00:24:33.659
need to do this Frozen status check when we do a cache Miss so we don't need to
00:24:38.760
do that check at runtime anymore well not on every write we can do it at cache miss time so frozen checks only occur on cache
00:24:45.960
misses when we're using object shapes and I have another benchmark here we're benchmarking the IV write performance
00:24:51.179
again it's too small to read but you can see that it is about 21% faster so IV writes get faster and this isn't taking
00:24:57.780
into account jit at all this is just normal VM execution so since we don't have to check check Frozen status
00:25:03.720
anymore we can set instance variables much faster so let's talk about jit performance now
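Editor's sketch of freezing modeled as a shape transition, so that the frozen check only runs on a cache miss. All names (`Node`, `Shapes`, `toy_freeze`, `WriteCache`, `set_ivar`) are hypothetical, and each write site is assumed to add a new instance variable. The shape numbering follows the example in the talk: foo, bar, and the third variable give shapes one to three, and the frozen transition off bar gives shape four.

```ruby
# Hypothetical shape node; `frozen` is true for shapes reached via freeze.
Node = Struct.new(:id, :frozen, :edges)

class Shapes
  attr_reader :root

  def initialize
    @nodes = []
    @root = add(false)
  end

  # Follow (or create) the edge; the :frozen edge leads to a frozen shape.
  def transition(node, edge)
    node.edges[edge] ||= add(node.frozen || edge == :frozen)
  end

  private

  def add(frozen)
    node = Node.new(@nodes.size, frozen, {})
    @nodes << node
    node
  end
end

SHAPES = Shapes.new

class ToyObject
  attr_accessor :shape
  attr_reader :slots

  def initialize
    @shape = SHAPES.root
    @slots = []
  end

  # Freezing is just another transition in the tree.
  def toy_freeze
    @shape = SHAPES.transition(@shape, :frozen)
    self
  end
end

WriteCache = Struct.new(:from_id, :to_shape, :index)

def set_ivar(obj, name, value, cache)
  unless cache.from_id == obj.shape.id
    # Slow path (cache miss): the only place the frozen state is checked.
    raise FrozenError, "can't modify frozen toy object" if obj.shape.frozen
    cache.index = obj.slots.size
    cache.to_shape = SHAPES.transition(obj.shape, name)
    cache.from_id = obj.shape.id
  end
  # Fast path: the cached source shape is never frozen, and a frozen object
  # has a different shape ID, so a hit implies the object is writable.
  obj.slots[cache.index] = value
  obj.shape = cache.to_shape
end

foo_cache = WriteCache.new
bar_cache = WriteCache.new
baz_cache = WriteCache.new

first = ToyObject.new
set_ivar(first, :@foo, 1, foo_cache)
set_ivar(first, :@bar, 2, bar_cache)
set_ivar(first, :@baz, 3, baz_cache)

second = ToyObject.new
set_ivar(second, :@foo, 1, foo_cache)
set_ivar(second, :@bar, 2, bar_cache)
second.toy_freeze

error = begin
  set_ivar(second, :@baz, 3, baz_cache)
  nil
rescue FrozenError => e
  e
end

puts "first shape: #{first.shape.id}, second shape: #{second.shape.id}"
puts "write to frozen object raised: #{error.class}"
```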
00:25:09.240
to understand jit performance a bit better I want to talk about the layout of objects again now the object layouts
00:25:15.960
have all objects in Ruby have two fields that are the same it's the top two Fields there's a Flags field and then a
00:25:24.000
pointer to the class now the rest of the the rest of the slots in the object change depending on the type of object
00:25:30.059
that we're dealing with so a T_OBJECT will store instance variables an array will store array elements etc etc
00:25:36.419
so let's take a look at what the flags field is the flags field is a 64-bit integer on 64-bit platforms and this is
00:25:44.100
kind of the layout of that bit field which you can't read this those numbers are they're not important read in the
00:25:50.580
slides later uh the bottom five bits represent what type of object we're dealing with so that maps to the object
00:25:58.200
type and then we have seven more bits above it which are common to all objects
00:26:04.140
so these these bottom 12 bits are common to all objects in the system for example one of these bits represents whether or
00:26:10.740
not an object ID was seen so we lazily generate object IDs and we have to keep track of which objects have have had
00:26:17.640
object_id called on them so in this case down here we have an Object.new.object_id
00:26:22.740
and an array's object_id both of those whenever object_id is called
00:26:28.620
will flip this one bit right here so we'll say Hey Okay somebody called object ID on that and it has the same
00:26:34.740
meaning across all types now the upper bits the meaning of those upper bits changes depending on the type of object
00:26:41.100
that we're dealing with in this case we're dealing with uh regular T_OBJECT objects and we have to keep a bit
00:26:47.100
with regard to whether or not the object is extended what that means is well you know we can
00:26:53.159
we were talking about three instance variables but clearly we can store more than three instance variables so how
00:26:58.200
does that work uh when we go to set the fourth instance variable we say well we can't set that like there's not room so
00:27:05.460
what we'll do is we'll allocate a buffer and then we'll store all the instance variables in the buffer and then we'll
00:27:10.620
set a pointer that points at that buffer and we have to be able to differentiate between objects that have a buffer and
00:27:15.960
objects that don't and that's what that particular bit is for so we have to check that bit so when the jit compiler runs oh come on
00:27:24.779
three minutes we got like 30 slides left I'm so sorry we're not
00:27:29.960
okay when the jit compiler runs we pause here and we have to check all the all of
00:27:35.460
the uh fields of the object we have to check like okay what is the type of the object is it embedded or extended is the
00:27:43.020
IV Qundef uh is the class correct we have to check this at compile time and we have to essentially repeat those
00:27:49.440
checks for runtime because we have to make sure that the object we see at runtime matches up with the thing that
00:27:55.799
we saw at compile time if it doesn't match we need to exit the jit so the places where we have to do all those
00:28:01.380
checks are we have to check the headers field we have to check the object type we have to check whether or not it was
00:28:06.600
extended we have to check is it the right class we have to check if the value is Qundef so we have to check all over the place and this is why the
00:28:13.620
machine code for reading in one instance variable is so difficult so long so we
00:28:18.900
can use shape IDs to eliminate those checks we store the shape ID in the upper 32 bits of the 64-bit pointer
00:28:26.840
talk to me later about 32-bit machines if you care we're not going to get into it
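Packing a shape ID into the upper 32 bits of a 64-bit word comes down to a shift and a mask. The sketch below assumes the lower 32 bits hold unrelated flag bits; the helper names `pack`, `shape_id_of`, and `low_bits_of` are made up for illustration and are not functions from the Ruby source.

```ruby
# Hypothetical helpers showing the bit layout only.
SHAPE_ID_SHIFT = 32
LOW_MASK = (1 << SHAPE_ID_SHIFT) - 1

# Combine a shape ID (upper 32 bits) with other bits (lower 32 bits).
def pack(shape_id, low_bits)
  (shape_id << SHAPE_ID_SHIFT) | (low_bits & LOW_MASK)
end

# Recover the shape ID by shifting the lower half away.
def shape_id_of(word)
  word >> SHAPE_ID_SHIFT
end

# Recover the lower half by masking the shape ID away.
def low_bits_of(word)
  word & LOW_MASK
end

word = pack(3, 0b1010)
```

With this layout a single 32-bit comparison against the upper half is enough to test the shape.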
00:28:32.220
um so we know from looking at the previous slides that we don't need to check class
00:28:37.320
anymore we don't care about that shape IDs are independent of class so as long as we check the shape ID we're good so
00:28:43.980
we're just going to eliminate that check right away we don't care about that let's handle undefined variables
00:28:49.860
so we know that when we're building the shape tree for this object we end up with one particular shape like this so
00:28:56.700
we end up with shape three when the instance variable is defined so if we compile if we compile the foo method we
00:29:03.659
associate that compiled method along with shape ID 3 now if we compile the
00:29:08.760
second one we know we end up with a separate shape so we try to transition off of foo to set baz because we skipped
00:29:16.679
bar again we only care about the names of instance variables in the order so we have a different shape here so now the
00:29:23.039
JIT compiler can say well I know I'm compiling for shape three inside of shape three the instance variable bar
00:29:29.220
exists I know for sure it exists because I'm looking at shape three now if it compiles this method for shape
00:29:36.240
four it's able to know well the instance variable bar it does not exist it is not set I know it's not set in here because
00:29:42.480
I'm looking at shape four it's not in the tree so we're able to use this shape ID as our cache key so we can say well we
00:29:48.899
don't care about checking whether or not an instance variable is Qundef or not now we can kind of do another trick here
00:29:54.480
to get rid of get rid of this bit field check on extended versus embedded so the
00:30:00.840
way we do that is we say well uh what I'm going to do is I'm going to create a new transition for everything that gets
00:30:06.659
embedded so we'll start out with an object like this maybe it only stores two instance variables I want to fit this on a slide
00:30:12.480
so we store those two instance variables and then we say well we can't store any more so we need to allocate an external
00:30:18.539
buffer what we're going to do is we're going to allocate we're going to create a magic like extended shape so we'll
00:30:24.480
create an extended shape here we'll insert the extended shape allocate an
00:30:29.580
external buffer store the instance variables in the external buffer and then finally create our create our shape
00:30:36.000
for that last that last instance variable so we end up with shape four now if we compare this to something
00:30:41.880
that's completely embedded we'll go along and say all right we're going to hit foo we're going to hit bar and then
00:30:48.600
we don't need to extend this so we create a new shape off of bar for that particular that particular edge baz so
00:30:55.260
we end up with a different shape depending on whether or not an object has been embedded or extended so we have
00:31:00.899
shape five down here at the bottom now the JIT compiler can differentiate between these two types of objects so we
00:31:07.020
know that we we know that in this case shape 5 is associated with an object that's not extended and shape four is
00:31:13.679
associated with an extended object which means we don't really need to do this check anymore we just check the shape ID
00:31:19.500
so we have one more thing to get to tackle which is object types and what we do here is we just say well okay
00:31:27.179
over all right fine we'll be done with this shortly I'm sorry so what we'll do here is we'll say
00:31:33.419
well on on allocation time okay we have we have an issue here like let's say we say hello we fire off hello hello is
00:31:40.200
allocated shape three we do the same thing on array remember we only care about instance variable names and the
00:31:45.840
way in which they're ordered so these two would end up with the same shape ID which is a problem because we know that
00:31:51.299
different types store instance variables differently so the way we deal with this is fairly easy we just say okay well
00:31:58.860
when we allocate a new object type versus an array we say all right uh
00:32:04.860
we're going to start at the root shape but then we're going to immediately transition to a special shape type called T_OBJECT so anytime we allocate a
00:32:12.659
ruby a regular Ruby object we'll immediately do a transition and then we'll base all of our shapes off of that
00:32:18.720
so we go through there like this and we end up with shape ID 4 right there
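The immediate type transition can be sketched with a small lookup table keyed by parent shape and edge. `ShapeTable`, its `ROOT` constant, and the `:T_ARRAY` symbol used to stand in for "not a regular object" are assumptions made for this example.

```ruby
# Hypothetical shape table: [parent shape id, edge] -> child shape id.
class ShapeTable
  ROOT = 0

  def initialize
    @edges = {}
    @next_id = ROOT + 1
  end

  # Reuse the child shape for an edge, or assign the next free ID.
  def transition(parent_id, edge)
    @edges[[parent_id, edge]] ||= begin
      id = @next_id
      @next_id += 1
      id
    end
  end

  # Regular objects first take a T_OBJECT edge; other types start at the root.
  def shape_for(type, ivar_names)
    start = type == :T_OBJECT ? transition(ROOT, :T_OBJECT) : ROOT
    ivar_names.inject(start) { |shape, name| transition(shape, name) }
  end
end

table = ShapeTable.new
names = [:foo, :bar, :baz]
object_shape = table.shape_for(:T_OBJECT, names)
array_shape  = table.shape_for(:T_ARRAY, names)
```

The same instance variable names in the same order now land on different shape IDs for the two types, so the ID also encodes how the variables are stored.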
00:32:23.940
for arrays we don't do that immediate transition so we do all of our transitions based off of the root shape
00:32:29.880
and we end up with a different shape ID depending on the type of object that we're dealing with which means we're
00:32:35.460
able to eliminate this check too all we need to do is check the shape ID in the JIT so the only thing that's required is
00:32:42.000
a shape ID check and this is able to reduce our JIT-compiled code down to
00:32:48.360
from the left to what we have on the right here all we have to do in the stuff on the right is say well let's
00:32:53.399
make sure it's the right shape ID if it's the right shape ID then we're just going to read the instance variable and
00:32:58.679
push that onto the stack so benchmark comparison here we're doing we're measuring the cost of fetching an
00:33:04.500
instance variable inside of the JIT this time we're running it with the JIT compiler uh this particular benchmark
00:33:10.500
we're comparing YJIT before object shapes versus YJIT after and we're about
00:33:15.600
45% faster when we use the object shapes technique and I think 45%
00:33:21.299
faster it's not really impressive enough I think so if we compare uh the JIT compiler to regular Ruby it's 3.7
00:33:28.980
times faster before shapes after shapes it's now 5.4 times faster to read an
00:33:34.559
instance variable so in the future I'd like to use this technique possibly for reducing the size of our objects to 32
00:33:41.100
bytes uh TL;DR of this presentation is that object shapes lead us to fewer
00:33:46.500
checks and faster code I don't have time for questions thank you
00:33:54.539
thank you