BathRuby 2016

Summarized using AI

How are Method Calls Formed?

Aaron Patterson • March 11, 2016 • Bath, UK

In the presentation "How are Method Calls Formed?" by Aaron Patterson at BathRuby 2016, the intricacies of Ruby's method calls and optimizations are explored. The session begins with a humorous recap of previous presentations and insights about cultural quirks, leading into Patterson's background working with Red Hat and the focus of his talk.

The main topics covered in the presentation include:

- Understanding Method Calls: The definition and structure of method calls in Ruby, along with their formation within the Ruby Virtual Machine (VM).
- Bytecode Execution: An overview of the bytecode generated by Ruby's VM, which operates as a stack-based machine, pushing and popping operations from a stack to execute methods.
- Method Lookups: Discussion on how Ruby looks up method definitions, traversing through class hierarchies, and why this is a crucial aspect of method execution.
- Performance Optimization Techniques: Factors affecting method call performance, especially caching strategies. The talk introduces concepts such as inline method caches and how they can speed up method calls by storing the results of method lookups to avoid repeated calculations.
- Cache Misses and Their Impact: Patterson addresses issues like cache invalidation and the implications of altering class definitions during runtime, which can lead to performance degradations due to cache misses.
- Polymorphism: He suggests strategies utilizing polymorphism to optimize method dispatch further, illustrating ways to conduct optimizations that respect the performance needs of Ruby applications.
- Conclusion on Learning from Failure: The presentation rounds off with the framing of learning through failures, encouraging attendees to engage with Ruby's source code on GitHub to better understand these optimizations and their applications.

Overall, attendees are encouraged to measure optimizations, embrace polymorphism, and approach method performance with analytical inquiries. The presentation is characterized by a blend of humor, technical deep dives, and practical code examples, aimed at providing developers with insights into Ruby's internals and best practices for performance optimization.

How are Method Calls Formed?
Aaron Patterson • March 11, 2016 • Bath, UK

How are Method Calls Formed? by Aaron Patterson

In this presentation we're going to study how method calls are executed. We'll go from bytecode created by Ruby's Virtual Machine down to the C code where the methods actually get executed.

After we've learned about how Ruby executes methods today, we'll dive in to optimizations we can make on method dispatch including various types of inline method caching. You should leave with a better understanding of Ruby's VM internals as well as ways to analyze and optimize your own code. Also the presentation will have really great transitions in Keynote.

BathRuby 2016

00:00:17.180 So last time Phil was my MC, I lost my passport in Belgium. I don't blame him, but if I did, it would be his fault. No, I'm just kidding. Okay, so we're going to make this better.
00:00:25.680 This slide is for a different audience, so I want to do a little bit of a conference recap here. I want to recap all the presentations that we've seen so far today. So I'm going to do that very quickly. Xavier talked about music to my ears; Coraline talked about U4J, which was great. I did not know that they used Java in The Matrix. I really enjoyed Courtney's talk about open source for my benefit. Apparently, I've been doing open source wrong.
00:00:43.110 Janet's talk was great, but she gave me the impossible task of coming up with a pun about gender inequality. But I think I did it: cognitive bias, you think? And finally, Zack, if you don't like these puns, you can feel free to brick it.
00:01:14.380 I was really honored to be invited to BathRuby 2016. I don't really know anything about Rocky, so instead, I'm going to talk about programming. Some of my slides do have color in them, but we'll see how we fare. We're going to talk about how methods are formed today. I want to talk... my alternative title for this was 'Method Mania.' But first, I want to introduce myself a little bit. I work for a company called Red Hat. I don't know if you've heard of them; they're a small startup in the United States.
00:01:55.000 Specifically, I am on the ManageIQ team, and our team at Red Hat manages cloud stuff. If you have cloud stuff, we can manage your cloud stuff! And actually, our application is open source, and you can look at the source on GitHub. It's just ManageIQ on GitHub, but I'm going to talk about that a little bit more later. Also, we're hiring!
00:02:30.430 So my name is Aaron Patterson, and I'm here to bring you freedom. I have to tell you, I gave a talk in Moscow and used this exact same joke, and it went over extremely well! I'm still alive. Anyway, I like to pay attention to the news, and I've noticed that, with the recent political climate here in Europe, I think this might actually be better: I'm from the European Union, bringing you freedom. That was supposed to be a Belgian waffle, but as an American, I have no idea what those look like.
00:03:00.690 Anyway, I'm really honored to be here. I'm excited to be in England because it's really cool that everybody here has an accent. It's also neat—I feel like I'm in some sort of romantic comedy where I've been whisked away to England and everything's like a dream. Another reason I like coming to England is because I don't feel so bad using freedom units here. For those of you who don't know, that means if you might refer to something as 10 degrees Celsius, where we would say Fahrenheit.
00:04:12.150 We don't use Celsius in the United States because Celsius stands for communism, and Fahrenheit stands for freedom. So, that's what we do there. Anyway, so I don't feel so bad using those units here. I truly am very honored to be here. I'm so happy to be in Bath, the city is beautiful, and I love coming to England.
00:04:33.960 I tried to come up with an emoji for this conference, and this is the best I could come up with. There's nothing that says 'Bath' and then 'root B,' so 'Bath through B.' Now, many of you might not know this, but there was actually a very important computer science discovery made right here. In fact, you might not notice this, but I was researching this on Wikipedia. One of the most famous sorting algorithms was invented here at Bath College by a man named Alexander Graham Bubble. Now, Bubble invented the bubble sort, and that's why the bubble sort was actually invented in Bath. Get it?
00:05:19.500 How much time do we have? Also, another weird thing I noticed is that my hotel room only has a shower, so I thought that was kind of weird, like I thought you're supposed to have a bath there, too. So, to get here, I took a train, and I just feel like the entire country of England is playing a joke on me because I went to purchase a ticket to get here. I went to the window and asked, 'Yes, I need a ticket to Bath Spa. Is that a valid place name?' They said, 'Yes, your ticket to Bath Spa.' They didn't think it was so weird at all, but to me, it was just very strange.
00:05:51.680 Last night, we were talking about how there are two different pronunciations: bath and bah. There were people arguing over which is the right way to say it, and I could not tell the difference. I'm like, 'You're both saying the same thing; it doesn't make sense! Anyway, so I love cats. This is my cat at Sea-Tac Airport; her name is Facebook YouTube. That's her full name; we just call her Tutu for short.
00:06:49.250 And this is also her; she likes to sit on my desk, which I think is very cute, but also gets in the way a lot. And then, this is the other one; he's a bit more famous. His name is Gorbachev Gorbachev Puff-puff Thunderhorse—that's his full name. I actually wanted, when my wife and I got married, to change our last names to Thunderhorse. I think it's awesome, but she would not have it, so that's too bad.
00:07:23.310 Anyway, my wife really wanted me to give a TED Talk someday, so she made this slide for me, but I don't know anybody named Ted, so I can't do that. Also, I enjoy hugs, and today is Friday, so I would like, I'm not going to make you all do it now, but tonight after this presentation, please come and give me a hug to celebrate Friday. Now, let's talk about methods. Let's talk about method optimizations.
00:07:52.710 So, I want to talk about methods, types of methods, how methods work, bytecode for methods, and some VM internals. Then, after we talk about how methods work, I want to discuss method optimizations, including inline caches and polymorphic methods. I hope you're all very proud of me because you may have noticed that I localized here for all of you. I'm trying to lay all this out for you so that you know there is a method to my madness.
00:08:43.410 More localization right there! This is a highly technical presentation, and that's how we're going to end the day today. However, I want to start out on a not-so-technical note. At the beginning, I usually give very technical presentations, but I want to start with a little bit of serious stuff that's not so technical. I want to give some advice for new people in the audience, people who are new to programming or new to the industry, and also those of you who have been here for a while.
00:09:17.820 I want you to kind of listen in. I want to talk a little bit about accessibility. This presentation I'm giving will cover some very complicated topics, but I'll build on easy stuff at the beginning and then get into more difficult stuff as we go on. My goal with this presentation is to ensure that everybody in the audience can get something out of it. Yes, there will be very difficult concepts introduced, but we'll introduce those slowly so hopefully, some new developers here can grasp that information as well.
00:10:21.130 The other thing I want to express, especially to new people, is don't be embarrassed about asking questions about this stuff. Some of this content is very difficult, and I remember a time in my career when I was afraid to ask questions of my colleagues because I thought, 'Hey, if I ask this question, am I going to look stupid?' I believe it is very important, especially for newcomers, to not feel that way.
00:10:43.590 I want to foster an environment where you feel okay asking questions about anything being shown or in the industry because all of us had to learn that stuff at some point. So you shouldn't feel embarrassed if you don't know something. The other piece of advice I have for you is to be genuine about what you ask. Be genuine in trying to learn something. If you ask questions of someone and you're genuinely trying to learn, and they think you're stupid for doing that, you probably don't want to work with those people.
00:11:01.370 So just a pro tip for all of you. If you notice that someone really doesn't understand something and they're being genuine about it, please help them out and make them feel comfortable. We'll talk about some very high-level stuff, oscillating between high-level and low-level concepts. We're going to dig deeper, then come back up for air from time to time.
00:11:48.480 I'm excited to give this presentation because this is the first time I've been able to talk about these topics. I want you to forgive me a little bit, as this is the first time I’ve given this particular presentation. It might be a little rough around the edges, but the first thing I want you to know is that, for those of you who don't want to watch the rest of this presentation...I failed. That's basically it. We'll talk about that failure at the end, and you'll understand how I was a failure.
00:12:12.820 But I want to talk about the things I learned along the way. The first thing we're going to cover is call sites. Call sites and programs are easy to identify in your code; they tend to look something like this: anywhere you see a dot is going to be a call site. I'll refer to the left side as the left-hand side of that call site and the other side as the right-hand side. An important thing about call sites is that they are unique.
00:12:51.630 If I had that 'hello world' listed five times, we'd have five different call sites. I want to show you some more examples in Ruby. Here is a short code snippet showing call sites. We have one at the top; we can see another one where we say object.new. Our left-hand side is the object class, and our right-hand side is 'new'. There's another example where we have an implicit left-hand side, which is 'self', and it's actually 'self.foo'. We can omit 'self.' and in that case, it's actually 'main'.
00:13:07.830 When you write a quick Ruby script, if you put 'self' in there, you'll see that it says 'main', so that's our main object. There are a few more examples. The left-hand side: good job, Aaron! You got it right there. Now, I have to remember I'm trying to break the sixth wall.
00:13:39.850 Are we doing the sixth wall? I'm breaking the sixth wall! Hello, viewers at home! Should I say hi, Mom? Alright, so we got some more code at the bottom. There’s one more here where the object is equivalent to doing object.triple_equals x. You could write that out the same way as object.triple_equals x, but we're going to have some more notes about that; that's an interesting one. And of course, we have another one here, double equals, that's a method call; we also have another here, this is an implicit call to self.
00:14:05.490 The point I'm trying to make here is that we actually have tons of these; they're all over our code. You may not necessarily notice them, but they're definitely there throughout our code base. Let's do a quick review of how Ruby's virtual machine works. The way Ruby's VM works is that it's a stack-based virtual machine. What that means is we have a list of instructions and a stack that the virtual machine works on. It takes those instructions and uses that stack to work out values.
00:14:42.290 For example, we have this tiny little program. The way it works is we push 6 onto the stack, and then we push 8. So now 6 and 8 are both on the stack, and when we say 'add', what that does is pop both of those values off, adds them together, and then pushes the value back onto the stack. This is how a stack-based machine works, and it's how MRI's virtual machine operates. What's interesting about the bytecode that MRI uses is that it's actually just stored as an array.
00:15:24.560 If you look internally, the bytecode looks something like this. If I were to write it in Ruby, it would show that it's just an array with some more arrays inside that array. It's basically a two-dimensional array. We have, on the left side, the operator, what we call the operator, and then, on the right, we have an operand. For instance, we say 'push six'. I want to drive home that this bytecode is physically there and allocated inside our computer's memory.
00:15:55.050 The next thing I want to talk about is how methods work. At a high level, methods are designed to find the name of the method. We need to find the name of the method, identify the type on the left-hand side, and then execute that method. We look up the method from that object and go execute the code. From a low level, we’ll look at the bytecode to understand how this actually happens.
00:16:33.250 We'll take this example here where we have 'foo' and 'bar'. We're going to get the instruction sequence for that. This is Ruby code, specific to MRI, for getting the bytecode for that particular method. When we print that out, it looks a bit difficult to read, but what we want to focus on are these two lines where it says 'four' and 'six'. The 'get local' instruction is our method call.
00:17:01.900 If we have two method calls, let's say we do that twice and look at the bytecode from that; we'll see those two lines repeated twice. We can map this bytecode back to our original code. The original method you can see that this 'get local OP zero' is getting the left-hand side to 'bar' and pushing that onto the stack. The next operation is actually calling the method. We're going to call the method 'bass' on 'bar'.
00:17:40.020 So that's what this 'opt send with block' instruction does. Again, we have our operator and our operand. We have this down here with 'send without block', and we have a third argument that we'll cover in depth later. This is what the actual bytecode looks like for executing a method.
00:18:13.480 Now, when we execute this, we'll say, 'Okay, we're going to call and take the transitions.' The VM is going to execute the method against 'bar'. It will pop 'bar' off the stack and get the return value from that, then push the return value back onto the stack. Now, we're going a little bit deeper here to look at the C code for that. The place you can find it in MRI is in a file called 'insn SDF'. If you ever dump that bytecode and look at those symbols on the left-hand side, each of these maps to a section in this file.
00:19:05.560 In this file, you can search for 'opt send without block', and the code will be right there in the block. This is our opcode, and it’s very simple. We're going to search for the method, and once we've found the method off of that object, we're going to call the method. That's it; it's that simple.
00:19:52.930 Let's talk about how we find the method because we kind of left this out. In order to call the method, we have to be able to find it, and that method is defined somewhere. So, how do we find that method? I'm going to cover the algorithm for doing that. Essentially, we say to the class: 'Tell me what methods you have.' We get the class for the object, asking, 'What methods do you have?' If it doesn’t have those methods, it looks at the superclass.
00:20:28.410 If that one doesn’t have it, it repeats the process. So we keep going up the tree. For example, if we have a structure that looks like this and we're calling 'a.new.foo,' in order to find that 'foo' method, we have this inheritance hierarchy. We check class A for the method; if it's not there, we check class B, then class C, and finally we check class D. Yes, it's there! We found the method, we can call it, hooray!
00:20:58.490 We could say that looking up this method is an O(n) operation, where n is the size of our stack or the number of classes we inherit from. In other words, the more ancestors there are, the slower the method call is. But I'm kind of lying; let’s test this theory. If what I’m saying is true, we should be able to take a class that has ten ancestors and another with 10,000, compare the two, and see if one is slower.
00:21:28.280 So I did exactly that using this code here. What I did was create one class with ten ancestors and another class with 10,000, and it's not actually 10,000; there's a little more in there because of the built-in classes. If I run this code, for those who don't know, this is an IPS benchmark measuring iterations per second. The higher the number, the better it is—the faster it runs because we're measuring iterations per second, so the higher that number is, the better.
00:22:34.310 Here's the output of that code, and the numbers are almost exactly the same. It doesn't actually matter whether there are 10 or 10,000 ancestors, so how is this possible? We looked at the algorithm for finding this method; how is it possible that they're the same speed? So what I want to talk about next is speeding up method calls.
00:23:10.020 We can speed up a method call by caching things that never change. If something never changes, we don’t need to compute it over and over again. If we look at this benchmark, if you check the value for 10, its ancestors never change. The same applies to 10,000; they never change. So, we can do the method lookup once, cache it, and then reuse that cache over and over again.
00:23:56.200 The question arises: I know many of you are probably using a cache in your applications, like an in-memory cache or Redis, but you have to know where this cache lives. These caches actually live inside the bytecode itself. That value right there called 'call cache' is where these method lookups are cached, and it's per call site.
00:24:25.480 Now, since this cache is stored in line with the bytecode, it can be called an inline cache. What I want to discuss next is something that's slightly pertinent to your applications—now I'm nervous this will fall on me. It is improbable that I will die on stage, but not impossible.
00:25:03.350 So, if someone ever says, 'Breaking method cache,' this is what they're talking about. If we look at these call sites, we have our code from earlier; all these call sites contain a 'cache write' that call cache we pointed out in the bytecode. They're all stored there. The one that is now pink is different, and we're going to talk about it.
00:25:39.880 We talked earlier about how when 'object.triple_equals something else', let's take a detour about 'case when' versus 'if statement.' If we look at these two bits of code, on the left should be equivalent to the right. They do the same thing. The interesting part is, if you benchmark both, you might find that the 'case when' version is actually slower than the 'if-else' version, which I find very interesting, although they perform exactly the same.
00:26:32.790 The reason is that this one on the left, where we have that 'triple equals' written out, has a call site that has a cache, while the 'case when' statement doesn’t have a cache. You can see this if you take a look at the bytecode; the bytecode for both looks almost exactly the same. At the top, we’re doing a get inline cache that has nothing to do with this inline cache we’re discussing today.
00:27:07.840 Ignoring that it says 'cache'—just ignore it—this call cache on the left isn’t here on the bottom. This 'check match' is essentially the same as that 'triple equals' call, but one lacks an inline cache. So what I’m trying to say here is: please don’t go changing all your case statements to use 'triple equals'. We should fix this; it's something we need to address.
00:27:53.530 I’m working on a solution to essentially put a cache into the bytecode there, but I think it’s very interesting when we consider that those 'case when' statements don’t actually expand to 'triple equals'. They have their own special bytecode, and that bytecode doesn’t feature a cache. So again, we have all these caches, and I think it’s important to recognize that the size of this cache matters. If we increase the size of the cache, it could increase the size of our entire program.
00:28:43.820 So we need to be mindful of that. So, we've talked about caches, we found the location of the cache, we know what it does, but what do we actually store in that cache? What's actually stored is just a key and a value. The key is a serial number for each class; the value is the actual method itself.
00:29:03.960 When we're talking about breaking method caches, we're talking about changing that serial number. We all know when we get cache misses in production, we might have the wrong key. So how do we get that key to change? First, we need to know how to tell whether or not there’s a cache miss, and we can do that with Ruby's vmstat. If you run this code, you'll get these values out, which are implementation-specific.
00:30:01.350 What you want to look for is changes in these numbers. If the top number changes—the global method state—that’s very bad. If that number changes, it will impact every cache throughout your code. Now, we talked about how those caches are in all those different places, and when the cache key changes, it only impacts call sites associated with that class.
00:30:38.670 However, the global state impacts all those caches. So if you change that, it’s extremely bad. If that number changes, it's not great, but it’s wise to avoid changing it if possible. In the course of assembling this presentation, I wanted to show examples of these numbers changing, so I decided to throw it in an RSpec test.
00:31:12.160 If you print out vmstat in all of these, you’ll see the output shows that every test changes the class serial number. So you’ll see that increase across each of the iterations. Each of these 'it' statements ran, and I filed a bug about it. We are working to figure out a way to fix the issue, so don’t worry about switching off RSpec.
00:31:50.930 These numbers are increasing; it's terrible! It’s fine; don’t stress! Anyway, to break this cache, you need to define new modules or classes. Whenever you define a new class, you'll see that the class number changes. Whenever you reopen a class, you’ll see that serial number change as well.
00:32:28.520 So if we're doing that at runtime, it's bad, but if we do it at boot time, like here, it’s acceptable. We're doing that once, and when we’re done with that, it’ll be stable throughout the lifetime of your process. That's what you want; you need those numbers to stay stable.
00:33:01.380 It's fine if they change, as long as they're stable. So any of these misses at boot time are amortized across the life of your process, so it won’t matter at boot. Now, let's look at some runtime breakage. I wrote a tiny method up here at the top, so you can see. If we extend an instance or call 'singleton class,' those will break your method caches.
00:33:41.020 Those two at the bottom are what RSpec is doing. The way these are breaking the cache is by accessing the object’s singleton class. If you have an instance of foo, the object type isn't just foo because you might have monkey-patched it; if you do, those monkey patches will go onto the singleton class.
00:34:15.950 We can make a mapping to say, okay, let's take a look at what 'a' is class is and its so-called real class versus 'b's class; I made a little diagram. You can see 'b's real class is just 'foo' since we're not monkey patching it or doing anything while we are doing an instance exec on 'a'.
00:34:59.590 We're adding a method to it and that method needs to go on the singleton class of that instance. So, that class’s actual class is the singleton class. When you say 'class of some instance', we get a singleton in the top case; we get a normal class in the bottom.
00:35:52.930 So cache misses usually occur either at boot or when new classes or modules are introduced at boot time or when we're accessing singleton objects. As I said, as long as this number doesn't change while you’re running your application, you should be fine.
00:36:31.580 We’ve done tests in Rails to ensure that this number doesn’t increase as requests flow through the application. However, if you ever see this happening in Rails, it shouldn't be—let us know. Next, let’s talk about cache misses, and the earlier points are questionable.
00:37:14.790 Normally, defining those classes and modules is fine, but as soon as we start accessing singleton classes, it gets questionable. When I say 'instances exec,' I expect your eyebrows to rise. If we’re doing an include or extend on an instance, you might raise an eyebrow.
00:37:58.060 I want to discuss cache misses while doing something completely normal in the case of polymorphism. Let’s say we have an instance of 'foo,' and we've got a predicate method called 'interesting.' Every time you ask if 'foo' is interesting, it has to perform that comparison. One technique we can use is: if we need to call 'interesting' many times, calculate that upon allocation or creation of the object and allocate a subclass that we know is interesting.
00:38:46.890 We can just return 'foo' every time, for example. This shows how we can use polymorphism to remove runtime conditionals from our code. Another example might be behavior, configuring an object with particular behaviors, doing it depending on the behavior required. It’s very rare for me to see this type of intentional polymorphism.
00:39:33.230 So, it’s time for me to STFU. Remember, it’s only a failure if you learn nothing. Even though we have failed to cover how the polymorphic inline cache isn’t going to help our application, at least I learned along the way. I hope you can take away the things I learned. Another interesting point: the polymorphic inline cache is load-specific.
00:40:22.330 In essence, it wouldn't help out our application, but it may benefit yours. I don’t know that our code is representative of code at large; it could be that our code is a total outlier. You could be using this crucial polymorphic stuff, and it could help you immensely.
00:41:07.830 So, what I would ask is if you have time, go check out my fork of Ruby on GitHub. Check out these two branches; pick one that has the inline cache so you can test your code. The other one is the logging one. If you can combine the two, then you can see both at once. Remember only to optimize code that matters. I'm not going to push for this code upstream because it doesn’t really matter.
00:41:41.580 Please take measurements! If you’re doing optimizations without measurements, how do you know that the optimization worked? You’d be surprised how many people do this without measuring. Finally, please use more polymorphism—make my work worthwhile.
00:42:05.850 Thank you very much!
Explore all talks recorded at BathRuby 2016
+12