RubyKaigi Takeout 2021

Keynote: The Future Shape of Ruby Objects

TruffleRuby uses an optimisation called object shapes to optimise Ruby. It automatically learns and understands the layout and types, or the shape, of your objects as your code is running and optimises code to work better with those shapes. As the community tries to make MRI faster, it could be time to adopt object shapes there as well. We’ll talk about what TruffleRuby does, how it does it, and the benefits it achieves in practice.

RubyKaigi Takeout 2021: https://rubykaigi.org/2021-takeout/presentations/chrisgseaton.html

RubyKaigi Takeout 2021

00:00:00.320 Hello, I'm Chris Seaton, a Senior Staff Engineer at Shopify, where I'm working on Ruby performance.
00:00:06.799 I'm the founder of TruffleRuby, which is a highly optimized implementation of Ruby. I also maintain the Ruby bibliography at rubybib.org, which documents research on Ruby.
00:00:13.840 Today, I want to share some existing ideas that TruffleRuby uses for implementing objects in Ruby, which we believe could also be applied in MRI or other Ruby implementations.
00:00:25.359 We'll explain where these ideas originated and their history. We will describe how they work in a high-level way, showcase how TruffleRuby implements them, and discuss what MRI could achieve through adopting these techniques.
00:00:36.719 Furthermore, we’ll look into the future and explore what else could be built on top of these ideas.
00:00:43.520 When people ask me why Ruby is sometimes slower than other languages, the main reason I usually cite is the 'what ifs.' Ruby implementations spend a significant amount of time asking themselves 'what if.'
00:00:50.079 For example, for every addition operation, they have to consider, 'what if it overflows?' For every method call, they ask, 'what if the method has been monkey-patched?' For every line of code, they ponder, 'what if someone has enabled tracing?' This leads to a lot of overhead.
00:01:14.240 Currently, there's considerable effort to reduce the cost associated with these 'what-ifs.' Tools like MJIT, JRuby, Sorbet, Compiler, and YJIT along with TruffleRuby, all target this issue.
00:01:28.400 However, apart from JRuby and TruffleRuby, most implementations use Ruby objects in a similar manner without effectively addressing the 'what-ifs' associated with Ruby objects.
00:01:45.520 We constantly need to consider scenarios such as: what if the object is frozen? What if the object lacks the expected instance variables? What if the object is shared between multiple threads?
00:02:04.159 The concept of object shapes, which we will discuss today, is a technique designed to eliminate as many of these 'what-ifs' as possible. Many believe that taking this next step will enhance other Ruby performance improvement efforts, such as MJIT, the Sorbet compiler, and YJIT.
00:02:29.520 This approach may also have implications for the memory space required for your Ruby programs, indicating that it's not solely about time performance.
00:02:35.599 When we discuss Ruby objects, we are specifically referring to objects that contain instance variables. The way these instance variables are stored inside the object is crucial.
00:02:47.760 There are many other interesting aspects of Ruby objects, and we will touch upon some of these topics towards the end of the talk. However, for the main focus of this discussion, we will concentrate on instance variables.
00:03:03.680 For our purposes, a Ruby object can be thought of as a bag of key-value pairs. Unlike a Ruby hash, all the keys are symbols, and the key-value pairs are unordered. We can get and set these instance variables using the '@' syntax, but we can also use methods like instance_variable_get and instance_variable_set, along with keywords like defined?, which checks if a key is set. A debugger may also provide similar tools for getting and setting instance variables.
00:03:38.159 Ideally, we would like to consider all of these syntaxes, methods, and keywords as equally important and equally valuable to optimize. This way, developers can use whichever method they feel is most appropriate for solving their specific problem without worrying about which approach is faster.
00:04:01.360 It is critical to emphasize that in Ruby, instance variables conceptually reside within the object itself, as they are not part of the object's class or metaclass. Instance variables are used directly on the object, even when using an attribute accessor. The class doesn't have explicit knowledge of them.
00:04:21.440 Let's examine how objects and instance variables function today across various Ruby implementations. Please note that the code I will present during this talk is significantly simplified for clarity and omits many corner cases and optimizations that are not pertinent to this discussion, such as class instance variables, embedded instance variables, and generic instance variables.
00:04:38.880 MRI, or C Ruby, is the standard implementation of Ruby, written in C. Let's begin by looking at MRI's data structure.
00:04:51.840 For each object, MRI maintains a reference to the class, a count of instance variables in the object, and a reference to an array containing the values of the instance variables.
00:05:04.320 In the class, MRI uses a hash to map instance variable names (as symbols) to their index in the corresponding object. This index is stored in the iv_ptr.
00:05:30.960 Next, let's examine the code MRI uses to access these data structures. I will initially focus solely on the fast path or the happy path that is typically executed when everything runs smoothly, which is how idiomatic Ruby code is most often encountered.
00:05:51.840 There is also a slow path for handling other situations, but we will not delve into that here. The code checks whether the class serial number matches the expected class serial recorded during the last access.
00:06:07.680 The class serial is a version number that increments whenever the class is modified. We must also ensure that the object has a sufficient number of instance variable slots, as it may not align with the size of other class instances.
00:06:18.160 To better understand this, we can view it in terms of pseudo-code. To retrieve an instance variable, we have a cache passed by the interpreter and the receiving object, which contains the instance variable.
00:06:36.400 We compare the cache's serial number against the object's actual serial number, and we check that the expected number of instance variables is less than or equal to the actual number. If everything checks out, we can proceed to read the instance variable.
00:06:58.320 If not, we revert to the slow path. The process for setting the instance variable is quite similar, but we write into the iv_ptr instead of reading from it, and again, there's a slow path for writing.
00:07:18.160 JRuby is a Ruby implementation written in Java. Unlike MRI’s various JITs, JRuby re-imagines how objects work from the ground up. It starts by making a static guess about what instance variables it expects a class to have.
00:07:39.520 This guessing process involves inspecting all methods, including inherited ones, for any variables referenced with the '@' syntax. Subsequently, JRuby generates a new Java class at runtime that includes a field for each Ruby instance variable.
00:07:54.320 These dynamically created Java classes are shared among all Ruby classes that exhibit the same number of variables.
00:08:05.760 JRuby also maintains a map that links instance variable names to their respective Java field equivalents. This mechanism allows efficient access to these variables.
00:08:25.680 When accessing these variables, JRuby utilizes a feature of the JVM called dynamic invocation. This technique is quite sophisticated, and I won’t go into detail here. Instead, I will illustrate the effective result using Ruby pseudo-code.
00:08:52.240 For JRuby's version of 'get ivar,' we verify if the cache's ID (analogous to a class serial) matches that of the object's actual class. Again, we access the object through indexing, and just as before, there exists a slow path for other scenarios.
00:09:07.520 One significant advantage of this approach is that since the number of instance variables is allocated statically, we only need to check to ensure that the object's class is as expected. We save ourselves from the additional check that MRI performs to confirm that the object has sufficient storage for its instance variables.
00:09:29.440 Nonetheless, there is a notable limitation: this method relies on static guesses, meaning that if JRuby misjudges the setup of variables or classes due to some form of metaprogramming—common in idiomatic Ruby—it defaults to a slower path for reading and writing instance variables.
00:09:45.760 Historically, Rubinius employed a similar approach. There are arguments supporting and opposing this trade-off made by JRuby, but we believe we can achieve the best of both worlds with object shapes, eliminating concerns over performance loss.
00:10:03.040 Now, let's delve into the historical background of object shapes. The concept of shapes finds its origins in the Smalltalk and Self programming languages. You may often hear discussions about these languages concerning language design and history.
00:10:23.360 Practitioners are keen to highlight that their languages draw inspiration from both Self and Smalltalk, just as they do with Haskell. Object shapes are a great example of a technique attributed to researchers who worked on these languages many years ago.
00:10:41.760 Typically, discussions about historical contexts are limited in our field, so it's a pleasure to examine original papers that outline this concept. The initial foundation for these ideas started with Smalltalk, which today is viewed as a simple and elegant language.
00:11:01.520 Interestingly, during its early days, some critics labeled Smalltalk as overly complex. In response, a simpler language named Self was created to enhance developer expressiveness and productivity—what we might term as developer happiness today.
00:11:21.440 Self introduced prototypes rather than classes—a more straightforward mechanism similar to how Javascript operates today. In JavaScript, objects contain properties, and inheritance occurs by pointing to another object that acts as a fallback for property access.
00:11:41.760 Ruby has classes, but as we noted earlier, the methods and instance variables of individual instances are not strictly determined by their logical class. In this way, Ruby exhibits a degree of similarity to prototypes.
00:12:14.080 However, an efficiency deficit marked Self as a cost of prioritizing developer happiness. This prioritization closely mirrors Ruby's philosophy, where developer happiness can sometimes yield performance trade-offs.
00:12:38.080 Fortunately, this focus prompted researchers to devise methods to optimize Self despite these inherent costs tied to their design.
00:12:57.200 Such research has led to many significant developments still influential today, including polymorphic inline caching. I won't detail this technique now, but it plays a crucial role in the high performance witnessed in Java and JavaScript.
00:13:16.960 TruffleRuby implements an innovative version of polymorphic inline caching known as dispatch chains, which optimizes Ruby's metaprogramming capabilities.
00:13:25.840 The idea of shapes, or maps as termed in the Self language, and sometimes hidden classes in JavaScript, is our primary focus.
00:13:30.400 We will use the term shapes here to avoid potential confusion with hashes, especially for those with a Python background. The original concept involved separating the keys and values of instance variables, where the keys would reside in the map, indicating the values' indexes within the object.
00:14:02.480 Notably, the initial motivation for this idea revolved around saving space rather than improving speed. The realization that prototypes necessitated all objects to have a complete set of keys led researchers to understand that by separating them out, they no longer needed to store copies since objects with identical keys could share them.
00:14:23.520 This is how the original research described the concept, but we aim to provide a comprehensive explanation using terminology and diagrams that resonate more closely with Ruby. The shapes we are discussing will map instance variable names to their respective indexes for all objects possessing that shape.
00:14:49.760 We will represent shapes with dashed lines while objects will be depicted with solid lines in our diagrams. Thus, the object exists on the left, while the shape appears on the right.
00:15:03.120 Each object references a class (not essential for this discussion) and its shape, creating an arrow from the object to the shape. Together, the class and shape pointers form the header of the object.
00:15:18.880 The object merely contains the values, while the shape includes a mapping of names to indexes, meaning it does not store the actual instance variable values, only their pointers.
00:15:46.480 Every object must have a shape; objects initially created with a designated empty shape, for example, have their required space allocated according to the respective shape. This may necessitate resizing the object or utilizing a separate data structure to accommodate the size needed.
00:16:03.200 The manner in which space within the object gets utilized is dictated by the shape. One can conceptualize the shape as metadata describing the data itself or as a schema in a database context.
00:16:19.760 To access an instance variable, you would logically look up the index in the shape and then read the corresponding slot from the object. In practice, this translates into fetching the index from the shape and using it to retrieve the value from the object.
00:16:47.600 If, however, you aim to set an instance variable that has previously been defined, the process is analogous. You would find the index through the shape, identify the instance variable by name to find its index within the object, then write the value at that position in the object.
00:17:06.560 This method is limited to instance variables that have been set previously in the object. We will discuss the process for handling new instance variables shortly.
00:17:29.760 Now, how does this differ from how MRI operates? While the general approach may seem similar, the key distinctions arise from the immutability of shapes.
00:17:45.920 Immutability is often advantageous when designing software because if you know what the shape is at any point in the past, you understand everything about that shape going forward. This enables hardcoded assumptions with confidence.
00:18:05.200 Secondly, shapes are distinct from classes, which means instances of the same class can possess different shapes. Hence, objects will not change unexpectedly if other instances modify their internal structure.
00:18:25.679 Due to their immutability, you can compare the shape of an object against an expected shape with minimal overhead, likely reducing it to a single word comparison for your processor. This stands in stark contrast to how instance variable changes affect classes in MRI.
00:18:47.680 Previously, while setting an instance variable, we suggested that it had already been assigned, thereby receiving a new value. However, if you attempt to set a previously unassigned instance variable, the process diverges.
00:19:01.320 Transitioning shapes involves recognizing which shape you would acquire upon adding, removing, or modifying an instance variable within the shape. In this example, if you integrate a height variable, the shape will adapt to represent that change.
00:19:18.960 When adding the height instance variable, we do not alter the pre-existing shape; rather, we generate a new shape reflecting this change. This also establishes a graph detailing the transitions taken as your program executes.
00:19:36.080 It’s crucial to highlight that transitioning between shapes is not a slow operation. This process can occur effectively, leveraging information already present within the shape.
00:19:53.840 In practice, obtaining a new shape entails reviewing the object's existing shape, analyzing transitions, and determining the updated shape based on the addition of an instance variable. This new shape may already exist, and if it does not, it can be generated easily.
00:20:16.000 After identifying the index within the new shape, we modify the object to reflect that change. It may be necessary to resize the object, but we can typically amortize that adjustment rather than resizing incrementally.
00:20:34.080 The critical point here is that these transitions do not represent a performance crisis; they can occur smoothly throughout your program's execution.
00:20:56.560 The effects of this approach are magnified when combined with compilation to machine code, whether that’s just-in-time or ahead-of-time compilation. Compiled machine code can incorporate hardcoded values, thus enhancing efficiency.
00:21:13.680 By hardcoding the address of the object shape and leveraging that information for quick access, we can facilitate rapid comparisons and reads. In cases where the comparison fails, we redirect to a slower code path—an occurrence that we anticipate infrequently.
00:21:34.240 These two operations—the comparison and the read—can often proceed simultaneously on modern processors. Processors can predict whether a branch will be taken or not, which allows us to optimize the flow.
00:21:51.680 When we outline this behavior in pseudo-code, we can observe that unless the object shape aligns with our expectations, we move to a slow path.
00:22:10.560 We can first read the shape, compare it against a pre-defined number, and subsequently execute the read operation based on that match.
00:22:28.320 The essential aspect is that constant numbers can be integrated into the instruction stream rather than being pulled from external memory, improving performance significantly.
00:22:48.160 Now, how does TruffleRuby implement shapes? To illustrate this transition from abstract discussions to practical applications, we will look at internals drawn from the research paper titled 'An Object Storage Model for the Truffle Language Implementation Framework' written by Andreas Vos and colleagues.
00:23:10.880 In demonstrating TruffleRuby's implementation of shapes, we will refer to an internal graph data structure called the Intermediate Representation, developed by Shopify to visualize Ruby programs.
00:23:30.560 I acknowledge that these graphs may be a new concept for some viewers and could initially seem daunting, as they originate from a production compiler and are not designed for easy interpretation.
00:23:50.720 However, it's beneficial to showcase these using actual compiler production data structures. This fragment of a compiler graph represents the operations related to object manipulation.
00:24:07.760 Here, nodes signify computations or values, with thick red lines illustrating control flow. If a thick red arrow points from one box to another, it implies a sequential execution order.
00:24:23.680 The thin green lines represent data flow, showcasing how information exits nodes and travels to others. For example, node 524 labeled 'equals' takes in two inputs and produces one output.
00:24:42.240 This flowchart version of Ruby code reveals implicit elements made explicit, aiding our understanding of compiler operations.
00:25:02.000 In this particular instance, we observe an instance variable read initiated by loading the shape. Hence, node 521 loads the object's shape, followed by comparing it against a known shape.
00:25:24.880 If these shapes match, the process continues; otherwise, a jump occurs to a slower path. The final component of this operation entails loading a known value from within the object's slots.
00:25:41.920 Overall, this results in three distinct machine instructions. For reference, on Intel processors, one instruction can load data from memory while performing a computation.
00:25:54.800 In this case, we compare the object and read the shape from within it, followed by comparing it with a predetermined shape, which can be hardcoded into machine code.
00:26:06.720 Next, we access the indexed value from the object and proceed with the following operations. Setting an instance variable that already exists in an object mirrors this process.
00:26:26.080 The instruction flow mirrors what we designed for reading, except the final two operations reverse their roles. Here, we write rather than read, using a 'store field' instruction instead.
00:26:47.040 Once again, this procedure encompasses merely three machine instructions, resulting in efficient reads and writes for instance variables. It's difficult to conceive a method of accomplishment that is more efficient.
00:27:12.240 Let's explore the case where the instance variable doesn't exist yet. This instance will indeed change the shape, necessitating a transition from the current shape to a new one.
00:27:33.520 Because every aspect of the shape is hardcoded, the transition will also be constant. Writing the new shape alongside the value is the straightforward next step.
00:27:54.560 In situations where Ruby code attempts to set a new instance variable—one not previously recorded within the object—the procedure becomes simpler.
00:28:12.400 The only distinction comes from writing the new shape, necessitating a field adjustment based on its value status—acting more like the second instance variable.
00:28:33.760 Importantly, this operation does not become a slow path procedure; it can occur seamlessly along with other program actions.
00:28:53.600 Moving on, advanced strategies can be applied to shapes in Ruby beyond mapping instance variable names to their storage locations in objects; we can also map their types.
00:29:15.200 Currently, in MRI, all instance variables store full value objects, meaning they serve as pointers. This method necessitates tagging to store them as values.
00:29:38.320 Tagging must also reverse when retrieving these values. Conversely, if we affirm that an instance variable will always be an integer through our shape, we can eschew the entire tagging process.
00:29:58.560 Whenever the variable transitions into a different type, we can adapt the shape to reflect this change, similar to adding a new instance variable.
00:30:14.400 For instance, if we have a routine in an object characterized by two instance variables always being small integers, the generated machine code will read values without requiring the additional effort of untagging.
00:30:32.320 The assumption made is that these readings are untagged values, allowing for straightforward mathematical operations initiated from both registers.
00:30:47.680 The code then achieves the efficiency akin to that typically expected from compiled languages, resembling operations output by a C compiler rather than those from a Ruby compiler.
00:31:14.080 Considering further optimization avenues, flattening references such as class and frozen state properties in the shape can yield notable advantages, including storage space reduction.
00:31:31.600 Establishing this representation allows collapsing three separate checks—the class checks, frozen status checks, and instance variable write checks—into a single shape check.
00:31:51.920 Currently, the JRuby implementation incorporates checks for frozen status when writing instance variables. However, as we observed, this overhead gets eliminated through shape integration.
00:32:08.600 This outcome illustrates the capability of achieving language feature advantages without the performance costs typically associated with them.
00:32:26.560 Looking at the differences in how JRuby folds class information into the shape, the concern is that conflating class and instance variable checks may introduce unnecessary complexity.
00:32:42.880 Additionally, JRuby can mark objects shared between threads within the shape and activate related synchronization locks, expanding potential applications for shape structures.
00:32:57.840 The optimization efforts for Ruby execution models currently target the introduction of just-in-time compiled machine code through MJIT and YJIT, representing a notable shift.
00:33:15.200 However, these strategies neglect optimization at the object structure level. I believe the exploration of shapes in Ruby can open new avenues of efficiency.
00:33:30.320 Significant innovations undertaken at Shopify—including concurrent compacting for the garbage collector, variable-width allocation for objects, and new garbage collection methods—represent exciting opportunities.
00:33:53.440 TruffleRuby demonstrates some impressive optimizations for hashes and arrays, allowing specialization based on their size and contents.
00:34:08.560 With the prevalence of objects—especially hashes and arrays—in idiomatic Ruby, the aim should be to enhance the efficiency of their usage.
00:34:28.480 In conclusion, I strongly advocate for implementing object shapes in MRI, as I believe the complexity involved is relatively low and can yield immediate performance benefits in the current MRI interpretation.
00:34:47.840 Implementing this strategy can enhance the performance of existing efforts like MJIT, YJIT, and Sorbet by providing optimizations that significantly leverage the nature of shapes.
00:35:04.880 TruffleRuby illustrates the practicality of the idea, and I am excited about the possibilities of advancing this concept further than we have examined today.
00:35:24.640 The paper I referenced earlier discusses practical implementation details, addressing performance metrics in comparison to other Ruby object methods.
00:35:40.320 I do not believe there are any substantial problems to resolve; it seems more like engineering challenges remain.
00:35:51.920 This is a well-trodden research path that dates back to the early 1980s, showing proven strategies for application in Ruby.
00:36:06.240 I urge us all to keep being aligned with how programming languages like Self have approached similar challenges in balancing developer happiness with performance efficiency.
00:36:28.960 Thank you very much for your time and for listening to my proposal on object shapes in MRI.