Building a Better OpenStruct

Ruby

Ariel Caplan

@amcaplan

#ruby

#performance-optimization

by Ariel Caplan

The video "Building a Better OpenStruct," presented by Ariel Caplan at RubyConf 2016, explores faster alternatives to Ruby's OpenStruct, a standard library tool known for its dynamic data handling but criticized for its poor performance. Caplan starts by describing OpenStruct as Ruby's counterpart to a JavaScript object and walks through its features, including dynamic property creation and access through dot and bracket notation.

Key points discussed include:

Use Cases for OpenStruct: Caplan identifies three prominent uses:
- Consuming APIs: Using OpenStruct to simplify API responses by converting JSON data into Ruby objects, facilitating easier data manipulation through meaningful object definitions like 'Tweet' for Twitter API responses.
- Configuration Objects: Leveraging OpenStruct to create configuration objects with a DSL-like syntax, allowing users to set and retrieve arbitrary keys conveniently.
- Test Doubles: Employing OpenStruct to create mock objects that simulate complex dependencies, such as payment gateways, for testing purposes without the associated costs.
Performance Issues: Caplan highlights several shortcomings of OpenStruct, particularly its performance being 10 to 40 times slower than typical classes due to its reliance on Ruby's method_missing and method definitions.
Improvements Introduced: Caplan presents several alternatives to improve upon OpenStruct's inefficiencies:
- OpenFastStruct: A faster variant that avoids method definitions for every attribute at the cost of breaking compatibility with nil behavior when accessing nonexistent keys.
- PersistentOpenStruct: A middle ground that creates methods upon setting attributes but maintains better performance than standard OpenStruct.
- DynamicClass: A significant innovation that dynamically defines instance methods based on provided attributes, further enhancing speed and usability over previous solutions.

Caplan emphasizes the importance of continuous experimentation and benchmarking for developers, encouraging creativity to drive improvements in the Ruby ecosystem. He concludes with an empowering message, underscoring that anyone can contribute to the community through innovative problem-solving and collaborative growth.

Overall, Caplan's talk underscores the balance between usability and performance enhancements in developing a better OpenStruct for Ruby, promoting a culture of experimentation and community contribution.

00:00:15.550 Alright, this talk is called 'Building a Better OpenStruct,' and that's what we're going to talk about. I do have to say it's a little bit scary to be up on the stage as kind of the sequel to Matz.

00:00:20.990 I also want to introduce something big to the Ruby community. I call it 'MTWNSTTWABN.' It's short for 'Matz's talk was nice, so this talk will also be nice.' I will do my best to deliver a nice talk as well.

00:00:36.280 Now, let's talk about OpenStruct. Before we get into what it means to build a better OpenStruct, we need to understand what OpenStruct is, how it works, what the problems with it are, and why we would want to build a better one.

00:00:42.290 I like to think of OpenStruct as Ruby's JavaScript object. It's funny because if you look at the initial commit on the Ruby repository, it goes back to 1998, so not quite at the beginning of Ruby, but OpenStruct was already there. It was initially described as a Python object, but now I prefer to think of it as a JavaScript object.

00:01:04.910 You can require OpenStruct anywhere in your code since it's part of Ruby's standard library, so it's always available. When you initialize an OpenStruct, you can do it in a couple of ways: either with 'OpenStruct.new' or, more commonly, by initializing it with a hash.

00:01:16.340 For instance, you may initialize OpenStruct with a hash that has a key of `:foo` and a value of `:bar`. Keys are generally required to be either strings or symbols. Once it's initialized, it's not too late to add properties: you can set them with either dot notation, like `obj.baz = 4`, or bracket notation.

00:01:35.660 For example, you can set a key of the symbol `:something` to the string 'whatever.' Once all your information is in your OpenStruct, you can start accessing those properties using dot notation or bracket notation. It doesn't matter when you put the information in; it will be available regardless.

00:01:48.709 If you try to access a key that doesn't exist, it will simply return nil. As you can see, this makes OpenStruct behave similarly to a JavaScript object, allowing you to use dot notation and bracket notation interchangeably.
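The behavior described so far can be tried in a few lines. This is a minimal sketch that reuses the `:foo`, `baz`, and `:something` names from the talk; the `missing` key is only there to show the nil behavior.

```ruby
require "ostruct"

# Initialize from a hash; keys may be strings or symbols.
obj = OpenStruct.new(foo: :bar)

# Add properties after initialization, with dot or bracket notation.
obj.baz = 4
obj[:something] = "whatever"

# Read them back either way, regardless of how they were set.
obj.foo           # => :bar
obj[:baz]         # => 4
obj["something"]  # => "whatever"

# A key that was never set returns nil instead of raising.
obj.missing       # => nil
```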

00:02:05.330 This flexibility is nice, but why would we use OpenStruct? Erik Michaels-Ober gave a great talk at Rails Israel last year where he identified three common use cases for OpenStruct. While the code examples I'll show you today are my own, the cases themselves come from him, and I want to give him credit for that.

00:02:30.530 The first and most significant use case I'll be focusing on is consuming an API. The straightforward approach for this is making an API call, retrieving some JSON, parsing that JSON into a Ruby hash, and then passing that hash into an OpenStruct.

00:03:00.380 A more complex pattern, which we often use in our applications, involves subclassing OpenStruct and then creating a new instance of your subclass with that hash. This is advantageous because it allows you to assign a meaningful name to the object you're creating. For instance, if you hit the Twitter API, you get back a hash, and you can feed that into your OpenStruct subclass, creating a 'Tweet' object, which simplifies debugging and interaction with your data.

00:03:38.750 If the API response looks like a typical hash with a couple of keys and values, using OpenStruct makes it easier to retrieve information using dot notation. While this may seem like a subtle distinction, it encourages you to think of your information not just as raw data but as an object. In more advanced use cases, you can even define methods that interact with the data coming from an API.
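As a rough illustration of the subclassing pattern, the sketch below parses a JSON string into a hash and wraps it in an OpenStruct subclass. The `Tweet` class name comes from the talk; the JSON payload, its field names, and the `summary` method are assumptions made up for this example, not the Twitter API's actual shape.

```ruby
require "json"
require "ostruct"

# Subclassing gives the data a meaningful name when debugging.
class Tweet < OpenStruct
  # A method built on top of the raw API data (hypothetical).
  def summary
    "@#{user_name}: #{text}"
  end
end

# Stand-in for an API response body; the field names are invented.
json = '{"user_name": "amcaplan", "text": "Hello, RubyConf!"}'

tweet = Tweet.new(JSON.parse(json))
tweet.text     # => "Hello, RubyConf!"
tweet.summary  # => "@amcaplan: Hello, RubyConf!"
```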

00:04:02.120 So that's the first common use case: consuming an API. The second common use case is for creating configuration objects. Let's say you've created a gem that requires some configuration. You might want to design it to allow a DSL-like syntax. You’d create a method like 'configure,' which would yield to a configuration block that takes a configuration argument, enabling users to set various settings.
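A configuration DSL of this kind might look like the sketch below, which uses an OpenStruct as the configuration object. The `MyGem` module and the `api_key` and `retries` settings are hypothetical names chosen for illustration.

```ruby
require "ostruct"

# Hypothetical gem module; the name and settings are illustrative.
module MyGem
  # The configuration object accepts arbitrary keys.
  def self.config
    @config ||= OpenStruct.new
  end

  # Yield the configuration object to the caller's block.
  def self.configure
    yield config
  end
end

MyGem.configure do |config|
  config.api_key = "secret"
  config.retries = 3
end

MyGem.config.api_key  # => "secret"
MyGem.config.retries  # => 3
```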

00:04:49.460 This is straightforward since your configuration object can also be an OpenStruct. It has the added advantage of allowing users to set and get arbitrary keys without additional hassle. The third use case is a little more complex, so don't worry too much if it goes over your head. It involves using OpenStruct as a test double. Suppose you have an `Order` class that requires a payment gateway to process product payments. Feeding it an actual payment gateway for testing can be expensive, particularly if it involves real credit card transactions.

00:05:55.400 Instead, you can use an OpenStruct-like object that mimics the payment gateway without actually charging a credit card. Each key-value pair added to your OpenStruct becomes a callable method, allowing you to create a mock payment gateway that returns preset values when interacted with. Hence, OpenStruct can serve well as a simple, easy-to-use test double.
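The test-double idea can be sketched as follows. The `Order` class, its `checkout` method, and the gateway's `charge` attribute are assumptions for illustration only. Note that an OpenStruct attribute reader takes no arguments, so this double can only stand in for argument-free calls.

```ruby
require "ostruct"

# Hypothetical class under test; it only needs a gateway that responds to #charge.
class Order
  def initialize(gateway)
    @gateway = gateway
  end

  def checkout
    @gateway.charge == :approved ? "Order paid" : "Payment failed"
  end
end

# Each key becomes a callable method that returns the preset value.
approving_gateway = OpenStruct.new(charge: :approved)
declining_gateway = OpenStruct.new(charge: :declined)

Order.new(approving_gateway).checkout  # => "Order paid"
Order.new(declining_gateway).checkout  # => "Payment failed"
```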

00:06:46.310 Now that we've established why you might want to use OpenStruct, let's take a look at how it works under the hood. I should mention that the code I'll show you has been edited heavily. This isn’t to mislead you but rather to focus solely on the key aspects that illustrate how OpenStruct operates.

00:07:08.360 OpenStruct defines attribute setter and getter methods on the object's singleton class. Not everyone is familiar with singleton classes, so let me provide a quick refresher. In Ruby, if you have an object, you can define a method just for that particular instance, creating what is known as a singleton method. It's crucial to understand that methods live in classes, not in objects. Thus, a singleton method exists specifically for one single Ruby object.
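For readers who want to see a singleton method in isolation, here is a small sketch. The `greeter` and `other` objects and the `hello` method are arbitrary names for the example.

```ruby
greeter = Object.new
other   = Object.new

# Define a method on this one object only.
def greeter.hello
  "hi"
end

greeter.hello              # => "hi"
other.respond_to?(:hello)  # => false

# The method lives in the object's singleton class, not in Object.
greeter.singleton_class.instance_methods(false)  # => [:hello]
Object.method_defined?(:hello)                   # => false
```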

00:07:51.090 When you create a new OpenStruct, it initializes an instance variable, which is set to a hash. If you've passed in a hash as an argument, it coerces the keys to symbols and stores the key-value pairs inside that internal hash table. When you set a new property using dot notation, Ruby treats this as syntactic sugar for invoking the appropriate setter method. Because that setter doesn't exist yet, the call falls through to `method_missing`, which checks whether the accessor methods already exist, generates them if they don't, and stores the value in the hash.
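The mechanism can be sketched in a much-reduced form. The `TinyOpenStruct` class below is a hypothetical illustration written for this transcript, not the standard library source, and it skips many details the real implementation handles.

```ruby
# Simplified illustration of the approach; not the real OpenStruct code.
class TinyOpenStruct
  def initialize(hash = {})
    @table = {}
    # Coerce keys to symbols and store the pairs in the internal hash.
    hash.each { |key, value| @table[key.to_sym] = value }
  end

  private

  # Reached only when no getter or setter exists on this object yet.
  def method_missing(name, *args)
    if name.to_s.end_with?("=") && args.length == 1
      key = name.to_s.chomp("=").to_sym
      define_accessors(key)
      @table[key] = args.first
    elsif args.empty?
      @table[name] # nil for keys that were never set
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @table.key?(name) || super
  end

  # Define the getter and setter on the singleton class of this one object.
  def define_accessors(key)
    define_singleton_method(key) { @table[key] }
    define_singleton_method("#{key}=") { |value| @table[key] = value }
  end
end

struct = TinyOpenStruct.new("foo" => :bar)
struct.foo      # => :bar (served by method_missing)
struct.baz = 4  # defines baz and baz= on this object only
struct.baz      # => 4 (now a real singleton method)
```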

00:08:56.030 As you might have guessed, OpenStruct's performance isn't stellar. Depending on your use case, it can be anywhere from 10 to 40 times slower than an explicitly defined class. The main reason for this slow performance boils down to the overhead of defining methods in Ruby. Relying heavily on `method_missing` adds to the cost as well, although it's not as slow as defining methods. There's also a common belief that OpenStruct invalidates the global method cache. That used to be true, but luckily the issue was resolved in Ruby 2.1, so on 2.1 and later OpenStruct is far less of a bottleneck than it used to be.
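One rough way to check the gap on your own machine is a small timing script like the one below. The `Point` class, the `time_it` helper, and the iteration count are assumptions for this sketch; the ratio you observe will vary by Ruby version and workload, so no expected numbers are given.

```ruby
require "ostruct"

# Hypothetical explicitly defined class to compare against.
class Point
  attr_accessor :x, :y

  def initialize(x:, y:)
    @x = x
    @y = y
  end
end

# Return the elapsed wall-clock seconds for the given block.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 20_000

class_time   = time_it { iterations.times { Point.new(x: 1, y: 2).x } }
ostruct_time = time_it { iterations.times { OpenStruct.new(x: 1, y: 2).x } }

puts format("plain class: %.4fs", class_time)
puts format("OpenStruct:  %.4fs", ostruct_time)
```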

00:10:00.350 What led to the change is actually a fascinating story about how method lookup in Ruby works. Ruby caches the results of method lookups so it doesn't have to repeat them, but when a method is defined or a class is reopened and changed, those cached lookups may need to be discarded. This was problematic for OpenStruct because it defines methods at runtime, and since rebuilding the cache is expensive, it created performance issues in requests.

00:10:29.200 Through profiling, a prominent community member, James Golick, discovered that a significant percentage of application time was spent rebuilding that global method cache. This led him to develop a hierarchical method cache, which landed in Ruby 2.1. It limits invalidation to the class being changed and the classes that inherit from it, meaning you could now create OpenStruct objects without impacting overall method cache performance.

00:11:15.600 The takeaway from this is that you can utilize OpenStruct without adversely affecting your application’s performance anymore. However, a persistent problem remains: OpenStruct itself is inherently slow. Therefore, let me introduce the ideas I'm going to talk about today to see whether we can build a better OpenStruct. I will share four stories with you, including one story about how James Golick's work relates to OpenStruct improvements.

00:11:47.750 The first improvement comes in the form of OpenFastStruct, which is an alternative to the existing OpenStruct implementation. I remember opening my Ruby Weekly email and seeing an article about OpenFastStruct, which claimed to be faster than OpenStruct, though still slower than a regular hash.

00:12:03.330 OpenFastStruct initializes just like OpenStruct, but it bypasses the method definitions that slow things down. Instead of defining new methods for every attribute, it relies on `method_missing` to set and retrieve values. This leads to better performance when attributes are accessed infrequently, as it's cheaper to handle `method_missing` than to define many methods upfront.

00:12:53.430 However, while there are performance improvements, OpenFastStruct breaks compatibility with applications that expect nil results when keys do not exist. Instead, it will return a new OpenFastStruct object that allows for infinite chaining, which can lead to unexpected behavior in applications using it. This change means the API broke, making the faster alternative unsuitable for some use cases.
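The trade-off can be seen in a small sketch of the technique. `LazyStruct` is a hypothetical class written for this illustration, not the OpenFastStruct gem's code; it only assumes the two behaviors described above: no method definitions, and a fresh nested object instead of nil for unknown keys.

```ruby
# Hypothetical class illustrating the technique; not the gem's implementation.
class LazyStruct
  def initialize(hash = {})
    @table = {}
    hash.each { |key, value| @table[key.to_sym] = value }
  end

  private

  # Every attribute read and write goes through here; no methods are defined.
  def method_missing(name, *args)
    if name.to_s.end_with?("=") && args.length == 1
      @table[name.to_s.chomp("=").to_sym] = args.first
    elsif args.empty?
      # Unknown keys get a fresh nested struct instead of nil.
      @table.fetch(name) { @table[name] = self.class.new }
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @table.key?(name) || super
  end
end

config = LazyStruct.new(name: "demo")
config.name                         # => "demo"
config.database.host = "localhost"  # chaining through a key that was never set
config.database.host                # => "localhost"
config.missing.nil?                 # => false, unlike OpenStruct
```

Code that checks for nil to detect an absent key would silently take the wrong branch with an object like this, which is the compatibility break described above.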

00:13:57.670 Next, I brainstormed how to improve upon OpenStruct further, potentially addressing its limitations while maintaining its usability. That's how I arrived at the idea of PersistentOpenStruct, which is specifically tailored for cases like API consumption. Unlike regular OpenStruct, which defines accessor methods separately for every object, PersistentOpenStruct defines methods as needed and keeps them around, which makes attribute access faster overall.

00:14:45.430 The approach is simple: rather than relying on `method_missing`, we define methods when setting attributes. It works well for setups where you will repeatedly create objects of the same shape. While it's still slower than a regular class, it outperforms traditional OpenStruct by avoiding the repetitive overhead of method definitions every time an object is created.
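A minimal sketch of that idea follows, assuming the accessors are defined on the class so that later objects of the same shape reuse them. `PersistentSketch` and `ApiUser` are hypothetical names for this illustration, and the code is not the PersistentOpenStruct gem's source.

```ruby
# Hypothetical sketch of the idea; not the gem's implementation.
class PersistentSketch
  def initialize(hash = {})
    @table = {}
    hash.each { |key, value| self[key] = value }
  end

  def [](name)
    @table[name.to_sym]
  end

  def []=(name, value)
    key = name.to_sym
    self.class.define_accessors(key)
    @table[key] = value
  end

  # Define accessors on the class once; later instances reuse them.
  def self.define_accessors(key)
    return if method_defined?(key)

    define_method(key) { @table[key] }
    define_method("#{key}=") { |value| @table[key] = value }
  end

  private

  # Only reached for attributes this class has never seen.
  def method_missing(name, *args)
    if name.to_s.end_with?("=") && args.length == 1
      self[name.to_s.chomp("=")] = args.first
    elsif args.empty?
      nil
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || super
  end
end

class ApiUser < PersistentSketch; end

first = ApiUser.new(name: "Ariel")   # defines name and name= on ApiUser
methods_after_first = ApiUser.instance_methods(false).sort

second = ApiUser.new(name: "Matz")   # same shape, so nothing new is defined
methods_after_second = ApiUser.instance_methods(false).sort

second.name     # => "Matz"
second.unknown  # => nil
```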

00:15:54.420 The next idea that I want to discuss, which came shortly after, was a DynamicClass, which is not reliant on the old structural way of thinking. When you create a DynamicClass with a block, it produces a new class where you can define the attributes and methods contained within it, allowing you to rethink how OpenStruct can work without the sluggish performance.

00:16:32.560 In comparison to PersistentOpenStruct, which stores its values in an internal hash, DynamicClass defines instance methods right away and provides object-level access through simple getters and setters. This makes it a better fit for dynamic attributes and reduces overhead significantly, making it faster than even PersistentOpenStruct.
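The general shape of this approach can be sketched as below. The `build_dynamic_class` helper and the `name`, `sound`, and `ears` attributes are hypothetical, and the code is an illustration of the idea under the assumption that values live in plain instance variables behind `attr_accessor`; it is not the DynamicClass gem's API or source.

```ruby
# Hypothetical helper illustrating the idea; not the DynamicClass gem's API.
def build_dynamic_class(&body)
  Class.new do
    def initialize(attributes = {})
      attributes.each { |key, value| self[key] = value }
    end

    def [](name)
      instance_variable_get("@#{name}")
    end

    def []=(name, value)
      # Add plain accessors to the class the first time an attribute appears.
      unless self.class.method_defined?("#{name}=")
        self.class.send(:attr_accessor, name)
      end
      instance_variable_set("@#{name}", value)
    end

    # Only reached for attributes this class has never seen.
    def method_missing(name, *args)
      if name.to_s.end_with?("=") && args.length == 1
        self[name.to_s.chomp("=")] = args.first
      elsif args.empty?
        nil
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      name.to_s.end_with?("=") || super
    end

    # Evaluate the caller's block so it can add its own methods.
    class_eval(&body) if body
  end
end

animal_class = build_dynamic_class do
  def speak
    "#{name} says #{sound}"
  end
end

dog = animal_class.new(name: "Fido", sound: "woof")
dog.speak           # => "Fido says woof"
dog.ears = "droopy" # new attribute: accessors are added to the class
dog[:ears]          # => "droopy"
```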

00:18:09.350 In the end, we find ourselves with a far cleaner implementation of a dynamic object that still retains much of the usability that OpenStruct offered. The performance benchmarks indicate a great improvement, validating my initial suspicions about the approach. As a final note, remember to always benchmark your applications and test different implementations to find the best fit for your particular use case.

00:19:24.670 Continuous experimentation ensures that we not only learn but also improve our development practices. I want to emphasize that everyone has the potential to make a significant impact. You do not need to be the most brilliant programmer to invent solutions. Embrace your unique perspectives and experiences, because they can benefit the entire Ruby community.

00:20:17.020 Thank you for your time. I'm looking forward to what you're all going to build and improve in the Ruby ecosystem. Let's strive for continued growth and collaboration together to enhance our community and make it richer.

RubyConf 2016