00:00:15.650
Good morning, everyone! How are we doing this morning? Did you have a good night last night? Did everyone enjoy the party? Are we all awake? Do we need to do some stretching or anything? Sirhan's talk was really good, so I think we're probably all awake.
00:00:21.330
So, how about this venue? We’re in the Crystal Ballroom, which for those of you on the live stream looks like this. It makes me a little nervous that Statler and Waldorf might pop up on one of those balconies and start heckling me.
00:00:34.680
Given where we are, I want to be crystal clear about what I’m talking about. No need to be fuzzy. Anyway, now that my terrible puns are out of the way, let’s get started.
00:00:49.680
I’m going to talk about subclassing Hash: what’s the worst that could happen? As Megan said, my name is Michael Herold. If you have any questions or anything, please tweet me at @mHerold or say hello at Michael J Herold. As Megan also said, I work at Flywheel, a delightful WordPress hosting company for designers and creatives.
00:01:09.119
Now, I said WordPress, but we do all Ruby, so don’t worry, I’m not an impostor. Also, we're looking for an engineering manager, so if you are one and looking for a new job, please come talk to me or one of my co-workers here.
00:01:38.970
This talk is about a little gem called Hashie. If you read Hashie's GitHub page, it says that Hashie is a collection of classes and mix-ins that make hashes more powerful. Let’s think about this for a minute. What pops out at you from that sentence? The phrase 'more powerful' immediately stands out to me.
00:01:58.860
Whenever I hear that phrase, it makes me think of Uncle Ben. Of course, it might also make you think of 'unlimited power,' which we know we may also get by doing this. But let’s get back to Uncle Ben. Sadly, Stan Lee passed away this week, which is a sad day. But Uncle Ben is famous for saying, 'With great power comes great responsibility.'
00:02:22.410
I’d like to juxtapose this with an Alexander Pope quote: 'To err is human.' Well, humans write computer programs, and what do computer programs have? Bugs. This talk is primarily centered around three different bugs in three portions of the Hashie library. So that’s the framework for our story.
00:02:49.980
The first bug we’re going to talk about occurs in a Hash extension that we call Indifferent Access. If you're a Rails developer, you know that there’s a hash with indifferent access in Active Support. We have an extension that provides you that power without having to use Active Support, but there’s a bug in there.
00:03:07.550
There’s also a bug in Mash keys, which I’ll talk about a lot since it’s a big part of our library. Finally, we’re going to discuss destructuring a Dash, which is another data structure within our library that I’ll tell you about.
00:03:37.530
To start out, I wake up one morning and see that there’s a bug report on GitHub. It’s a good bug report, very thorough, so let’s dig into what the reporter found. If we look at their sample code, they start out by subclassing Hash and then mix in something called the Merge Initializer.
00:03:57.030
If you know how Hash works out of the box, you can't easily pass a hash into Hash.new to have its contents merged in. The Merge Initializer gives you that ability. It looks a little bit like this: we’re going to call MyHash.new. We’re going to pass it a cat key that has 'meow' and a dog key that has another hash with a name of 'Rover' and a sound of 'woof'. We get what we would intuitively expect from Ruby's standard library.
00:04:27.230
We see that looking up cat on my_hash gives us 'meow', and looking up dog with the bracket syntax gives us the nested hash. So that’s the Merge Initializer. But that’s not where the bug lives.
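The behavior described above can be sketched in plain Ruby. This is a minimal stand-in, not Hashie's code: the module name SketchMergeInitializer is hypothetical and exists only for this illustration.

```ruby
# Hypothetical stand-in for the Merge Initializer idea; not Hashie's source.
module SketchMergeInitializer
  def initialize(hash = {})
    super()       # plain Hash#initialize with no default value
    update(hash)  # copy the given key/value pairs into the new hash
  end
end

class MyHash < Hash
  include SketchMergeInitializer
end

my_hash = MyHash.new(cat: 'meow', dog: { name: 'Rover', sound: 'woof' })

my_hash[:cat]        # => "meow"
my_hash[:dog][:name] # => "Rover"
```

Without the module, Hash.new would treat a positional argument as the default value rather than as initial contents.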
00:04:59.780
The reporter also mixed in the Indifferent Access extension, and this is where the problem lies. If we create a new MyHash with Indifferent Access, we see that we can access the hash with a string key 'cat', just like we can with a symbol key. This is the Indifferent Access portion of the hash.
00:05:12.770
It makes it so you don’t have to remember if you're using string keys or symbol keys, which is particularly useful when dealing with user input from an endpoint or something similar. Intuitively, we see that accessing 'dog' with a string gives the same result as accessing it with a symbol.
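As a rough illustration of indifferent access, here is a simplified sketch that normalizes keys to strings on read and write. The names SketchIndifferentReader and IndifferentHash are hypothetical, and the real extension covers far more of Hash's interface than these two methods.

```ruby
# Hypothetical, simplified stand-in; only [] and []= are covered here.
module SketchIndifferentReader
  def [](key)
    super(key.to_s)           # always look up by the string form
  end

  def []=(key, value)
    super(key.to_s, value)    # always store under the string form
  end
end

class IndifferentHash < Hash
  include SketchIndifferentReader
end

pets = IndifferentHash.new
pets[:cat] = 'meow'

pets['cat'] # => "meow"
pets[:cat]  # => "meow"
```

Every other Hash method that takes a key (fetch, key?, delete, merge, and so on) would still need the same treatment, which is where the trouble in this talk starts.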
00:05:40.130
So far, everyone with me? Awesome! Now, we’re going to create our hash again, and then when we do this, we want to grab the dog hash and merge on the breed. We try to merge on the breed, which is 'blue heeler', and we receive a no method error regarding an undefined method 'convert.' Hmm, I don’t see that anywhere. What’s going on?
00:06:02.030
When we look at the Indifferent Access extension, we see that we have a merge method that calls super and then calls convert on the result. We also have a convert method. So what’s happening? We’ve mixed this into our hash; why do we suddenly not have access to this convert method?
00:06:31.370
We check our hash to see if we respond to 'convert'. I love Ruby for this! We ask if the hash responds to convert, and it says yes. Then we check if the dog hash within the hash responds to convert, and we get true. What is happening? We need to go deeper.
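The situation so far can be reproduced with a small sketch. BuggyIndifferentAccess is a hypothetical module that only mirrors the shape described in the talk (a merge that calls super and then convert); it is not Hashie's source, and the nested dog hash is extended by hand here.

```ruby
# Hypothetical stand-in that mirrors the shape of the bug.
module BuggyIndifferentAccess
  # Rewrite every key as a String, in place.
  def convert
    keys.each { |key| self[key.to_s] = delete(key) unless key.is_a?(String) }
    self
  end

  # Buggy: super is Hash#merge, and its return value lacks this module.
  def merge(other)
    super.convert
  end
end

dog = { name: 'Rover', sound: 'woof' }.extend(BuggyIndifferentAccess)
dog.respond_to?(:convert) # => true

error = begin
  dog.merge(breed: 'blue heeler')
  nil
rescue NoMethodError => e
  e
end

error.class # => NoMethodError
error.name  # => :convert
```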
00:07:06.960
So here’s an introduction to two of my favorite tools called Pry and Byebug. Does anyone here use Pry or Byebug? Yes! It makes my life so much easier. When I encounter a bug like this, I often write a failing test and then insert something that looks like this.
00:07:41.490
We’re going to call our merge method, and we’ll call super, and then we’ll tap into super. If you’re not familiar with 'tap', it’s a method on Object. All it does is pass the object you called tap on as the block parameter and then return that same object. Thus, the result of super becomes 'result', while self remains the Indifferent Access hash.
00:08:06.960
Now we have access to both the result and self to figure out what's going on. With a breakpoint placed before convert is called on the result, we call merge on the hash with 'blue heeler', and we get dropped into a REPL (read-eval-print loop). Now we can type and interact with the variables.
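A sketch of the tap technique follows. The module SketchDebugMerge and the OBSERVED array are hypothetical; in a real session the block body would hold a breakpoint such as binding.pry or byebug (both need their gems installed), whereas this version just records what it sees so it can run anywhere.

```ruby
OBSERVED = []

# Hypothetical module used only to show where the breakpoint would go.
module SketchDebugMerge
  def merge(other)
    super.tap do |result|
      # A real debugging session would pause here, e.g. with `binding.pry`.
      # Inside the block, `result` is what super returned and `self` is
      # still the hash that received the merge call.
      OBSERVED << { self_id: object_id, result_id: result.object_id }
    end
  end
end

dog = { name: 'Rover' }.extend(SketchDebugMerge)
merged = dog.merge(breed: 'blue heeler')

OBSERVED.last[:self_id] == dog.object_id      # => true
OBSERVED.last[:result_id] == merged.object_id # => true
```

Because tap returns its receiver, wrapping super this way does not change what merge returns.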
00:08:37.920
First thing to check is what is self. Just to make sure I know what I’m dealing with: self is a hash, okay, that makes sense. So far, we see that result is also a hash. These should behave similarly. We ask self if it responds to convert, and it says yes.
00:09:17.220
Then we ask if the result responds to convert, and we get false. This makes no sense! They should be the same thing, right? If you’re unfamiliar with the singleton class: the singleton class, also called the eigenclass, is a hidden class that belongs to one single object and holds the methods and modules attached to just that object at a given moment.
00:09:46.110
When you call extend on an object, you modify the singleton class of that object. When you call a method on an object, Ruby crawls up the array of singleton class ancestors and checks whether each of those classes and modules has that method. We see that we have the Indifferent Access extension in the ancestors, so that’s why self responds to convert here.
00:10:21.630
When we ask the same question of result, we see that its singleton class has no knowledge of Indifferent Access. Thus, that is the source of our bug: the result doesn’t know how to convert because of this.
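The same check can be run outside a debugger. In this sketch SketchExtension is a hypothetical module standing in for the real extension; the point is only that Hash#merge hands back a copy whose singleton class does not carry the module.

```ruby
# Hypothetical module standing in for the extension.
module SketchExtension
  def convert
    self
  end
end

extended = { name: 'Rover' }.extend(SketchExtension)
copy = extended.merge(breed: 'blue heeler') # plain Hash#merge

extended.singleton_class.ancestors.include?(SketchExtension) # => true
copy.singleton_class.ancestors.include?(SketchExtension)     # => false

extended.respond_to?(:convert) # => true
copy.respond_to?(:convert)     # => false
```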
00:10:43.290
We need to make sure that the result of merge gets Indifferent Access set on it; without doing that, it’s just a normal hash. Because of the implementation of the merge method, super calls the hash implementation of merge, which is written in C in the Ruby VM.
00:11:01.709
So what we get back when we call super is just a normal hash, even though we’re asking the Indifferent Access extension to give us the result. To fix this bug, we’re going to change our approach. We’ll grab super as a result and then make sure to inject Indifferent Access into the result.
00:11:21.560
This means that the result's singleton class now has Indifferent Access in its ancestors list, which allows it to respond to convert. Once we make this change, we can run our test again, set up our MyHash, and try to access the dog and merge on the breed. It works!
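A sketch of the fix, under the same assumptions as before: FixedIndifferentAccess is a hypothetical module, and re-extending the result is one plain-Ruby way to express "inject Indifferent Access into the result". Hashie's own implementation may differ in detail.

```ruby
# Hypothetical stand-in showing the shape of the fix.
module FixedIndifferentAccess
  # Rewrite every key as a String, in place.
  def convert
    keys.each { |key| self[key.to_s] = delete(key) unless key.is_a?(String) }
    self
  end

  def merge(other)
    result = super
    # Re-attach the extension when the copy returned by Hash#merge lost it.
    result.extend(FixedIndifferentAccess) unless result.is_a?(FixedIndifferentAccess)
    result.convert
  end
end

dog = { 'name' => 'Rover' }.extend(FixedIndifferentAccess)
merged = dog.merge(breed: 'blue heeler')

merged['breed']                      # => "blue heeler"
merged.is_a?(FixedIndifferentAccess) # => true
```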
00:11:43.080
So why was this a problem? The source of the bug was the fact that we called super, which used the base class of Hash. When we call super, Hash's implementation of merge runs in that instance. We need to use super to chain multiple extensions together for them to interoperate, but when we come from Hash, the base of super is going to be the Hash class's merge method.
00:12:21.480
Hash has a significant number of public methods (approximately 178), and Aaron Patterson wrote a blog post in 2014 about how too many methods can cause issues, particularly related to a memory leak in Rails when using its Action Controller parameters class.
00:12:43.020
He explained that you need to handle all of those methods because they are part of the public interface of your class. I find it challenging to manage covering 178 implicit methods from a subclass; the likelihood of bugs increases significantly.
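To get a feel for the size of that interface on your own Ruby, you can count the methods directly. The exact numbers depend on the Ruby version, so this sketch prints them rather than asserting a particular figure.

```ruby
# Every public method a Hash instance responds to, inherited ones included.
all_public = Hash.public_instance_methods

# Only the public methods defined directly on Hash.
own_public = Hash.public_instance_methods(false)

puts "public methods a Hash instance responds to: #{all_public.size}"
puts "of which defined directly on Hash: #{own_public.size}"
```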
00:13:10.080
That was the first problem, which was relatively easy to fix once we knew where to dig in. Let’s look at a second problem involving Mash keys.
00:13:40.780
Another morning, I get a bug report about Mash keys that collide with hash methods, producing strange results. It’s a pretty good report that clearly explains what’s happening. If you use Hashie, you might have seen it show up in your Gemfile and wondered why.
00:14:11.829
Spoiler alert: yes, Hashie has been controversial, as I alluded to. In 2014, Richard Schneeman wrote a blog post titled "Hashie Considered Harmful." Although he primarily discusses Mash, his criticisms of Hashie provide valuable insights.
00:14:39.440
He noticed that after adding OmniAuth to a Rails application, every endpoint became 5% slower. This performance hit was experienced across the board, not just with endpoints associated with OmniAuth, which is an interesting observation.
00:15:08.750
So, back to Mash. It works a bit like this: if we create a Mash and ask whether it has a name property, it returns false because that property doesn’t exist yet. We can verify this by fetching 'name' using a method accessor, which gives us nil. This behavior is intuitive and expected.
00:15:54.019
We can set the name of the Mash to 'my mash', and then when we ask for 'name', we get 'my mash'. We can also see that we have a name property set. This is most of how Mash is utilized. It’s also recursive, meaning that if the value for a key is itself a hash, that value gets wrapped in a Mash as well.
00:16:29.839
This functionality is implemented through method_missing, which is powerful yet sharp. The implementation checks if it receives a message it doesn't recognize. If it matches a key, it returns it. If the key has a suffix, it processes accordingly based on that suffix.
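The mechanism can be sketched with a small Hash subclass. SketchMash is a hypothetical stand-in, not Hashie's implementation: it handles only a plain reader, an '=' suffix, and a '?' suffix, and it leaves out the recursive wrapping of nested hashes.

```ruby
# Hypothetical stand-in for the idea behind Mash.
class SketchMash < Hash
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?('=')
      self[key.chomp('=')] = args.first   # writer: mash.name = value
    elsif key.end_with?('?')
      key?(key.chomp('?'))                # predicate: mash.name?
    else
      self[key]                           # reader: mash.name
    end
  end

  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?('=', '?') || key?(key) || super
  end
end

mash = SketchMash.new
mash.name?            # => false
mash.name             # => nil
mash.name = 'my mash'
mash.name             # => "my mash"
mash.name?            # => true
```

Note that method_missing only runs for messages the object does not already understand, which matters for what comes next.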
00:17:05.249
Mash is intended primarily for JSON responses, as the README states. I have been guilty of writing API client libraries that use Mash for this purpose. After parsing a JSON response into a hash, we wrap it in a Mash and get method accessors for everything. It seems convenient, and nothing bad could happen, right?
00:17:46.350
However, a Mash is a Hash and is defined to behave as one. The problem is that Hash has 178 public methods, and some of those methods could conflict with what you return from an API.
00:18:17.290
So when we try to access 'zip' on a response that has a zip key, for instance, we get a strange response. It turns out we receive the result of Enumerable's zip method instead of the value we expect. Thus, when a Mash key collides with a public method, it behaves unexpectedly.
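The collision is easy to reproduce with any method_missing-based accessor. CollidingMash below is a hypothetical minimal class and the address data is made up for the example; the key point is that Hash already responds to zip through Enumerable, so method_missing never gets a chance to run.

```ruby
# Hypothetical minimal accessor class; keys are read through method_missing.
class CollidingMash < Hash
  def method_missing(name, *args)
    key?(name.to_s) ? self[name.to_s] : super
  end

  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || super
  end
end

address = CollidingMash.new
address['city'] = 'Omaha'
address['zip'] = '68102'

address.city   # => "Omaha"
address.zip    # => [[["city", "Omaha"]], [["zip", "68102"]]]
address['zip'] # => "68102"
```

The bracket lookup still works, which is why the collision only shows up when the method-style accessor is used.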
00:18:44.200
To address this, I created the Method Access with Override extension. You can mix this into your Mash to override conventional methods that may conflict. This gives you control over how a method behaves while retaining access to the original behavior.
00:19:15.230
However, please note that this approach can bust the method cache, leading to performance issues in production environments. So there is a trade-off here between avoiding key collisions and keeping performance predictable.
00:19:47.080
Now let’s tackle another data structure called 'Dash'. A Dash is a declarative hash and offers another layer of control. Once again, I received a well-documented bug report, this one explaining issues with double splat merging a Dash.
00:20:06.169
A Dash allows you to define a hash and enforce what properties it can have, thus adding a layer of validation to ensure bad states do not occur within your hash.
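The idea of a declarative hash can be sketched as follows. SketchDash, its property class method, and the Pet class are hypothetical names for this illustration; Hashie's Dash has a richer API, and raising ArgumentError is a choice made for this sketch only.

```ruby
# Hypothetical stand-in for a declarative hash.
class SketchDash < Hash
  def self.property(name)
    properties << name
  end

  def self.properties
    @properties ||= []
  end

  def initialize(attributes = {})
    super()
    attributes.each { |key, value| self[key] = value }
  end

  # Reject any key that was not declared with `property`.
  def []=(key, value)
    unless self.class.properties.include?(key)
      raise ArgumentError, "#{key.inspect} is not a declared property"
    end
    super
  end
end

class Pet < SketchDash
  property :name
  property :sound
end

pet = Pet.new(name: 'Rover', sound: 'woof')
pet[:name] # => "Rover"

rejected = begin
  Pet.new(breed: 'blue heeler')
  false
rescue ArgumentError
  true
end
rejected # => true
```

Only []= is guarded here. Other Hash methods that write keys, such as store or merge!, bypass the check in this sketch, which echoes the talk's point about the size of Hash's interface.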
00:20:36.800
When we try to double splat a Dash, an error occurs because the behavior differs from what you expect with a typical Hash. A Dash doesn’t permit undefined properties. However, when trying to merge properties this way, certain behaviors break down.
00:21:09.490
Using RubyVM::InstructionSequence, I learned that Ruby’s VM won’t call to_hash on a Dash when double splatting it, since the Dash already is a Hash, leading to unexpected outcomes with property accesses.
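This can be observed with two small tracking classes. TrackedHash and TrackedObject are hypothetical names for this sketch: one subclasses Hash, the other is a plain object that defines to_hash. The disassembly at the end is CRuby-specific and its output varies by version, so it is printed without comment on its contents.

```ruby
# A Hash subclass that records whether its to_hash was called.
class TrackedHash < Hash
  attr_reader :to_hash_called

  def to_hash
    @to_hash_called = true
    super
  end
end

# A non-Hash object that converts itself through to_hash.
class TrackedObject
  attr_reader :to_hash_called

  def to_hash
    @to_hash_called = true
    { name: 'Rover' }
  end
end

tracked_hash = TrackedHash.new
tracked_hash[:name] = 'Rover'
tracked_object = TrackedObject.new

from_hash = { **tracked_hash, sound: 'woof' }
from_object = { **tracked_object, sound: 'woof' }

tracked_hash.to_hash_called   # => nil, the override was skipped
tracked_object.to_hash_called # => true

# CRuby only: show the instructions generated for a double splat.
if defined?(RubyVM::InstructionSequence)
  puts RubyVM::InstructionSequence.compile('{ **source, extra: 1 }').disasm
end
```

Both results contain the same pairs, but only the non-Hash object was asked to convert itself, so any logic a Hash subclass puts in to_hash is bypassed.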
00:21:36.359
This creates a layer of complexity whereby a method you think should exist might not, due to how Ruby interprets certain structural behaviors.
00:22:02.510
To recap, we discussed Indifferent Access and how it allows you to access hashes with strings and symbols interchangeably. We also examined Mash keys and their recursive properties while understanding that they rely heavily on method_missing.
00:22:45.449
Finally, we tackled the Dash data structure and its merging behaviors that conflict with established expectations. All three of these issues stem from subclassing Hash.
00:23:07.350
I focus on Hash specifically because I am most familiar with it, but anytime you subclass core Ruby classes, similar issues will arise. Classes such as String and numerous others have a vast array of public methods, many of which are implemented in C.
00:23:30.870
Attempting to override these behaviors may lead to complications you do not anticipate. Aaron’s blog post, which I referenced earlier, discusses similar problems with internal classes in Rails, and these complications can show up unexpectedly.
00:23:53.290
The key takeaway here is understanding that when subclassing these core classes, you have to contend with a potentially vast public interface and the consequences that can arise from that.
00:24:13.440
Before I close, I have a quick additional piece. My co-maintainer in Hashie, DB, has chronicled everything that has gone wrong with Hashie Mash. His entertaining blog post recounts a series of mishaps through the years.
00:24:31.740
When you look at the RubyGems database dump, one percent of the top 1000 gems depend on Hashie. Among those gems, the functionality they use is predominantly Mash.
00:25:00.650
So, my parting PSA: question the necessity of Hashie Mash in your applications. More often than not, you can parse a JSON string with standard methods and work with the data directly.
00:25:36.080
Replacing Hashie Mash with OpenStruct or similar structures can lead to more straightforward implementation and significantly reduce overhead. Also, if you need recursion, OpenStruct can deliver that extensibility.
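A short sketch of both alternatives using only Ruby's standard json and ostruct libraries. The payload is made up for the example; symbolize_names and object_class are options of JSON.parse.

```ruby
require 'json'
require 'ostruct'

payload = '{"cat":"meow","dog":{"name":"Rover","sound":"woof"}}'

# Plain hashes: explicit keys, no method_missing involved.
data = JSON.parse(payload)
data['dog']['name']      # => "Rover"
data.dig('dog', 'sound') # => "woof"

# Symbol keys, if you prefer them.
symbolized = JSON.parse(payload, symbolize_names: true)
symbolized[:dog][:name]  # => "Rover"

# Method-style access, applied to nested objects as well.
struct = JSON.parse(payload, object_class: OpenStruct)
struct.cat               # => "meow"
struct.dog.name          # => "Rover"
```

Since OpenStruct is not a Hash subclass, it does not inherit Hash's public methods, although it has a smaller set of its own method names to keep in mind.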
00:26:09.260
If you’d like to work together, please reach out to me at Flywheel. My name is Michael Herold, and I’d be happy to discuss further questions here or online. Also, if you're interested in contributing to Hashie, please contact me!
00:26:39.080
Thank you for your attention! I appreciate your time.