The Humble Hash

Ruby Standard Library

The Humble Hash

Ariel Caplan

#ruby-standard-library

The Humble Hash

by Ariel Caplan

In the video titled "The Humble Hash," Ariel Caplan explores the versatile and powerful capabilities of Ruby hashes, illustrating their significance beyond simple key-value storage. During his talk at RubyConf 2020, he draws comparisons between hashes and a Swiss Army knife, showcasing how they can perform various tasks efficiently.

Key Points Discussed:
- Introduction to Hashes: Caplan defines a hash as a performant key-value pair store, highlighting Ruby's unique feature that allows any object to be used as a key or value.
- Creating and Fetching Hashes: He explains different methods for creating hashes and fetching values, including using literal syntax, hash.new, and the significance of default values and procs to enhance functionality.
- Example of John Conway's Game of Life: Caplan uses this algorithm as a practical example, demonstrating how utilizing hashes can optimize performance in programming implementations. He compares two Ruby implementations of the game, highlighting the efficiency gained by using hashes in one over the other.
- Use of Sets with Hashes: He discusses the Set class in Ruby's standard library as an example of how hashes enable fast, unique item inclusion, showcasing practical applications in real-world scenarios.
- Default Values in Hashes: Caplan emphasizes the importance of default values in hashes, explaining how they function without creating new key-value pairs. He warns about potential pitfalls with mutable default values.
- Recursive Functions via Default Procs: He explains how hashes with default procs can efficiently handle recursion using caching techniques, illustrated through calculating Fibonacci numbers.
- Conclusion and Applications: The presentation wraps up with real-world applications of these concepts, such as the Abbrev library's use of hashes to handle string abbreviations efficiently, as well as Caplan’s experiences at Cloudinary in leveraging hashes for performance improvements.

The conclusion of the talk underscores the flexibility and dynamic nature of Ruby hashes, encouraging participants to leverage their properties to write efficient, clean code. Caplan invites audience interaction and questions, promoting further discussion about hashes and their applications in programming.

00:00:00.080 Hey RubyConf! My name is Ariel Caplan, and I'm here today to open your eyes to some powerful secrets of Ruby hashes.

00:00:05.520 Now, however long you've been programming with Ruby, you've surely come into contact with a hash. You probably think of it as being somewhat analogous to a pocket knife. It's a really good tool that has a particular use and does that one thing very well.

00:00:12.960 In reality, hashes in Ruby are more like a Swiss Army knife. They can do a whole lot of different things and actually excel at many of them.

00:00:31.119 To explain what I mean, I need to talk to you a little bit about life. Now, when I say life, I don't really mean the concept of life; I'm talking about John Conway's Game of Life.

00:00:50.960 This game is mostly interesting for mathematical purposes, but over time, it has become of great interest to programmers as well. It's a game with a few simple rules that allows you to try to program it as a way to learn a new language, framework, or concept. By implementing its rules, you can learn a lot about what that language has to offer.

00:01:12.160 So, let's dive into what this game is, and then we'll see how it relates to the topic of this talk. The idea is that you divide the world into a large, basically infinite grid of squares, with each square assigned an initial state of being alive or dead.

00:01:28.880 In this case, we've marked the alive squares in blue, and the dead squares remain empty, white. That's our initial state. Next, we calculate the next step by looking at each cell and its neighbors, deciding based on a few simple rules whether it will be alive or dead in the next round.

00:01:47.439 If the cell is currently alive, we check how many live neighbors it has. If it has two or three, it continues to live; otherwise, it dies. If the cell is currently dead, it remains dead unless it has exactly three live neighbors, in which case it is spawned into a new cell.

00:02:30.000 For example, consider the cell in the bottom left, which currently has one live neighbor—this is insufficient, so it will die. The cell in the top left, surrounded by two live neighbors, will continue to live. However, the dead cell in the bottom right, despite having two live neighbors, needs three to come alive, so it will remain dead. Conversely, the cell on the top right has exactly three live neighbors and will come alive in the next round.

00:03:00.400 If we do this calculation for all the cells in our system, we can see how patterns evolve over generations. If you pay attention, you might notice that after a few steps, we can end up back where we started.

00:03:31.600 I wrote two different implementations of John Conway's Game of Life in Ruby, and while they are almost exactly the same—the same logic and the same algorithm—there is a subtle difference between them. On the left, we have a set of gliders moving along nicely. I made it a looping world, so when it goes off the bottom, it comes back on top. On the right, you’ll see things moving more slowly after a few steps as the algorithm gets bogged down.

00:04:05.040 The difference between the left and right implementations is one simple thing: the one on the left makes good use of hashes while the one on the right does not.

00:04:50.399 But first, let me talk to you a little bit about myself. Hi everyone again! I'm Ariel Caplan, and I've been a Ruby programmer since I graduated from the Flatiron School in 2014. I currently work as a back-end developer at Cloudinary.

00:05:07.039 This is not my first Ruby job; I've worked at a number of Ruby-focused companies along the way, which has been quite enjoyable. At Cloudinary, I'm working on one of the world’s biggest Ruby applications, dealing with billions—yes, billions with a 'b'—of requests for images and videos every day.

00:05:53.680 Basically, if you use images and videos in your online application, we probably have some cool things to help you, including ways to optimize your website for display on mobile, faster load times, and more.

00:06:22.080 You can find me online as @amkaplan everywhere that matters, so feel free to tweet at me. This talk is pre-recorded, but I will be happy to respond to your tweets during this presentation. Additionally, you can find the slides for this talk at amkaplan.ninja if you’d like to follow along.

00:06:42.759 Now, let’s get down to the basics. So, what is a hash? At least, what is the Ruby edition of a hash? A hash is a performant key-value pair store. Each word there is really important. Store—it's a data store for pairs of keys and values. You set things and get things based on keys, and it's performant, meaning the time it takes to get or set a key-value pair remains consistent regardless of how many pairs are inside your hash. Whether you have three or 30 million, the time is effectively the same.

00:07:38.000 Importantly, and this is fairly unique to Ruby—after exploring other languages, I couldn’t find anything else with this property—is that keys can be any Ruby object; values can also be any Ruby object. They can be strings, symbols, numbers, Active Record objects, or even random standard library classes! That said, while you can use any object, it doesn’t mean you should.

00:08:31.440 Now, to create a hash, you can use literal syntax, which uses curly braces to contain the contents, or you can use a more explicit syntax with the hash.new method. There are multiple ways to fetch values from a hash as well.

00:09:13.600 You can use hash.fetch, which raises an error if that key is not part of your hash—this is a stricter way of fetching. Alternatively, you can use square brackets, albeit some confusion exists regarding this. While some think it’s lenient, it actually uses a default value if the key isn't found. If the key exists, you retrieve its value; if not, you get the hash's default value, which defaults to nil, though you can customize it.

00:09:59.440 For instance, if I want to create a hash with a default value of four, I can pass that as an argument to hash.new. More on default values in a moment, but first, let's explore the concept of object keys.

00:10:45.680 We're going to look at a couple of standard library classes using hashes as we go through this talk. Let's consider the Set class, as it's a great illustration of the power of letting any Ruby object be a hash key. Set is part of Ruby's standard library. You don’t need any gems, but you must require it since it's in the standard library.

00:11:51.440 You can create a new set by calling Set.new, typically by passing it an array, where each element will become a member of the set. You can then check if the set includes a specific element. If it does, it will return true; otherwise, it returns false. The cool part about this include method is that it doesn't matter how many items are in your set—whether there are just a few items or tens of millions—it operates quickly.

00:12:20.160 You can add an item to your set, but if you add an item that's already there, it won't duplicate it. Over time, these are the two properties of a set: the include operation is very fast, and it only allows unique items.

00:12:57.760 Let’s look at the implementation to understand these properties better. When we create a new set, we initialize it with an enumerable (like an array) and create an internal hash set to a new hash with a default value of false. From there, we merge our array, meaning we iterate over it, executing the add operation, which simply sets a key-value pair in the hash where the object being added is the key, and the corresponding value is true.

00:14:06.640 When we call include, we are checking if that object is present in the hash. If it is, it returns true because it’s stored as true, but if it isn't, it returns the default value, which is false. This is a simple yet powerful implementation.

00:14:39.760 Now, let's look at a practical example in my first job, where one of our products allowed users to search for healthcare providers. When the search results were returned, certain disclaimers should have been displayed based on the results. If any of the providers matched a certain condition, we displayed a disclaimer text advising that the user should choose that provider to incur lower out-of-pocket costs.

00:15:40.080 Imagine we have disclaimers set up in a key-value format, alongside provider metadata that contains a variety of information for determining the appropriate disclaimers to display. Previously, the approach taken to match disclaimers with provider metadata was not very performant, requiring an outer loop for each disclaimer and an inner loop for the providers, resulting in a substantial quantity of operations.

00:16:25.520 What I suggested instead was to create a set of provider data and iterate over the key-value pairs in the provider metadata to add each unique one into the set. This way, we could quickly check if a disclaimer was relevant by seeing if a key-value pair was present in our set, significantly decreasing the number of operations needed.

00:17:39.920 However, it’s important to be cautious when using different kinds of keys in your hashes. Stick with simple objects; avoid anything too complex. Strings, symbols, and numeric types are generally fine, but be careful with floats, as they may not compare as you expect. Avoid using complex objects like Active Record instances or arrays, as these can lead to hard-to-trace bugs.

00:18:22.400 Now, let's move on to discuss default values in hashes. A default value allows you to retrieve a specific object any time you reference a key that isn't found. It's crucial to note that default values do not create a new key-value pair in the hash—they simply return the default value if the key isn't found.

00:18:46.479 While setting a default value, remember that it will always refer to the same object. If the default value is mutable, such as a string, altering it will change it for all keys that return it. To avoid unexpected behaviors, stick with immutable objects like symbols or numbers, or freeze mutable objects before using them.

00:19:56.960 Next, let’s discuss default procs, which are similar to default values but instead of returning a preset value, they allow you to determine the value to set if the key isn't present. You can choose to set a key-value pair when running the proc, or you can return a value without setting a key.

00:20:57.600 One interesting trick is that you can assign a default proc a name, allowing it to reference itself for creating arbitrarily nested hash structures. When accessed, it automatically creates a new hash with the same default proc.

00:21:45.600 Real recursive functions can also be created using hashes with default procs. For example, we can compute the Fibonacci sequence recursively using a hash, where the default proc calculates Fibonacci numbers based on previous computations stored in the hash.

00:23:04.720 This approach allows us to calculate Fibonacci numbers quickly, as results are cached inside the hash, preventing unnecessary recomputation. This method demonstrates the efficiency of working with hashes in Ruby.

00:24:12.399 Now, returning to Conway's Game of Life—my implementation cleanly separates the logic from the display code. The display code iterates over the grid, providing each cell's coordinates and the current generation, while the logic determines whether each cell is alive or dead.

00:25:27.919 In a consistent implementation, the logic code checks the status of each cell and its neighbors, applying the rules of the game. But instead of relying on a cumbersome recursive method, we can utilize a hash with a default proc to simplify the recursive structure.

00:26:07.040 This hash can store every cell's state across generations, again demonstrating how Ruby hashes allow efficient recursive access to previously calculated states.

00:26:48.960 While we see increased memory usage due to caching, the trade-off is generally worth it for the performance improvements gained by avoiding repeated calculations.

00:28:07.680 Before concluding, I want to introduce the Abbrev library, which takes an array of strings and returns unambiguous abbreviations for them. Graphically, it builds a hash to keep track of how many times it sees each abbreviation, adding them to a final table that only includes abbreviations seen one time.

00:29:45.919 This library uses the built-in behavior of default values, simplifying the counting process without requiring initialization for every possible abbreviation. This pattern is particularly useful in application code, as it allows for more concise handling of counters.

00:31:02.239 In our code at Cloudinary, we use the same strategy to manage totals effectively. We create hashes with appropriate default values to seamlessly compute totals while maintaining a predictable structure.

00:31:38.159 In conclusion, we've covered how flexible Ruby hashes can be, the dynamic nature imparted by default values and procs, the potential for recursion-like implementations, and various caching strategies that leverage Ruby’s unique properties.

00:32:04.600 Thank you for listening! I hope you've learned something new today. There will also be a Slack chat later, and I look forward to hearing your questions. You can find me at @amkaplan. Enjoy the rest of the conference!

RubyConf 2020