Immutable Ruby

by Michael Fairley

The video titled "Immutable Ruby" presented by Michael Fairley at MountainWest RubyConf 2013 focuses on the concept of immutability in Ruby programming. The speaker begins by defining immutability as the inability to change data once it is created, contrasting it with mutable state, which can lead to maintenance problems and unwanted side effects in code. Fairley highlights several key points regarding the advantages and practices associated with immutability, elaborating on the benefits of explicit immutability in application design and code maintenance.

Key points discussed include:

- The Problem with Mutability: Fairley shares his experiences with mutable code leading to complications, emphasizing that many data elements in code are often expected to be immutable, such as purchase records in e-commerce applications.
- Implementing Immutability: He suggests a simple ActiveRecord mix-in to enforce immutability at the model level and explains the distinction between immutable and mutable fields within a model.
- Value Objects: Fairley introduces the concept of value objects as entities where equality is based on data rather than identity. He demonstrates how to implement value objects in Ruby and how they can lead to cleaner, more maintainable code.
- Event Sourcing: The speaker explains event sourcing, where all changes in an application are recorded as immutable events, using a bank account example to illustrate the advantages of tracking changes over time without directly altering state.
- Benefits of Immutability: Immutability leads to code that is easier to reason about, enhances multi-threading safety, reduces the need for locks, and can eliminate cache invalidation issues.
- Performance Trade-offs: Fairley addresses the performance implications of immutability, such as higher memory consumption and allocation costs, alongside the flexible nature of Ruby that complicates enforcing immutability.
- Persistent Data Structures: The discussion includes a brief introduction to persistent data structures that allow for the creation of new versions of data without modifying existing data.

In conclusion, Fairley advocates for embracing immutability in Ruby for cleaner, more manageable code that adheres to established principles of software design. He presents additional resources for further learning about immutability and value objects, encouraging developers to explore other programming languages that natively incorporate immutability. The overall message emphasizes that while there are trade-offs, the advantages in code clarity and stability make immutability a valuable practice in Ruby development.

00:00:20.519 So we learned yesterday that Comic Sans is one of the more readable fonts. I went home and immediately changed my entire slide deck to be in Comic Sans. Now, I'm actually just trolling with that. My name is Michael Fairley, and I work at Braintree Payments, where we make it easy for businesses of any size to accept credit card payments online. We work with thousands of really awesome merchants, including some you probably recognize. So if you're interested in talking about payments, come find me later.

00:00:52.000 Today, I'm going to be talking about immutability. Simply put, immutability is when data can't be changed. In Ruby, this refers to an object that can't be modified after you create it, but it could also be data in your database that you only ever insert; you never update or delete it. I've had some bad experiences with mutable code in Ruby over the past few years and some good experiences with immutable code. I'm going to give you some examples of pitfalls I've run into and how I've worked around them.

00:01:25.560 The first point is that you already have lots of data in your code that you expect not to change, but that expectation is very implicit. I think you should make some of this explicit. For example, let's say you have a purchase model that records who it was, what they bought, and how much they paid when a user buys something on your site. You probably never want any of these attributes to change once the user made the purchase. Even if you go back and change the price in one of these models, it won’t change how much you actually charged their credit card. If you change the item ID, it won't actually change which item you shipped to them. This is a fairly common pattern where you have data representing events that happened in an external system or in the real world, and you want to ensure that these are immutable.

00:02:18.040 It would be nice if we had a module that we could mix into our ActiveRecord models to make them immutable. Luckily, that's just five lines of code. ActiveRecord calls the 'readonly?' method anytime it tries to save or update a record, and if it returns true, it refuses to write the record. Here, we just say that any record that's already been persisted is read-only. As a result, we can create records just fine, but when we try to update them, we'll get an error, and our code will enforce the behavior we want.
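A minimal sketch of the mix-in described above. The module name `Immutable` and the `FakePurchase` stand-in class are assumptions for illustration. In a real application the module would be included in an ActiveRecord model, which performs the `readonly?` check itself and raises `ActiveRecord::ReadOnlyRecord`; the stand-in mimics just enough of that contract to run without Rails.

```ruby
# Mix-in: any record that has already been saved reports itself as read-only.
module Immutable
  def readonly?
    persisted?
  end
end

# Stand-in error and model (hypothetical) so the sketch runs without Rails.
class ReadOnlyRecord < StandardError; end

class FakePurchase
  include Immutable

  attr_accessor :price

  def initialize(price)
    @price = price
    @persisted = false
  end

  def persisted?
    @persisted
  end

  # Mimics the check ActiveRecord makes before writing a record.
  def save!
    raise ReadOnlyRecord, "#{self.class} is marked as readonly" if readonly?
    @persisted = true
  end
end

purchase = FakePurchase.new(20)
purchase.save!            # creating the record works

purchase.price = 1
begin
  purchase.save!          # updating a persisted record raises
rescue ReadOnlyRecord => e
  puts e.message
end
```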

00:02:40.560 If we are in a script console in production and have a momentary lapse and forget that we're not supposed to tweak this data, the system will stop us, and that's what you want. You could even take this a step further and enforce these constraints at the database level. But what if we have some fields that we want to be able to change and some that we don’t? For instance, I've added a 'status' field that will progress from 'processing' to 'shipped' to 'delivered', and perhaps 'refunded' later. We want that field to change, but we still don’t want the other three attributes to change.

00:03:24.720 Luckily, there's a gem for that. We can mark these three attributes as immutable, and if we ever try to update them, we’ll get an error. You also run into this problem in your code, not just in your database. For instance, there’s a method that lets you take a domain name, a path, and some optional query parameters, and it builds the URL for you. It handles extra slashes or too few slashes to make sure they work out.

00:03:42.120 So if, for example, you’re using 'example.com' and 'blog', it simply concatenates them together, or if we pass it a hash, it will serialize that into the query parameters. Let’s say you realize that you’re using the same domain name over and over again, so you extract another helper to simply use that domain name for your blog URL. However, this hardcoded string is kind of ugly; it’s configuration that’s been hardcoded into your app. You want a way to tweak it, so you pull it out into a constant. This could also be an environment variable or some data from a YAML file you loaded.

00:04:01.640 With the blog URL, everything looks fine, but then the photos URL has 'blog' in its path, which is a little funny. Let's try that again; that's not quite what we wanted. Something has gone very wrong here. Originally, when the string was hardcoded into the method, every time 'example URL' got called, we generated a new string that was 'example.com'. Once we extracted the constant, only one instance of this string was created; each time we called 'example URL', we were passing the same string into 'build URL', and 'build URL' was destructively appending to the end of it. The fix for this is pretty simple: use '+=' instead of '<<', which builds a new string and leaves the base URL unmodified.

00:05:56.480 However, whoever did the refactoring to extract the string into a constant wasn’t paying attention to 'build URL'. They didn’t realize that this string was going to be mutated later—this wasn’t something they thought about or should have to think about. What this person could have done was be a little more defensive and freeze the string ahead of time. If the test suite ran, it would blow up, saying you tried to modify the string that you intended to never be modified.
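The bug, the fix, and the defensive `freeze` can be reproduced in a few lines. This is a sketch, not the code from the slides: the method names `build_url_destructive` and `build_url` and the constants `DOMAIN` and `SAFE_DOMAIN` are assumed for illustration, and the slash handling and query parameters mentioned in the talk are left out.

```ruby
# Destructive version: << appends to the very string object passed in.
def build_url_destructive(domain, path)
  domain << "/" << path
end

# Non-destructive version: += builds a new string and rebinds only the local.
def build_url(domain, path)
  url = domain
  url += "/" + path
  url
end

# String.new yields a mutable string regardless of frozen-string-literal settings.
DOMAIN = String.new("http://example.com")

build_url_destructive(DOMAIN, "blog")
build_url_destructive(DOMAIN, "photos")
puts DOMAIN   # the shared constant has grown: http://example.com/blog/photos

SAFE_DOMAIN = "http://example.com".freeze

puts build_url(SAFE_DOMAIN, "blog")     # http://example.com/blog
puts build_url(SAFE_DOMAIN, "photos")   # http://example.com/photos

begin
  build_url_destructive(SAFE_DOMAIN, "blog")
rescue FrozenError => e                 # FrozenError exists since Ruby 2.5
  puts "caught: #{e.class}"
end
```

With the frozen constant, the destructive call fails at the point of mutation instead of silently corrupting every later URL.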

00:06:25.840 I think this is a pattern you should be using for configuration, constants, and other things that you plan to never modify, so that when one gets passed around, maybe nine methods down a call stack, someone who doesn't realize it was supposed to be immutable can't change it. There's one problem with 'freeze': if you freeze an array, it won't freeze the elements inside of it, and if you freeze an object, it won't freeze the objects its instance variables refer to. There's a gem called Ice Nine ('ice_nine') that adds a 'deep_freeze' method, which recursively freezes an entire tree of objects.

00:06:49.360 So, when you use 'YAML.load_file', you might want to call deep freeze on it to ensure that those configuration variables don’t get changed as your application runs. Next up, let's talk about values. Conceptually, values are objects whose identity is based on the data inside of them rather than some sort of external identity. An ActiveRecord object is not a value because even if two ActiveRecord objects have all the same values except for their ID, they won’t be considered equal.
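To show the difference between a shallow and a recursive freeze, here is a hand-rolled `deep_freeze` helper. It is a simplified stand-in written for this example, not the gem's implementation: it only walks hashes and arrays, and it does not handle cyclic structures or arbitrary objects' instance variables. The nested hash stands in for configuration loaded from a YAML file.

```ruby
# Recursively freezes hashes, arrays, and the objects they contain.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array
    obj.each { |element| deep_freeze(element) }
  end
  obj.freeze
end

# Shallow freeze: only the outer hash is protected.
shallow = { "hosts" => [String.new("a.example.com")] }.freeze
shallow["hosts"] << "b.example.com"      # allowed, the nested array is still mutable

# Deep freeze: nested containers and strings are protected too.
config = deep_freeze({ "hosts" => [String.new("a.example.com")] })

begin
  config["hosts"] << "b.example.com"
rescue FrozenError => e
  puts "caught: #{e.class}"
end
```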

00:07:21.360 Values you're probably used to working with are things like numbers and time, where you don’t really care if the Ruby interpreter has two different objects that represent the same time; you just care that they’re equal. But there are more values than just the ones Ruby gives you. There are things like addresses, Cartesian points, and URIs. There’s a gem that helps you build your own value objects and use them in your system. It works a lot like structs. For instance, we’ve created a new class called 'Point' that has attributes X and Y. We can build a point at the origin and retrieve the data we expect.

00:08:14.160 You can create another point elsewhere, and we get the data we expect. But unlike a struct, we can’t actually change the data once it’s placed inside. As I mentioned earlier, equality is based on the data inside of it and not any sort of identity. Structs work similarly, but with values, we know that once they’re equal, they'll always be equal. You can also add a little bit of behavior to values.
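A value object like the point described above can also be written by hand in plain Ruby. This is a sketch of the idea rather than the gem shown in the talk; the `scale` method is an assumed example of behavior that returns a new value instead of changing the receiver.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
    freeze                      # instance variables cannot be reassigned afterwards
  end

  # Equality is based on the data, not on object identity.
  def ==(other)
    other.class == self.class && x == other.x && y == other.y
  end

  # eql? and hash let equal points act as the same Hash key or Set member.
  def eql?(other)
    other.class == self.class && x.eql?(other.x) && y.eql?(other.y)
  end

  def hash
    [self.class, x, y].hash
  end

  # Behavior returns a new Point; the receiver is never modified.
  def scale(factor)
    Point.new(x * factor, y * factor)
  end
end

origin = Point.new(0, 0)
puts origin == Point.new(0, 0)        # true
puts origin.equal?(Point.new(0, 0))   # false: different objects, same value

begin
  origin.instance_variable_set(:@x, 5)
rescue FrozenError => e
  puts "caught: #{e.class}"
end
```

On Ruby 3.2 and later, `Data.define(:x, :y)` provides a similar immutable, data-compared class out of the box.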

00:08:48.680 For example, you can create a method to print a value more prettily, or in this case, scale the point. However, these methods can’t modify the value. So what’s the benefit of using values? The mantra 'skinny controller, fat model' has been around in the Rails community for close to seven years now, and while it’s great that we’ve gotten all this logic out of our controllers, some of our models have become a bit too fat.

00:09:07.440 You probably have a user model in your application that has hundreds of methods and thousands of lines of code. Value objects are a great way to decompose these excessively fat models. Let’s talk about a hypothetical user model that has, among other things, some shipping address information. The user model also has a method to calculate the cost of shipping an item to that user.

00:09:55.360 The first thing we do is create a value that can represent this address with the same four fields that we saw in the user model. We can then use ActiveRecord's 'composed_of' helper to map the database fields to those used within the value object itself. This means we can create a method called 'shipping_address' that returns an address and define the mapping from the ActiveRecord database field names to the names used within the value object.

00:10:26.799 When the data gets set on the model as it comes out of the database, we can ask for the shipping address and it will give us this value object. Additionally, we can also assign a value object to the user's address, and it will decompose it onto the constituent fields. This approach minimizes some of the pain experienced with calculating shipping prices.
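The composing and decomposing described above can be sketched in plain Ruby. In a Rails application ActiveRecord's `composed_of` generates the equivalent reader and writer from a declared mapping; here they are written out by hand so the example runs without a database. The field names (`shipping_street`, `street`, and so on) are assumptions, since the talk only says there are four fields.

```ruby
# Value object holding the address data; Struct supplies data-based equality.
Address = Struct.new(:street, :city, :state, :zip) do
  def initialize(*)
    super
    freeze                      # make the value immutable after construction
  end
end

# Stand-in for the ActiveRecord model: four flat columns, as in the database.
class User
  attr_accessor :shipping_street, :shipping_city, :shipping_state, :shipping_zip

  # Reading composes the flat fields into one value object.
  def shipping_address
    Address.new(shipping_street, shipping_city, shipping_state, shipping_zip)
  end

  # Writing decomposes the value object back onto the flat fields.
  def shipping_address=(address)
    self.shipping_street = address.street
    self.shipping_city   = address.city
    self.shipping_state  = address.state
    self.shipping_zip    = address.zip
  end
end

user = User.new
user.shipping_address = Address.new("1 Main St", "Salt Lake City", "UT", "84101")
puts user.shipping_city               # Salt Lake City
puts user.shipping_address.zip        # 84101
```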

00:10:53.440 The original implementation of 'user.calculate_shipping_price' actually led to some pain evident from our tests. We had to require a spec helper which required our application, then required all of Rails, and this made our tests take about thirty seconds to start. You could see that there were hundreds of other tests in these files, which indicated we were violating the single responsibility principle.

00:11:24.760 Finally, in our tests, we had to use tools like FactoryGirl and insert data into the database just to set up our tests. If instead, we moved the 'calculate_shipping_price' method to the address, our test's only dependency would be the address model, eliminating the need for FactoryGirl or to interact with the database. Even if there are still dozens of tests here, this method or model isn't excessively bloated.

00:12:04.760 Not only that, but it makes us resilient to changing requirements as the application evolves. For instance, what if a user has more than one address or needs to ship to businesses as well? With the original code, it’s not obvious how we could make those changes.

00:12:39.760 However, with the logic decomposed into a value class that can be shared, it’s clear what to do. Perhaps we want to differentiate between different shipping prices for different items. We could move the 'calculate_shipping_price' method onto the item and have it take an address as a parameter.
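As a rough illustration of that last variation, the sketch below puts `calculate_shipping_price` on an item and passes the address in as a value. The pricing rule, the weights, and the `REMOTE_STATES` list are invented for the example; the point is only that the method needs nothing but an item and an address, so it can be exercised without Rails, factories, or a database.

```ruby
Address = Struct.new(:street, :city, :state, :zip)

class Item
  REMOTE_STATES = %w[AK HI].freeze      # hypothetical surcharge rule

  def initialize(weight_lb)
    @weight_lb = weight_lb
  end

  # Price in cents; depends only on the item and the address value passed in.
  def calculate_shipping_price(address)
    base = 500 + 100 * @weight_lb
    REMOTE_STATES.include?(address.state) ? base * 2 : base
  end
end

book   = Item.new(2)
utah   = Address.new("1 Main St", "Salt Lake City", "UT", "84101")
hawaii = Address.new("2 Beach Rd", "Honolulu", "HI", "96801")

puts book.calculate_shipping_price(utah)     # 700
puts book.calculate_shipping_price(hawaii)   # 1400
```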

00:13:27.280 Had this method stayed bound to the user, it wouldn't have been as clear how to adapt to those changes. There's a pattern here: having different pieces of your system communicate through value objects offers greater flexibility.

00:14:01.040 For example, if you add a checkbox to your website asking, 'Use my billing address as my shipping address?', you would eventually need code that looks like this. If you ever added a new field to the address, like 'country', you might not remember to update this. However, if you’re using value objects, you can simply assign one to the other, and it’s resilient to change.

00:14:36.320 Next up is event sourcing. Event sourcing captures all the changes to your application state as a sequence of immutable events. To explain this, let’s use the example of a bank checking account. You open an account, deposit $1,000, and your balance reflects that amount. Later, you buy a conference ticket, which reduces the balance. After receiving a paycheck, your balance increases again, and if you buy a book and later return it, the balance reverts to what it was before.

00:15:38.920 In this system, the events are the debits and credits against your account—essentially the five transactions I've mentioned. The derived state is the balance. Notice that if we take away the balance, each of you could recompute it based on the observable events. The derived state, or balance here, is what you're primarily concerned with when you open your bank account.

00:16:44.440 You generally want to understand how much money is available for spending, while the transactions represent the events leading up to that state. Given this log of events, we can ask interesting questions, such as what was account 11's balance seven days ago? By examining the leading events, we can answer that question easily. Events are immutable, meaning that when we revert to a previous state, rather than deleting a record of a transaction, like the purchase of a book, we insert a new event showing that money has been returned.
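The bank account example can be sketched as an append-only log with the balance derived from it. Apart from the $1,000 opening deposit, the amounts and dates below are made up for illustration, as are the `Transaction` and `Account` names.

```ruby
require "date"

# An event is an immutable fact: what happened, how much, and when.
Transaction = Struct.new(:description, :amount, :on) do
  def initialize(*)
    super
    freeze
  end
end

class Account
  def initialize
    @events = []                       # append-only log, never updated or deleted
  end

  def record(description, amount, on)
    @events << Transaction.new(description, amount, on)
    self
  end

  # Derived state: recomputed from the log, optionally as of a past date.
  def balance(as_of = nil)
    relevant = as_of ? @events.select { |event| event.on <= as_of } : @events
    relevant.sum(&:amount)
  end
end

account = Account.new
account.record("Opening deposit",   1000, Date.new(2013, 3, 1))
account.record("Conference ticket", -300, Date.new(2013, 3, 5))
account.record("Paycheck",          2000, Date.new(2013, 3, 15))
account.record("Book",               -40, Date.new(2013, 3, 18))
account.record("Book return",         40, Date.new(2013, 3, 20))  # a new event, not a deletion

puts account.balance                          # 2700
puts account.balance(Date.new(2013, 3, 10))   # 700
```

Returning the book adds a compensating event, so the log still shows both the purchase and the refund while the balance ends up where it was before.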

00:17:34.040 Furthermore, events can be replayed at a different time or place. If an error occurs in production at a bank, developers could refer to the event log happening during the incident and replay that log locally until the error's occurrence. This way, developers view the system in the exact state when the error happened. Similarly, when rolling out a new version, developers could replay events against both the old and new code versions to ensure they agree at every step.

00:18:17.840 Most of you likely use another event-sourced system daily: Git. In Git, the events are commits. The derived state from these commits is your working directory, which means you can reconstruct past states or ask what your working directory looked like at a specific commit. When you revert an event in Git, rather than removing it, you add a new event that does the opposite. A rebase is similar to event replay: you take events from one branch and apply them to another.

00:19:03.960 I once worked at a family history startup where we built a family tree feature. We decided to event source all the changes made to this family tree. Anytime someone added a family member or adjusted a relationship, we recorded the event. I’ll share some of the benefits from this decision. You can view family trees as a graph where individuals are nodes and relationships are edges. Until this point, we used Postgres for data storage, which is rock solid. However, relational databases struggle significantly with graph data.

00:20:13.760 Queries become convoluted and sometimes multiple trips to the database are necessary to effectively execute a query. In contrast, many new NoSQL graph databases are structured for efficiency regarding querying graphs and traversing them, which was precisely what we needed for our family tree. However, we didn't trust our ability to operate this new database.

00:20:52.480 Consequently, we decided to record the event log in Postgres and store the computed state of the graph in that database. That way, should anything happen to the graph database, we still had a canonical copy of all our data within Postgres. Furthermore, we had an audit log due to the event sourcing. Occasionally, malicious users would either inject false information or remove existing information from a family tree. When this happened, we simply had to identify each of the events triggered by those users and revert them.

00:21:49.760 Eventually, we decided that this family tree feature wasn't critical to the business's viability and that managing the graph database outweighed its benefits. Therefore, we moved everything back into Postgres. However, data migration from one database to another can be challenging, especially if they lean on different data models—like a graph database and a relational store. Instead of directly copying the data from the graph database into Postgres, we modified our application code to write events into Postgres and replay the entire event log.

00:22:39.040 At that moment, both databases had the same computed values, and we subsequently turned the graph database off. You might find that event sourcing is an outstanding solution, granting you significant advantages, like being able to store computed state in memory for rapid queries while maintaining durable records of everything that transpired. There are also a number of well-known problems in software engineering and computer science that immutability lets you sidestep.

00:23:13.240 One famous phrase notes, 'There are only two hard problems in computer science: cache invalidation and naming things.' When all the data you’ve cached is immutable, cache invalidation isn't a concern. You may have observed this with the Rails asset pipeline—whenever an asset version changes, Rails appends a version hash. This guarantees that any requests against that URL will always deliver the same data, enabling the browser and proxies to cache it for an extended period.
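The idea behind content-based asset names can be sketched with the standard library. The helper name `fingerprinted_name`, the choice of SHA-256, and the 12-character truncation are assumptions for this example and are not the asset pipeline's actual algorithm.

```ruby
require "digest/sha2"

# Returns a file name that embeds a digest of the content (hypothetical helper).
def fingerprinted_name(logical_name, content)
  digest = Digest::SHA256.hexdigest(content)[0, 12]
  ext = File.extname(logical_name)
  "#{File.basename(logical_name, ext)}-#{digest}#{ext}"
end

v1 = fingerprinted_name("application.css", "body { color: black; }")
v2 = fingerprinted_name("application.css", "body { color: navy; }")

puts v1
puts v2
puts v1 == v2   # false: changed content gets a new name
```

Because a given name always maps to the same bytes, a cache can keep it indefinitely; changed content simply appears under a different name.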

00:24:00.760 Database normalization serves as another fascinating concept. Consulting Wikipedia reveals that its main objective is to isolate data so that additions, deletions, and modifications can occur in a single table. If you're never updating or modifying data across multiple tables, there's no real need to normalize it; you can keep your data denormalized. Also, immutable objects are inherently thread-safe. Value objects and persistent data structures can be shared among threads without concern for needing any locks.

00:24:53.560 That said, there are trade-offs when choosing to use immutability. One significant drawback is performance; using immutable data structures and objects leads to more frequent allocations and copying of data, resulting in slower execution and higher memory consumption. You sacrifice some flexibility for immutability; it's a constraint you impose upon yourself, which can be challenging when your performance requirements or libraries don't interoperate well with immutability.

00:25:36.000 Additionally, Ruby, as a language, doesn't promote immutability; you can reassign constants and directly change other objects’ instance variables. There’s even a gem that allows you to unfreeze a frozen object. The flexible nature of Ruby can make imposing constraints more challenging.

00:26:08.760 Finally, deletion is another form of mutation. Consider Twitter—tweets are conceptually immutable; once posted, they can’t be modified. Therefore, it should theoretically be possible for Twitter to cache tweets indefinitely. However, users can delete their tweets, leading Twitter to deal with the complications of cache invalidation.

00:26:39.760 If any of this piques your interest, I have pointers to several next steps. The URL in the bottom corner will list everything I’m about to mention and will remain available throughout the remainder of the presentation. I suggest learning one of these three programming languages that have immutability baked into their syntax, making changes challenging. While they may not prove useful for your daily work, studying them will enrich your understanding of immutability and make you a better Rubyist.

00:27:18.600 Rich Hickey, the creator of Clojure, has presented many outstanding talks in recent years about immutability. I touched on value objects for five minutes, but he has an exceptional 90-minute lecture on value objects that is incredible. Two valuable resources on value objects include 'Domain-Driven Design,' which discusses how to effectively have value and mutable entity objects interact within an application, and the C2 Wiki, which offers excellent articles about why you want them to be immutable.

00:28:00.840 Gary Bernhardt has discussed the idea of a functional core with an imperative shell, where all your domain logic lives in purely functional, entirely immutable components, joined together by a larger mutable shell. He shared this concept in a talk, 'Boundaries,' at RubyConf last November, where he delves deeper and provides fantastic examples. Lastly, Martin Fowler's canonical text on event sourcing outlines numerous use cases, reasons to employ event sourcing, and the potential pitfalls to anticipate.

00:29:17.440 That's all I have for you. Thank you!

00:30:05.760 How much time do we have for questions?

00:30:12.000 Great! I have a bonus round if we have about five minutes.

00:30:17.760 I mentioned persistent data structures earlier. Persistent data structures are immutable data structures where, when you try to change the data inside, rather than modifying it, it returns a new copy with those modifications applied. There's an amazing library in Ruby called Hamster, and I believe similar features are integrated into Rubinius.

00:30:40.760 As an example, we have a vector called 'foo' containing the numbers 1, 2, and 3. When we add a fourth element to it, we assign the result to the variable 'bar'. Now, 'bar' contains the modification we just made, while 'foo' remains unchanged. If we then change one of the elements in 'bar', 'bar' will reflect that change, but 'foo' won't.

00:31:32.000 To illustrate its usefulness, let's look at the Three Stooges. In one movie, called 'Soup to Nuts', the cast included Moe, Shemp, and Larry. In the following movie, 'Meet the Baron', Shemp leaves and Curly replaces him. Eventually, in 'Gold Raiders', Shemp returns and Curly leaves. If we share the same array or set among these movies and change it for the second one, the cast from the first movie would unintentionally be altered. However, with persistent data structures, each film retains the exact cast we expect.
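The cast example can be approximated with frozen core arrays and non-destructive operations. This sketch does not use Hamster; the `recast` helper is a hypothetical name, and unlike a true persistent structure it copies the whole array on each change.

```ruby
# Shared mutable array: changing the cast for one film changes every film.
cast = ["Moe", "Shemp", "Larry"]
films = { "Soup to Nuts" => cast }
cast.delete("Shemp")
cast << "Curly"
films["Meet the Baron"] = cast
p films["Soup to Nuts"]     # ["Moe", "Larry", "Curly"], the first film's cast was altered

# Non-destructive update: returns a new frozen array and leaves the input alone.
def recast(cast, leaving:, joining:)
  (cast - [leaving] + [joining]).freeze
end

soup_to_nuts   = ["Moe", "Shemp", "Larry"].freeze
meet_the_baron = recast(soup_to_nuts, leaving: "Shemp", joining: "Curly")
gold_raiders   = recast(meet_the_baron, leaving: "Curly", joining: "Shemp")

p soup_to_nuts     # ["Moe", "Shemp", "Larry"]
p meet_the_baron   # ["Moe", "Larry", "Curly"]
p gold_raiders     # ["Moe", "Larry", "Shemp"]
```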

00:32:34.000 You might ask if Hamster uses 'dup' a lot to achieve this. Actually, some interesting computer science principles are at play here. We have a vector of the numbers 1 through 7. The leaf nodes of this tree represent those numbers, traversed in sequence. The variable 'old' points to the tree's root. When we add the number 8 to the vector, a new tree is created, and if you traverse the new tree, you’ll retrieve 1 through 8, while 'old' and 'new' share almost all of their nodes.
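The node sharing described above can be demonstrated with a small path-copying tree. Note the substitution: this is a plain binary search tree with values in every node, not the leaf-based structure on the slide and not Hamster's internals, and the names `tree_insert` and `values_in_order` are made up for the example. Only the nodes on the path to the new element are copied; everything else is reused.

```ruby
Node = Struct.new(:left, :value, :right)

# Returns a new tree containing value. Nodes off the insertion path are
# reused as-is, so the old tree stays valid and unchanged.
def tree_insert(node, value)
  return Node.new(nil, value, nil).freeze if node.nil?

  if value < node.value
    Node.new(tree_insert(node.left, value), node.value, node.right).freeze
  else
    Node.new(node.left, node.value, tree_insert(node.right, value)).freeze
  end
end

# In-order traversal: the values in sorted order.
def values_in_order(node)
  return [] if node.nil?
  values_in_order(node.left) + [node.value] + values_in_order(node.right)
end

old_tree = [4, 2, 6, 1, 3, 5, 7].reduce(nil) { |tree, n| tree_insert(tree, n) }
new_tree = tree_insert(old_tree, 8)

p values_in_order(old_tree)               # [1, 2, 3, 4, 5, 6, 7]
p values_in_order(new_tree)               # [1, 2, 3, 4, 5, 6, 7, 8]
p new_tree.left.equal?(old_tree.left)     # true: the whole left half is shared
```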

00:33:16.640 This sharing of nodes allows for greater efficiency, speed, and lower memory usage than performing a complete copy. Now, I’m genuinely done.