00:00:12.300
Hi everybody, the title of this talk is "Caching Without Marshal." My name is Chris Salzberg, and I'm a Staff Developer at Shopify. The background for this slide shows a city called Hakurate in the north of Japan, where I live. I'm part of the Ruby and Rails infrastructure group, which has many members here at the conference. I am in a smaller team of about five developers called Core Stewardship.
00:00:22.800
Our role at Shopify is to steward the core monolith, which is likely the largest Rails application in the world. We need to keep it maintainable, ensure the code is clean, and create a happy environment for developers at Shopify.
00:00:30.180
About a year or two ago, we encountered an exception in production that triggered an incident requiring a rollback. While I wasn't directly involved in the incident itself, I participated in the follow-up. The exception was a name error regarding a constant called "betaflex service active record repository."
00:00:43.440
Whenever you ship code, you deal with two universes: the universe before you ship the code and the universe after you've shipped it. Usually, most situations allow these two states to overlap, with only a bit of change. However, in this case, the change was a refactor of a commonly used part of the code concerning beta flags.
00:00:51.660
This incident was tied to another issue, which again was tied to another related issue, and all of these were placed in the cache. Then, navigating through the new universe revealed that this item was removed from the cache. When this occurred, the old universe did not recognize the item, leading to further complications.
00:01:04.440
This situation is particularly problematic because there are really only two ways to handle such issues if you’ve faced them before. One way is to put fewer things into the cache or to limit the variety of items you cache. However, this tactic won't stop the occurrence of such problems.
00:01:15.540
The other alternative would be to change the code less frequently, but that’s not an acceptable solution for our team. Our goal is to make code better and cleaner, and we don't want to prevent developers from refactoring their work. Therefore, we decided we have to dig deeper to solve this problem.
00:01:30.180
Let’s first talk about caching.
00:01:35.400
If you visit the Rails guides, you'll find a lot of great information about caching. There are various topics like page caching, action caching, fragment caching, and Russian doll caching. You can discover different places to store your cached data, whether it’s in files, memory, Memcache, or Redis.
00:01:43.440
However, what you won’t find much information about is what you can or cannot cache—what is allowed to go into the cache. In fact, if you read about low-level caching, there’s a sentence stating that the Rails caching mechanism works great for storing any kind of information.
00:01:50.700
This statement seems somewhat too good to be true, and it ultimately is. The primary functionality for caching in Rails boils down to writing or reading from the Rails cache, which is an instance of Active Support Cache. Typically, this involves calling something like Marshal.dump or Marshal.load on an object, which could be almost anything.
00:02:07.260
While there are some caveats, that's the gist of it. We utilize a different data store, namely Memcache, but the story here remains largely unaffected as it defaults to using Marshal as well.
00:02:18.300
When you cache data, it actually wraps the data with some metadata, comprising expiration and an optional version. This entire package is placed into something called an Active Support Cache Entry. There was a change between Rails 6 and Rails 7: prior to Rails 7, this whole Active Support Cache Entry object was dumped by Marshal, making the process inefficient.
00:02:36.480
In Rails 7, this process became more space-efficient due to the work performed by Jean Lucia.
00:02:41.760
Nonetheless, many issues associated with Marshal persist. Everything I'm discussing today still applies in Rails 7 but with increased efficiency.
00:02:49.200
The problems associated with Marshal are well-known, and it's no secret that Marshal.load is not recommended as a universal serialization format. You should never unmarshal user-supplied data or other external data. If you doubt my statements, there was a known vulnerability (CVE) in 2020 related exactly to Memcache and Redis cache stores.
00:03:05.460
Having items in the cache can inherently be dangerous for these reasons. You might be curious about what exactly happens inside this sharp knife known as Marshal.
00:03:13.260
If you attempt to investigate the code for Marshal, you may be disappointed since it's not written in Ruby—it's written in C. There's little documentation explaining how Marshal actually works.
00:03:20.100
I want to discuss this primarily because before removing Marshal, you must understand its functions; otherwise, it might cause issues in the future.
00:03:28.500
Here’s an overview: we'll use a simple Rails record, specifically a post model, creating a record titled "Caching Without Marshal."
00:03:36.960
When I executed this, it produced a binary blob approximately 1600 bytes long. Within this blob, you can find constants and instance variables—it even displays "Caching Without Marshal" multiple times.
00:03:43.800
This isn't particularly efficient. However, the magic of this approach allows us to pass this blob to Marshal.load, and voilà, you receive the original object exactly as it was.
00:03:51.060
Moreover, you can reconstruct the object in different threads, processes, or even weeks, months, or years later, and it will be identical.
00:03:58.680
Marshal encodes the universe; it captures everything and disregards concepts like privacy. This setup is fantastic if our universe were static, yet we typically ship every 30 minutes or an hour.
00:04:06.060
Given this non-static universe, using Marshal poses significant risks. This concern is relevant for all Rails applications.
00:04:15.900
So what actually happens when Marshal encodes? You can find much of this in the Marshal C code within MRI Ruby. The top sections contain constants indicating what Marshal is doing.
00:04:24.600
The first thing you'll spot is a major and minor version number, which at the moment is 4.8; this setup hasn't changed for many years, so you can generally treat this as a constant.
00:04:31.560
Then there are a number of what I refer to as atomic types—this is my terminology, and it's not official. These atomic types include nil, true, false, numbers, floats, symbols, and class and module objects.
00:04:39.600
There are also composite types—again, I use this term. This group includes arrays, hashes (even default value hashes), and objects. There are also unexpected types such as strings and regexes.
00:04:47.400
Marshal has specific approaches for encoding these, and some types are indeed somewhat mysterious. Discussing these aspects is also vital.
00:04:56.760
To start with objects, how does Marshal encode them? It’s relatively straightforward. Objects have a type called "type object," represented by a small character 'o'.
00:05:05.760
If we take that byte string I mentioned earlier and convert it to hex for better readability, the first thing visible is the version I referred to earlier; every Marshal-encoded entry starts with that.
00:05:12.120
Then it follows with the byte 'o' for object, the class name of the object as a symbol (indicated by a colon to signify it’s a symbol), its length, instance variables count, and then the instance variables themselves.
00:05:20.280
You can efficiently keep track of instance variables, and every variable can also host other objects. This rapidly increases the complexity.
00:05:30.420
Another crucial concept is that Marshal encodes instance variables. While we've identified instance variables in regular objects, they also apply to specialties like strings, regexes, hashes, and arrays.
00:05:39.240
Specific functionality facilitates assigning instance variables to these types. Interestingly, Marshal can handle circular references—Active Record objects often have relationships pointing to other records.
00:05:48.720
For example, circular structures can emerge from records with associations. Thanks to its design, Marshal can encode these circular elements without resulting in an endless loop or segmentation fault.
00:05:57.540
To circumvent infinite recursion, Marshal employs a type link technique—it essentially uses a pointer to signify instances already in serialization.
00:06:06.540
Another consideration is how Marshal deals with core type subclasses.
00:06:13.080
If you subclass a core type, such as a hash, when encoding it, Marshal can recognize the class, but it employs a special type called 'u-class,' represented by capital 'C,' which allows it to map the core type together with the class name.
00:06:21.120
I can outline all of this detail regarding Marshal because ultimately, if you are running a Rails application, you're likely depending on Marshal to handle many of these processes.
00:06:29.400
So if you decide, like we did, to try and completely remove Marshal, you will likely discover your application experiences breaks in areas you didn’t even foresee.
00:06:36.060
This sentiment reminds me of Disney's "Fantasia" where Mickey, as the Sorcerer's Apprentice, animates a mop to complete his tasks. However, this contribute to chaos, leaving him overwhelmed.
00:06:44.340
This relationship encapsulates how Rails developers often delegate responsibilities to Marshal, placing blind faith in its functionality, unaware that it supports much more than anticipated.
00:06:55.680
Now that I’ve outlined the issue, let’s discuss the approach we used to eliminate Marshal.
00:07:02.520
Originally, the question was how to safeguard against cache items exploding when retrieved, as had occurred before.
00:07:09.240
To address this, we needed a format that wouldn’t encode the entire universe.
00:07:16.620
We opted for a format known as MessagePack, which is a highly efficient binary serialization format.
00:07:22.920
Similar to Marshal, MessagePack is also a binary format; however, the key distinction lies in its generic nature.
00:07:30.180
While Marshal is Ruby-centric, with unique concepts embedded within it, MessagePack is language-agnostic and can be utilized across languages.
00:07:36.780
You might not be aware of it, but MessagePack is present in your gem file because it is a dependency of the library Boot Snap.
00:07:43.560
At first sight, MessagePack functions similarly to Marshal; you simply provide a hash to the packer to obtain a byte string, mimicking the process of Marshal's dump and load.
00:07:52.680
The byte string itself resembles Marshal's output, but the significant distinction is the absence of an object type, which is vital to our solution.
00:08:01.260
Moreover, the lack of instance variable representation in MessagePack's paradigm is a significant advantage in eliminating the complications we faced.
00:08:13.020
To provide an illustration of the contrast, converting a string to a Marshal encoded byte string results in various wrapper characters surrounding the actual string.
00:08:20.760
For instance, a simple string "Foo" produces a complex byte string filled with excess data indicating type and instance information.
00:08:29.880
Conversely, when using MessagePack, you receive a much more compact representation. The bulk of the resultant data consists of the string itself, greatly improving efficiency.
00:08:38.040
In addition, MessagePack's default encoding is UTF-8, which means it doesn't require redundancy in its encoding practices.
00:08:46.500
Here, the encoding conveys both value and length in a minimized format—this is a sharp contrast to how Marshal handles string data.
00:08:54.840
To investigate the diversities within our core monolith, we executed a grep command to identify where we utilized Rails cache read methods.
00:09:05.640
Over time, we recognized that many items beyond our initial core types needed to be cased, including objects of unknown classifications.
00:09:13.920
Ultimately, we required a method that would allow us to selectively encode specific elements without encompassing the entire universe, and fortunately, MessagePack supports this.
00:09:20.940
You can define extension types with MessagePack Ruby by creating instances of Message Factory, where you can customize how types get registered.
00:09:30.360
Once registered, the serializer and deserializer methods can be tailored into functions that convert your object to a byte string and vice-versa.
00:09:39.480
In practice, we encapsulated this functionality to equip it for particular use cases. For example, handling date objects allowed us to specify the respective details used in the byte string.
00:09:46.680
Developers can leverage this straightforward approach to encode specific objects—like a date—with defined formats. Therefore, when implementing the serialization transformation, we could handle data effectively.
00:09:55.440
Moreover, MessagePack ensures that any undeclared types will cause a failure, protecting the cache integrity from unidentified issues.
00:10:03.840
When you encode items to the cache without specification, this failure alerts us accordingly, keeping us safe from the formerly chaotic setup.
00:10:11.520
We operating concurrently with both MessagePack and Marshal for approximately six months, which provided a migration path that wasn't disruptive.
00:10:19.320
Our approach was simple: we would throw the object at the MessagePack factory dump, applying extension types accordingly.
00:10:27.000
If this process succeeded, we returned a byte string; if it failed, we fell back to Marshal as a reliable alternative.
00:10:35.040
In addition, we also included version byte prefixes in the serialized items to indicate the extension types used in the encoding process.
00:10:41.520
Given that we kept a robust logging system, we examined the outcomes for instances that failed due to Marshals support, identifying precisely what types were responsible.
00:10:48.840
This logging was instrumental in understanding discrepancies in our cache regarding serialization and in refining our strategies.
00:10:56.520
The transition was complete with consistent analysis, leading us to discover and define additional types such as symbol and date time to align with the requirement.
00:11:04.320
Ultimately, we accounted for Active Record object serialization, needing to encapsulate attributes and currently loaded associations.
00:11:12.000
For Active Record, we defined an ``ActiveRecordPacker`` which passes the object through to extract the necessary attributes and relationships into a format usable by MessagePack.
00:11:24.360
The reasoning for this strategy lies into the need to avoid the types of issues previously faced concerning circular relationships that Marshal manages effectively.
00:11:34.620
Utilizing the serialization mechanism, we efficiently encoded the objects alongside their attributes and associations, even enhancing by providing ids for previously seen objects.
00:11:44.520
After encoding, we noted significant reductions in size; what once took 1600 bytes for a post alone shrank to approximately 300 bytes.
00:11:54.480
Hence, we witnessed approximately a 13x improvement over Marshal's inherent memory footprint.
00:12:01.800
Watching the results appear in production was particularly gratifying; our Rails cache's fill metrics substantially dropped post-implementation.
00:12:08.520
As our conversions scaled closer to 100%, our statistics dashboard highlighted the progress that had been made with MessagePack.
00:12:15.840
However, as we moved to exclude Marshaling entirely, we realized our initial focus on hard-coded class names constituted a risk.
00:12:22.740
This realization determined our direction forward to provide all users with ample space to define the methods that would allow their custom types to be serializable within the framework.
00:12:30.720
Taking this precaution allowed us to evade pitfalls, especially during deployments.
00:12:35.460
In summary, we developed various extension types to adequately address the diverse range of cases encountered in our codebase.
00:12:42.480
As a result of this meticulous planning and execution, we enabled our applications entirely to function without Marshall.
00:12:49.440
We have moved over to an entirely MessagePack-centric caching framework, significantly enhancing capacity for developers to manage refactoring without concerns.
00:12:57.900
Most of the topics I’ve discussed have been distilled into a gem known as Paquito, which you can access as part of the Rails ecosystem.
00:13:05.400
I extend my gratitude to Jean Bouchier, who played a significant role in this project, bolstering the extraction process and contributing vital features.
00:13:14.520
Thank you, everyone!