Caching Without Marshal

by Chris Salzberg

In the talk "Caching Without Marshal" presented by Chris Salzberg at RailsConf 2022, the speaker discusses the pitfalls associated with using Ruby's Marshal for caching in Rails applications. While Marshal provides powerful object serialization, it comes with significant risks such as issues arising from code changes and potential remote code execution vulnerabilities due to user-supplied data. To address these challenges, Salzberg describes Shopify's decision to migrate their caching strategy to MessagePack, a more efficient and safer serialization format. The talk outlines:

  • The Problem with Marshal: Many developers rely on Marshal to cache various objects, not realizing that this can lead to incompatibility issues when the code changes, especially if the state of cached data is not aligned with current code definitions.
  • Incident Example: An incident at Shopify highlighted the dangers of caching objects without considering potential changes in code. A NameError occurred because cached data referenced a constant that the running code did not know about after a refactor.
  • Features of MessagePack: MessagePack serves as a robust alternative, using stricter typing and less 'magic' compared to Marshal. It handles serialization efficiently and disallows unsupported types which helps prevent runtime issues from unaccounted changes.
  • Implementation Steps: The transition required customizing MessagePack by defining extension types to handle specific data structures that the application uses, such as ActiveRecord objects, dates, and custom class instances. The implementation includes creating serializers and managing potential circular references.
  • Results: After migrating to MessagePack, Shopify observed significant reductions in cache size compared to Marshal, achieving better space efficiency and reducing risks associated with serialization. Their final implementation allows developers to refactor freely without worrying about cache-related failures.

In conclusion, by adopting MessagePack, Shopify effectively reduced cache-related incidents and improved developer experience, while reinforcing the importance of understanding caching mechanisms in Rails applications. This migration not only enhanced performance but also provided greater safety for production systems.

00:00:12.300 Hi everybody, the title of this talk is "Caching Without Marshal." My name is Chris Salzberg, and I'm a Staff Developer at Shopify. The background for this slide shows a city called Hakodate in the north of Japan, where I live. I'm part of the Ruby and Rails infrastructure group, which has many members here at the conference. I am in a smaller team of about five developers called Core Stewardship.
00:00:22.800 Our role at Shopify is to steward the core monolith, which is likely the largest Rails application in the world. We need to keep it maintainable, ensure the code is clean, and create a happy environment for developers at Shopify.
00:00:30.180 About a year or two ago, we encountered an exception in production that triggered an incident requiring a rollback. While I wasn't directly involved in the incident itself, I participated in the follow-up. The exception was a NameError for a constant belonging to the beta flags service, an Active Record repository class.
00:00:43.440 Whenever you ship code, you deal with two universes: the universe before you ship the code and the universe after you've shipped it. Usually, most situations allow these two states to overlap, with only a bit of change. However, in this case, the change was a refactor of a commonly used part of the code concerning beta flags.
00:00:51.660 This incident was tied to another issue, which again was tied to another related issue, and all of these were placed in the cache. Then, navigating through the new universe revealed that this item was removed from the cache. When this occurred, the old universe did not recognize the item, leading to further complications.
00:01:04.440 This situation is particularly problematic because there are really only two ways to handle such issues if you’ve faced them before. One way is to put fewer things into the cache or to limit the variety of items you cache. However, this tactic won't stop the occurrence of such problems.
00:01:15.540 The other alternative would be to change the code less frequently, but that’s not an acceptable solution for our team. Our goal is to make code better and cleaner, and we don't want to prevent developers from refactoring their work. Therefore, we decided we have to dig deeper to solve this problem.
00:01:30.180 Let’s first talk about caching.
00:01:35.400 If you visit the Rails guides, you'll find a lot of great information about caching. There are various topics like page caching, action caching, fragment caching, and Russian doll caching. You can discover different places to store your cached data, whether it’s in files, memory, Memcached, or Redis.
00:01:43.440 However, what you won’t find much information about is what you can or cannot cache—what is allowed to go into the cache. In fact, if you read about low-level caching, there’s a sentence stating that the Rails caching mechanism works great for storing any kind of information.
00:01:50.700 This statement seems somewhat too good to be true, and it ultimately is. The primary functionality for caching in Rails boils down to writing to or reading from Rails.cache, which is an instance of an ActiveSupport::Cache::Store subclass. By default, this involves calling something like Marshal.dump or Marshal.load on an object, which could be almost anything.
00:02:07.260 While there are some caveats, that's the gist of it. We use a different data store, namely Memcached, but the story is largely the same, since that store defaults to Marshal as well.
00:02:18.300 When you cache data, Rails actually wraps the data with some metadata, comprising an expiration and an optional version. This entire package is placed into an ActiveSupport::Cache::Entry. There was a change between Rails 6 and Rails 7: prior to Rails 7, this whole ActiveSupport::Cache::Entry object was dumped by Marshal, which was inefficient.
00:02:36.480 In Rails 7, this process became more space-efficient thanks to work done by Jean Boussier.
00:02:41.760 Nonetheless, many issues associated with Marshal persist. Everything I'm discussing today still applies in Rails 7 but with increased efficiency.
00:02:49.200 The problems associated with Marshal are well-known, and it's no secret that Marshal is not recommended as a general-purpose serialization format. You should never unmarshal user-supplied data or other external data. If you doubt my statements, there was a known vulnerability (CVE) in 2020 related exactly to the Memcached and Redis cache stores.
00:03:05.460 Having items in the cache can inherently be dangerous for these reasons. You might be curious about what exactly happens inside this sharp knife known as Marshal.
00:03:13.260 If you attempt to investigate the code for Marshal, you may be disappointed since it's not written in Ruby—it's written in C. There's little documentation explaining how Marshal actually works.
00:03:20.100 I want to discuss this primarily because before removing Marshal, you must understand its functions; otherwise, it might cause issues in the future.
00:03:28.500 Here’s an overview: we'll use a simple Rails record, specifically a post model, creating a record titled "Caching Without Marshal."
00:03:36.960 When I executed this, it produced a binary blob approximately 1600 bytes long. Within this blob, you can find constants and instance variables—it even displays "Caching Without Marshal" multiple times.
00:03:43.800 This isn't particularly efficient. However, the magic of this approach allows us to pass this blob to Marshal.load, and voilà, you receive the original object exactly as it was.
00:03:51.060 Moreover, you can reconstruct the object in different threads, processes, or even weeks, months, or years later, and it will be identical.
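The round trip described here can be tried in plain Ruby. The following is a minimal sketch, assuming a hypothetical `Post` class that stands in for the Active Record model from the talk (a real record carries far more state, which is why its blob is so much larger):

```ruby
# Hypothetical stand-in for the talk's Post record (a plain object, not Active Record).
class Post
  attr_reader :title, :tags

  def initialize(title, tags)
    @title = title
    @tags = tags
  end
end

post = Post.new("Caching Without Marshal", ["ruby", "rails"])

blob = Marshal.dump(post)   # binary String holding the class name and instance variables
copy = Marshal.load(blob)   # a new object rebuilt from the blob alone

puts copy.title
puts copy.equal?(post)      # a distinct object with the same state
```

The blob could equally be written to a file or a cache server and loaded by another process, provided that process has a compatible definition of `Post`.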
00:03:58.680 Marshal encodes the universe; it captures everything and disregards concepts like privacy. This setup is fantastic if our universe were static, yet we typically ship every 30 minutes or an hour.
00:04:06.060 Given this non-static universe, using Marshal poses significant risks. This concern is relevant for all Rails applications.
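A minimal sketch of that risk in plain Ruby, assuming a hypothetical `BetaFlagRepository` class (the name is invented for illustration and is not the constant from the incident). One version of the code writes the object, and a version without the class tries to read it:

```ruby
# Hypothetical class standing in for a constant that a refactor later removes.
class BetaFlagRepository
  def initialize
    @flags = { "new_checkout" => true }
  end
end

# One version of the code writes the object to the cache.
cached = Marshal.dump(BetaFlagRepository.new)

# Another version of the code no longer defines the class.
Object.send(:remove_const, :BetaFlagRepository)

outcome =
  begin
    Marshal.load(cached)
    "loaded"
  rescue ArgumentError => e
    # Plain Ruby reports the missing constant as an ArgumentError.
    "#{e.class}: #{e.message}"
  end

puts outcome
```

Plain Ruby raises an `ArgumentError` here. In a Rails application, where Active Support tries to autoload the missing constant, the same situation typically surfaces as a `NameError`, which matches the exception described in the incident.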
00:04:15.900 So what actually happens when Marshal encodes? You can find much of this in the Marshal C code within MRI Ruby. The top sections contain constants indicating what Marshal is doing.
00:04:24.600 The first thing you'll spot is a major and minor version number, which at the moment is 4.8; this setup hasn't changed for many years, so you can generally treat this as a constant.
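The version prefix is easy to confirm from Ruby itself. A small sketch that reads the first two bytes of a dump and compares them with the constants Ruby exposes:

```ruby
blob = Marshal.dump(nil)

# The first two bytes of every dump are the major and minor format version.
major, minor = blob.bytes.first(2)

puts "#{major}.#{minor}"
puts "#{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
```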
00:04:31.560 Then there are a number of what I refer to as atomic types—this is my terminology, and it's not official. These atomic types include nil, true, false, numbers, floats, symbols, and class and module objects.
00:04:39.600 There are also composite types—again, I use this term. This group includes arrays, hashes (even default value hashes), and objects. There are also unexpected types such as strings and regexes.
00:04:47.400 Marshal has specific approaches for encoding these, and some types are indeed somewhat mysterious. Discussing these aspects is also vital.
00:04:56.760 To start with objects, how does Marshal encode them? It’s relatively straightforward. Objects have a type called TYPE_OBJECT, represented by a lowercase 'o'.
00:05:05.760 If we take that byte string I mentioned earlier and convert it to hex for better readability, the first thing visible is the version I referred to earlier; every Marshal-encoded entry starts with that.
00:05:12.120 Then it follows with the byte 'o' for object, the class name of the object as a symbol (indicated by a colon to signify it’s a symbol), its length, instance variables count, and then the instance variables themselves.
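To make the layout concrete, here is a sketch using a hypothetical `Point` class with a single instance variable, small enough that every byte of the dump can be labeled:

```ruby
# Hypothetical class with one instance variable, chosen to keep the dump tiny.
class Point
  def initialize(x)
    @x = x
  end
end

blob = Marshal.dump(Point.new(1))

p blob                # => "\x04\bo:\nPoint\x06:\a@xi\x06"
puts blob.unpack1("H*")

# Reading the bytes left to right:
#   \x04 \b     format version 4.8
#   o           TYPE_OBJECT
#   : \n Point  class name as a symbol (length 5, stored as 5 + 5)
#   \x06        instance variable count (1, stored as 1 + 5)
#   : \a @x     instance variable name as a symbol (length 2)
#   i \x06      the Integer 1
```

Small integers and lengths are stored with an offset of 5, which is why a count of 1 appears as `\x06`.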
00:05:20.280 Each instance variable can itself hold other objects, which hold their own instance variables in turn. This rapidly increases the complexity.
00:05:30.420 Another crucial concept is that Marshal encodes instance variables. While we've identified instance variables in regular objects, they also apply to core types like strings, regexes, hashes, and arrays.
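A short sketch of this behavior on a string. The instance variable name `@source` is made up for illustration:

```ruby
title = +"Foo"                                  # unary plus gives a mutable String
title.instance_variable_set(:@source, "cache")  # hypothetical instance variable

blob = Marshal.dump(title)
copy = Marshal.load(blob)

puts blob[2]                                    # the wrapper type that carries instance variables
puts copy.instance_variable_get(:@source)
```

The dump starts with an `I` wrapper after the version bytes. Marshal uses that wrapper to attach instance variables to a value, and it is also how a string's encoding is stored.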
00:05:39.240 Specific functionality facilitates assigning instance variables to these types. Interestingly, Marshal can handle circular references—Active Record objects often have relationships pointing to other records.
00:05:48.720 For example, circular structures can emerge from records with associations. Thanks to its design, Marshal can encode these circular elements without resulting in an endless loop or segmentation fault.
00:05:57.540 To avoid infinite recursion, Marshal uses a type called TYPE_LINK, which is essentially a pointer back to an object that has already been serialized.
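The following sketch shows both halves of this: the link record in the raw bytes, and a circular structure surviving a round trip. `Author` and `Post` are hypothetical plain classes standing in for associated records:

```ruby
# Simplest possible cycle: an array that contains itself.
loop_array = []
loop_array << loop_array
p Marshal.dump(loop_array)   # => "\x04\b[\x06@\x00"  ('@' is the link back to object 0)

# Hypothetical plain classes standing in for two associated records.
class Author
  attr_accessor :name, :posts
end

class Post
  attr_accessor :title, :author
end

author = Author.new
author.name = "Chris"
post = Post.new
post.title = "Caching Without Marshal"
post.author = author
author.posts = [post]        # post -> author -> posts -> post

copy = Marshal.load(Marshal.dump(post))
puts copy.author.posts.first.equal?(copy)   # the cycle is preserved, not duplicated
```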
00:06:06.540 Another consideration is how Marshal deals with core type subclasses.
00:06:13.080 If you subclass a core type, such as Hash, Marshal still recognizes the class when encoding it. It uses a special type called TYPE_UCLASS, represented by a capital 'C', which lets it record the class name together with the core type.
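A sketch with a hypothetical `Settings` subclass of `Hash`:

```ruby
# Hypothetical subclass of a core type.
class Settings < Hash; end

blob = Marshal.dump(Settings.new)
p blob   # => "\x04\bC:\rSettings{\x00"

# Reading the bytes left to right:
#   \x04 \b        format version 4.8
#   C              TYPE_UCLASS
#   : \r Settings  subclass name as a symbol (length 8)
#   { \x00         an ordinary Hash with zero entries

puts Marshal.load(blob).class
```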
00:06:21.120 I can outline all of this detail regarding Marshal because ultimately, if you are running a Rails application, you're likely depending on Marshal to handle many of these processes.
00:06:29.400 So if you decide, like we did, to try and completely remove Marshal, you will likely discover your application experiences breaks in areas you didn’t even foresee.
00:06:36.060 This reminds me of Disney's "Fantasia," where Mickey, as the Sorcerer's Apprentice, animates a mop to complete his tasks. However, this leads to chaos, leaving him overwhelmed.
00:06:44.340 This relationship encapsulates how Rails developers often delegate responsibilities to Marshal, placing blind faith in its functionality, unaware that it supports much more than anticipated.
00:06:55.680 Now that I’ve outlined the issue, let’s discuss the approach we used to eliminate Marshal.
00:07:02.520 Originally, the question was how to safeguard against cache items exploding when retrieved, as had occurred before.
00:07:09.240 To address this, we needed a format that wouldn’t encode the entire universe.
00:07:16.620 We opted for a format known as MessagePack, which is a highly efficient binary serialization format.
00:07:22.920 Similar to Marshal, MessagePack is also a binary format; however, the key distinction lies in its generic nature.
00:07:30.180 While Marshal is Ruby-centric, with unique concepts embedded within it, MessagePack is language-agnostic and can be utilized across languages.
00:07:36.780 You might not be aware of it, but MessagePack is probably already in your Gemfile.lock, because it is a dependency of the Bootsnap library.
00:07:43.560 At first sight, MessagePack functions similarly to Marshal; you simply provide a hash to the packer to obtain a byte string, mimicking the process of Marshal's dump and load.
00:07:52.680 The byte string itself resembles Marshal's output, but the significant distinction is the absence of an object type, which is vital to our solution.
00:08:01.260 Moreover, the lack of instance variable representation in MessagePack's paradigm is a significant advantage in eliminating the complications we faced.
00:08:13.020 To provide an illustration of the contrast, converting a string to a Marshal encoded byte string results in various wrapper characters surrounding the actual string.
00:08:20.760 For instance, a simple string "Foo" produces a complex byte string filled with excess data indicating type and instance information.
00:08:29.880 Conversely, when using MessagePack, you receive a much more compact representation. The bulk of the resultant data consists of the string itself, greatly improving efficiency.
00:08:38.040 In addition, MessagePack's default encoding is UTF-8, which means it doesn't require redundancy in its encoding practices.
00:08:46.500 Here, the encoding conveys both value and length in a minimized format—this is a sharp contrast to how Marshal handles string data.
00:08:54.840 To see how varied the cached data in our core monolith was, we ran a grep to find everywhere we called the Rails cache read methods.
00:09:05.640 Over time, we recognized that many items beyond our initial core types needed to be cached, including objects of unknown classes.
00:09:13.920 Ultimately, we required a method that would allow us to selectively encode specific elements without encompassing the entire universe, and fortunately, MessagePack supports this.
00:09:20.940 You can define extension types with MessagePack Ruby by creating an instance of MessagePack::Factory and registering your types on it.
00:09:30.360 Once registered, the serializer and deserializer methods can be tailored into functions that convert your object to a byte string and vice-versa.
00:09:39.480 In practice, we encapsulated this functionality to equip it for particular use cases. For example, handling date objects allowed us to specify the respective details used in the byte string.
00:09:46.680 Developers can leverage this straightforward approach to encode specific objects—like a date—with defined formats. Therefore, when implementing the serialization transformation, we could handle data effectively.
00:09:55.440 Moreover, MessagePack ensures that any undeclared types will cause a failure, protecting the cache integrity from unidentified issues.
00:10:03.840 When you encode items to the cache without specification, this failure alerts us accordingly, keeping us safe from the formerly chaotic setup.
00:10:11.520 We ran MessagePack and Marshal side by side for approximately six months, which gave us a migration path that wasn't disruptive.
00:10:19.320 Our approach was simple: we would throw the object at the MessagePack factory dump, applying extension types accordingly.
00:10:27.000 If this process succeeded, we returned a byte string; if it failed, we fell back to Marshal as a reliable alternative.
00:10:35.040 In addition, we also included version byte prefixes in the serialized items to indicate the extension types used in the encoding process.
00:10:41.520 Because we logged every case that fell back to Marshal, we could examine those payloads and identify precisely which types were responsible.
00:10:48.840 This logging was instrumental in understanding discrepancies in our cache regarding serialization and in refining our strategies.
00:10:56.520 The transition was complete with consistent analysis, leading us to discover and define additional types such as symbol and date time to align with the requirement.
00:11:04.320 Ultimately, we accounted for Active Record object serialization, needing to encapsulate attributes and currently loaded associations.
00:11:12.000 For Active Record, we defined an ``ActiveRecordPacker`` which passes the object through to extract the necessary attributes and relationships into a format usable by MessagePack.
00:11:24.360 The reasoning for this strategy lies in the need to handle the circular relationships that Marshal manages for you and that MessagePack does not.
00:11:34.620 Utilizing the serialization mechanism, we efficiently encoded the objects alongside their attributes and associations, even enhancing by providing ids for previously seen objects.
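One way to picture "ids for previously seen objects" is a flattening pass that turns a graph of records into a list of entries that refer to each other by index. The sketch below is a generic illustration in plain Ruby, not Shopify's or Paquito's implementation. `Record` is a hypothetical stand-in for an Active Record object with attributes and loaded associations:

```ruby
# Hypothetical stand-in for a record: a type name, attributes, and loaded associations.
class Record
  attr_reader :type, :attributes, :associations

  def initialize(type, attributes)
    @type = type
    @attributes = attributes
    @associations = {}   # name => Record or Array of Records
  end
end

# Flatten a record graph into [[type, attributes, refs], ...], where refs maps
# association names to entry indexes. Seen records are referenced by index only.
def flatten_record(record, ids = {}.compare_by_identity, entries = [])
  return ids[record] if ids.key?(record)

  ids[record] = entries.size
  entry = [record.type, record.attributes, {}]
  entries << entry
  record.associations.each do |name, target|
    entry[2][name] =
      if target.is_a?(Array)
        target.map { |item| flatten_record(item, ids, entries) }
      else
        flatten_record(target, ids, entries)
      end
  end
  ids[record]
end

# Rebuild the record graph from the flat entries, restoring shared references.
def rebuild_records(entries)
  records = entries.map { |type, attributes, _refs| Record.new(type, attributes) }
  entries.each_with_index do |(_type, _attributes, refs), index|
    refs.each do |name, ref|
      records[index].associations[name] =
        ref.is_a?(Array) ? ref.map { |i| records[i] } : records[ref]
    end
  end
  records.first
end

author = Record.new("Author", { "name" => "Chris" })
post = Record.new("Post", { "title" => "Caching Without Marshal" })
post.associations["author"] = author
author.associations["posts"] = [post]   # circular: post -> author -> post

entries = []
flatten_record(post, {}.compare_by_identity, entries)
p entries

copy = rebuild_records(entries)
puts copy.associations["author"].associations["posts"].first.equal?(copy)
```

The flattened entries contain only arrays, hashes, strings, and integers, which are types MessagePack can encode natively.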
00:11:44.520 After encoding, we noted significant reductions in size; what once took 1600 bytes for a post alone shrank to approximately 300 bytes.
00:11:54.480 Hence, we witnessed approximately a 13x improvement over Marshal's inherent memory footprint.
00:12:01.800 Watching the results appear in production was particularly gratifying; our Rails cache's fill metrics substantially dropped post-implementation.
00:12:08.520 As our conversions scaled closer to 100%, our statistics dashboard highlighted the progress that had been made with MessagePack.
00:12:15.840 However, as we moved to exclude Marshaling entirely, we realized our initial focus on hard-coded class names constituted a risk.
00:12:22.740 This realization determined our direction forward to provide all users with ample space to define the methods that would allow their custom types to be serializable within the framework.
00:12:30.720 Taking this precaution allowed us to evade pitfalls, especially during deployments.
00:12:35.460 In summary, we developed various extension types to adequately address the diverse range of cases encountered in our codebase.
00:12:42.480 As a result of this meticulous planning and execution, we enabled our application to function entirely without Marshal.
00:12:49.440 We have moved over to an entirely MessagePack-centric caching framework, significantly enhancing capacity for developers to manage refactoring without concerns.
00:12:57.900 Most of the topics I’ve discussed have been distilled into a gem known as Paquito, which you can access as part of the Rails ecosystem.
00:13:05.400 I extend my gratitude to Jean Boussier, who played a significant role in this project, driving the extraction process and contributing vital features.
00:13:14.520 Thank you, everyone!