Micro Talk: High Performance Caching with Rails
Summarized using AI


by Matt Duncan

In the talk "Micro Talk: High Performance Caching with Rails," Matt Duncan explores the intricacies of high-performance caching in Rails applications. He highlights two interpretations of 'high performance': quick execution and rapid development. With his experience from the Rails team at Yammer, Duncan emphasizes the necessity of efficient caching methods for moderate to large-scale web applications.

Key points discussed include:

  • Types of Caching: Duncan outlines various caching strategies, starting with page caching, which is often infeasible for sites with private data. Action caching is slightly better but struggles with perspective data, i.e., data that different users see differently.
  • Fragment Caching Challenges: While fragment caching addresses some of action caching's limitations, it complicates page setup and maintenance.
  • Introduction to Record Caching: Duncan proposes record caching as a superior solution, which caches records at the lowest level in Rails, just before queries reach the database. This yields high cache hit rates with minimal data redundancy.
  • Example of Record Caching Implementation: He walks through a user model, explaining how caching by ID and by email keeps cached data small while supporting multiple lookup paths. In particular, he describes how cache keys are structured to support efficient retrieval while accommodating user-specific contexts.
  • Drawbacks of Record Caching: There are limitations — views must still be rendered after records are retrieved from the cache, and raw SQL operations complicate cache invalidation. Migrations are also tricky, since they behave like SQL updates.
  • Cache Invalidation Process: The talk covers how to manage cache invalidation effectively, which is simplified by automatic management for most record changes, with manual handling needed for specific operations.
  • Performance Metrics and Results: Duncan concludes by sharing that record caching at Yammer achieves a remarkable 98% cache hit rate, indicating high efficiency in query retrieval and significant performance benefits.
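The two storage strategies summarized above can be sketched in plain Ruby. A Hash stands in for Memcache, and the key layout is illustrative only, not the library's exact format:

```ruby
# What the cache holds under each strategy (Hash standing in for Memcache).
cache = {}
user  = { id: 1, email: "ada@example.com", name: "Ada" }

# Caching by ID: the full serialized record lives under the ID key.
cache["user:#{user[:id]}:version:2"] = Marshal.dump(user)

# Caching by email: only the small ID is stored under the email key, so
# a lookup by email costs two cache reads (email -> id, then id -> record).
cache["user:email:#{user[:email]}:version:2"] = user[:id]

id     = cache["user:email:ada@example.com:version:2"]
record = Marshal.load(cache["user:#{id}:version:2"])
record[:name]  # => "Ada"
```

The secondary index stays tiny because it never duplicates the record itself, only the key needed to reach it.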

Overall, the talk stresses the importance of efficient caching strategies in Rails applications, showcasing how adopting record caching can drastically improve performance.

The main takeaways are that careful implementation of caching can significantly enhance application speed and user experience while maintaining manageable data and operational complexity.

00:00:17.800 Hello everybody! Today, I'm going to talk about high-performance caching with Rails. In this talk, 'high performance' actually has two meanings. One refers to the speed of execution; caching is critical to making applications run fast. The other meaning pertains to the speed of development; I assume all of you use Ruby, which helps us build things quickly. So, real quick, I am Matt Duncan, and that is a giant picture of me. I work on the Rails team at Yammer. Now, let's talk about caching because that's way more fun!
00:00:40.920 Any decent-sized Rails application will need caching, and any decent-sized web application will likely need it as well. We have a lot of options to choose from, so let's dig in. We'll start with page caching. However, page caching isn't a very good option for most sites because they often contain private data, making authentication challenging. Therefore, we'll ignore page caching for now.
00:01:10.119 Action caching is slightly better since it allows us to handle authentication to some extent. However, it has its own problems. One major issue is that it doesn't adequately handle perspective data, which is data that different users can see differently. For example, an administrator might see different data compared to a regular user. Although action caching can accommodate this, it can lead to caching a separate page for each user, which is impractical due to the amount of space it occupies.
00:01:30.799 The standard solution is fragment caching, which resolves many of the perspective data issues, but still has some challenges. You can scope things to smaller sections of your pages, but it introduces complexity and can be quite cumbersome to set up. You have to place caching blocks throughout your pages, and when changes are necessary, it can become a messy ordeal.
00:02:00.639 Instead of dealing with the complications of fragment caching, today, we are going to focus on record caching. Record caching performs database query caching. There are similar libraries such as Cache Money, but the main goal is to cache at the lowest level possible inside Rails, essentially right before we execute queries to the database.
00:02:10.560 Let’s walk through this with a simple user model as an example. The first line showcases record cache by ID, which does exactly what you might expect — it caches the model by its ID, serializing it into Memcache. The cache keys generated for each user are part of this process where, for instance, a cache key for user ID one would look like 'user:1:version:2:cache:3'.
00:02:29.160 In this structure, 'two' represents the version of the model, allowing for the invalidation of the entire model when needed. If any breaking changes occur in record caching, we can invalidate all our records. While not ideal, this capability is essential. The functionality continues; once you call 'find by ID' with an ID, it will try to fetch the cached record from Memcache. If it's not available, it retrieves it from the database and stores that record back in Memcache for faster access next time.
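The find-by-ID flow just described is a read-through cache, which can be sketched in plain Ruby. `CACHE`, `DB`, and the key format here are stand-ins for illustration, not the library's actual internals:

```ruby
# Read-through record cache sketch: try Memcache first, fall back to the
# database on a miss, and write the record back for next time.
CACHE    = {}
DB       = { 1 => { id: 1, name: "Ada" } }
DB_READS = [0]   # counts how often we fall through to the "database"
MODEL_VERSION = 2  # bumping this invalidates every cached user at once

def user_key(id)
  "user:#{id}:version:#{MODEL_VERSION}"
end

def find_by_id(id)
  CACHE.fetch(user_key(id)) do   # cache hit returns immediately
    DB_READS[0] += 1
    CACHE[user_key(id)] = DB[id] # miss: query DB, store back in the cache
  end
end

find_by_id(1)  # first call misses the cache and reads the database
find_by_id(1)  # second call is served entirely from the cache
```

After both calls, `DB_READS` is still 1: only the first lookup touched the database.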
00:02:53.920 The next line demonstrates record caching by email instead. Here, we are only caching the ID, which allows for greater efficiency since we're not storing a full record in Memcache for these additional indices. When we call 'find by email,' it first retrieves the cached ID and then fetches the actual record using that ID. Thus, we manage to achieve two Memcache hits while keeping the stored data small, focusing only on what uniquely identifies each record.
00:03:40.560 In this example specific to Yammer, users belong to networks, and it’s useful to scope by active users, which we can accomplish by calling 'active users by network ID.' Importantly, we must specify names when scoping since record caching can’t automatically infer them. Now that we've discussed how this works, we must consider potential drawbacks.
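A named scope like "active users by network ID" can be sketched similarly. Since the library cannot infer a name for an arbitrary scope, the `active_users` label in the key below is an explicit, illustrative choice:

```ruby
# Scoped cache sketch: the explicit scope name becomes part of the key.
CACHE = {}
USERS = [
  { id: 1, network_id: 7, active: true  },
  { id: 2, network_id: 7, active: false },
  { id: 3, network_id: 9, active: true  },
]

def active_user_ids_by_network_id(network_id)
  key = "user:active_users:network:#{network_id}:version:2"
  CACHE[key] ||= USERS.select { |u| u[:network_id] == network_id && u[:active] }
                      .map { |u| u[:id] }
end

active_user_ids_by_network_id(7)  # => [1]
```

As with the email index, only IDs are cached for the scope; the full records come from the per-ID cache.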
00:04:13.920 For one, rendering remains a necessary step after retrieving objects from the cache, whereas fragment caching avoids that rendering overhead. This isn't a huge issue, because record caching still saves significant time. However, when raw SQL operations manually update or delete records, you'll need to handle cache invalidation on your own, unless you use 'update all' or 'delete all', which record caching already manages automatically.
00:04:45.199 Migrations can be tricky since they essentially work like SQL updates, meaning the same invalidation rules apply. Furthermore, active record chaining is not fully supported. You can have one scope, but beyond that, things get complicated. If anyone is interested in diving deeper into these issues, I would be happy to assist. Nevertheless, it turns out that these limitations are not significant obstacles since you're usually caching items that don’t require extensive chaining.
00:05:17.919 Returning to how invalidation works, it’s quite simple: you just call 'invalidate_record_cache' on an individual record. This process is handled automatically when you save records, so you only need to consider it when manually updating records. You can also invalidate an entire model in two ways: you can increment the cache version or utilize an alias for 'invalidate_record_cache' on the class itself, effectively increasing the cache version.
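Both levels of invalidation can be sketched in a few lines. The method names mirror the ones mentioned in the talk, but the bodies here are illustrative stand-ins:

```ruby
# Invalidation sketch: delete one record's key, or bump the model version
# so every previously written key is simply never read again.
MODEL_VERSION = [2]
CACHE = { "user:1:version:2" => { id: 1, name: "Ada" } }

# Record-level: drop just this record's cache entry (normally done
# automatically on save; needed manually after raw SQL changes).
def invalidate_record_cache(id)
  CACHE.delete("user:#{id}:version:#{MODEL_VERSION[0]}")
end

# Model-level: incrementing the version orphans all existing keys at once.
def invalidate_model_cache
  MODEL_VERSION[0] += 1
end

invalidate_record_cache(1)
CACHE.empty?  # => true
```

Version-bump invalidation never touches the stale entries; they just stop matching any key and age out of Memcache on their own.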
00:06:09.740 So how well does it work? This is arguably the most crucial question. The non-benchmark answer would be that it works exceptionally well; we hardly think about it, as it functions seamlessly in the background. We mainly deal with adding new records and occasionally invalidating them during migrations. From a measurable perspective, we maintain an impressive 98% cache hit rate at Yammer, meaning only about 2% of queries miss the cache, which is fantastic. Overall, the performance benefits from this approach are significant.
00:06:45.679 Thank you for your attention!
Explore all talks recorded at GoRuCo 2012