RailsConf 2021

The Trail to Scale Without Fail: Rails?

by Ariel Caplan

In the presentation titled "The Trail to Scale Without Fail: Rails?" at RailsConf 2021, Ariel Caplan discusses the scalability of web applications using Ruby on Rails, specifically focusing on Cloudinary, which handles a staggering 15 billion media requests daily. The main theme revolves around whether Rails can effectively manage scaling concerns for highly demanding applications, especially when the majority of web apps do not experience serious scaling issues.

Key points discussed include:
- User Experience Focus: The importance of not getting lost in technical details but rather maintaining a focus on delivering a superb individual user experience, as scaling is fundamentally about reaching more users effectively.
- Impressive Scale Examples: Cloudinary's traffic compared with that of other major Rails apps like Shopify and GitHub, illustrating the volume of requests these applications manage.
- Architecture Layers: Caplan describes how Cloudinary divides its services into various layers—CDN, IO Layer, and CPU Layer—to manage requests efficiently, with the CDN handling the majority of traffic.
- Sharding Approach: The talk elaborates on how sharding databases distributes read and write workloads across machines, thereby enhancing performance.
- Geographical Location: The presentation highlights the benefits of having data centers located closer to users in different regions, optimizing response times.
- Deduplication of Work: Caplan explains the significance of not repeating transformative processes on the same assets, which is managed through a unique locking system called Lobster.
- Managing Spikes in Traffic: Strategies such as rate limits and background processing handle sudden surges effectively, ensuring no single customer can overwhelm the system.
- Developer Experience: Caplan discusses the challenges of recruiting Ruby developers in Israel and hints at Cloudinary's plan to gradually migrate certain components to polyglot microservices to attract more talent.

In conclusion, while the challenges of scaling with Rails are acknowledged, the overall message is optimistic regarding its capabilities. The key takeaway is that Rails can be a robust foundation for building scalable applications if developers strategically implement readable and maintainable architectures that prioritize user experience alongside technical performance.

00:00:05.600 Hey everybody, my name is Ariel Caplan, and I'd like to welcome you to my talk, "The Trail to Scale Without Fail: Rails?"
00:00:14.580 Before we hop into the content of the talk, I want to take a moment to step back.
00:00:20.039 Please have a look at the image on your screen right now and think about why I chose this image, which doesn't seem to be related to scaling at all.
00:00:33.899 To me, what I see here is one user, one device, having one experience. When we talk about scale, it's easy to get lost in big numbers and fun technical details.
00:00:46.379 Ultimately, what we're really trying to do is create this kind of individual experience many, many times over. When we think about things this way, and ensure we are not just focusing on average response times, but instead really considering the ultimate user experience, we achieve much better results.
00:01:04.199 Now, to jump to the other side of the spectrum and dive into technical details, what do you think is the biggest Rails application in the world?
00:01:17.520 I began thinking about this a few years back when I attended a conference talk called "The Recipe for the World's Largest Rails Monolith" by Akira Matsuda.
00:01:29.580 During that talk, he described Cookpad and the immense traffic they handle. He talked about various metrics that showcase the gargantuan scale of their Rails monolith.
00:01:41.700 The one I want to highlight is requests per second; he mentioned that they handle about 15,000 requests per second, which I found pretty impressive at the time, especially considering that at my company, the app had maybe a few hundred requests per minute.
00:02:02.280 The following year, at RailsConf, Simon Eskilden gave a talk about Shopify where he described that they handle between 20,000 to 40,000 requests per second, peaking at 80,000. I'm going to focus on that 20,000 to 40,000 range.
00:02:30.959 Just a few months ago, Shama, who is a VP of Engineering at GitHub, mentioned that the GitHub API deals with over a billion API calls daily.
00:02:43.019 I don't know if this includes git pulls and pushes or if it is really representative of their overall traffic, but I’m assuming it’s accurate within the general order of magnitude.
00:03:00.180 So we have several different Rails applications that showcase how to work at scale. If we want to compare them, we normalize them to requests per day.
00:03:17.640 Cookpad is about 1.3 billion requests per day, while Shopify is at 2.6 billion. That was in 2016, and I don't know how Cookpad has grown since then.
00:03:25.379 I know they have expanded into a few more countries. Shopify has nearly doubled the number of stores they host, so they are probably around 5 billion by now.
00:03:31.500 GitHub, just a few months ago, was at a billion requests per day.
00:03:39.180 Cloudinary, where I work, handles about 15 billion requests per day. These are not just regular requests associated with a Rails app; they are primarily media requests.
00:03:51.120 To give you a sense of our scale, it's like taking the other three examples and doubling them.
00:04:03.480 Now, consider a typical Rails app. A client makes a request to a server, which talks to a database, retrieves some information, and sends a response back to the client, either in HTML or JSON.
00:04:10.380 When we're dealing with Cloudinary, we operate a bit differently. I'm going to keep this view simplified, even though both your apps and our app are much more complicated architecturally.
00:04:31.320 Cloudinary serves images and videos. Often, we will not simply serve the images and videos we have stored; we also make various transformations.
00:04:45.000 If you've ever done image processing, you know it’s a computationally intensive process, and that’s what we handle for each of these requests.
00:05:10.380 Let’s dive deeper into what Cloudinary does, particularly if you're not familiar with our service, to understand the nature of our services, our use cases, and the resulting traffic.
00:05:28.680 We will focus on one image and how it journeys through our process. The first thing is that you can upload an image to Cloudinary.
00:05:37.920 We store it in one of our cloud storage solutions, typically Amazon S3 but sometimes Google Cloud Storage.
00:05:46.320 Next, we apply transformations and deliver the media via a CDN. Most of you can probably imagine what uploading and storing are, but the latter two items on this list might be more confusing.
00:06:02.160 Let’s clarify by going through them. Imagine you have an image uploaded by a user. Perhaps it’s not perfectly cropped and has a lot of open space.
00:06:24.960 You may want to fit it into different layouts on your site, which may require multiple crops, such as a portrait and a landscape crop. You also want to crop intelligently without simply taking a random slice.
00:06:47.760 If the image is to be used as an avatar, you would focus on the face. Additionally, you might want different sizes and shapes of the same image, such as a larger version for the author’s profile and a smaller one for comments.
00:07:02.640 And you might not want to stop there: you can also enhance colors, or pixelate faces to protect users' privacy.
00:07:18.000 You can also add a company watermark at 50% opacity. All of these transformations can be achieved in Cloudinary by appending little bits to the request URL.
00:07:36.420 For example, parameters specify enhancements, the cropping method, gravity to focus on the face, and the preferred compression. The transformations are all specified in the URL, and when parameters change, a different image is generated.
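The URL-driven transformation scheme described above can be sketched in Ruby. This is an illustrative sketch, not Cloudinary's actual implementation: the `res.example.com` host, the cloud name, and the parameter keys are hypothetical stand-ins for the comma-separated parameter segments Cloudinary URLs use.

```ruby
# Illustrative sketch of a URL whose path encodes the requested
# transformations. Host, cloud name, and parameter keys are hypothetical.
def transformation_url(cloud, public_id, **params)
  # Sort keys for a stable URL: the same params always map to the same
  # cached derived asset; different params produce a different image.
  segment = params.sort.map { |k, v| "#{k}_#{v}" }.join(",")
  "https://res.example.com/#{cloud}/image/upload/#{segment}/#{public_id}"
end

url = transformation_url("demo", "avatar.jpg", w: 150, h: 150, c: "thumb", g: "face")
# => "https://res.example.com/demo/image/upload/c_thumb,g_face,h_150,w_150/avatar.jpg"
```

Because the parameters live in the URL, every distinct transformation is a distinct cacheable resource, which is what lets the CDN layer described later absorb so much of the traffic.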
00:08:02.760 All requests in Cloudinary entail significant computational work. Your users will likely access your site from different devices and browsers, each supporting different image formats.
00:08:34.560 For example, WebP is supported in Chrome, but Firefox only gained support in late 2019. Different image formats must be generated and served appropriately.
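Serving the right format per browser boils down to content negotiation on the request's Accept header. A minimal sketch, assuming a simple preference list (the list and helper name are illustrative, not Cloudinary's actual selection logic):

```ruby
# Hypothetical format preference, best (smallest) first.
FORMAT_PREFERENCE = ["image/avif", "image/webp", "image/jpeg"].freeze

# Pick the best format the browser advertises in its Accept header,
# falling back to JPEG, which everything supports.
def best_format(accept_header)
  accepted = accept_header.split(",").map { |t| t.split(";").first.strip }
  FORMAT_PREFERENCE.find { |f| accepted.include?(f) } || "image/jpeg"
end

best_format("image/webp,image/apng,*/*;q=0.8")  # => "image/webp"
best_format("text/html,*/*;q=0.8")              # => "image/jpeg"
```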
00:08:57.000 Let’s discuss CDNs. The idea here is that you have your servers, maybe in Northern Virginia.
00:09:04.800 Your users might be anywhere around the world, and you want to serve media from a server that is closer to them. However, if your app servers are in one place, the CDN provides a network of computers around the world.
00:09:21.780 By caching media close to the user, the CDN can serve requests efficiently, saving time in loading content.
00:09:33.540 Now that you are a bit more familiar with how Cloudinary operates, let’s look at how we serve the 99.7% of delivery requests that involve transforming and serving media.
00:09:53.700 Requests also come from other sources: a media library GUI for cataloging and editing, various developer APIs, and bulk operations.
00:10:06.960 While I won’t focus much on them, bulk operations can be very heavy—like deleting 40,000 assets. Nevertheless, most requests center on delivering media.
00:10:24.480 Before diving into how Cloudinary solves problems at scale, let me share a bit about myself. I've been working in Rails since graduating from Flatiron School in 2015 and at Cloudinary since 2018.
00:10:38.660 I am @amcaplan everywhere that matters, particularly on Twitter and GitHub. If you want to contact me later after this conference, feel free to reach out on Twitter — my DMs are open.
00:11:01.920 I love conferences and have attended many Rails and Ruby events, enjoying the opportunity to interact and hear different perspectives.
00:11:26.520 I didn’t personally build most of the tech I’m about to discuss but had the opportunity to speak with those who did. My main interest here is to share some inspiring stories of Rails scale with you all.
00:11:56.020 Now, let’s dive into how Cloudinary scales. I won’t discuss auto-scaling in detail, but all our servers are in the cloud, allowing us to scale various groups up and down.
00:12:06.740 The topics I’m going to cover include dividing our service into layers, sharding, geographical location considerations, deduplicating work, and the human factors involved in scaling.
00:12:26.640 Starting with layers, I’m reminded of the quote from Shrek about layers. Like onions and ogres, Cloudinary has layers. Let’s go through the basics of an actual request lifecycle when hitting Cloudinary.
00:12:59.580 When a user makes a request, it first hits the CDN. If the asset is cached, it serves it back without talking to a Cloudinary server. If not, it proceeds to our servers, starting with a system called Segat, which shields our main app from excessive traffic.
00:13:34.680 The Segat service talks to the correct cloud storage bucket based on the request URL. If the asset isn't available, it goes deeper into the system.
00:14:02.480 Behind Segat sit our two main layers. The I/O layer is the main app that interfaces with the outside world, while the CPU layer handles the more computationally intensive tasks.
00:14:24.900 Once the CPU processes the image, it sends it back through the I/O layer to Segat and finally to the user's computer. Each layer helps us handle around 15 billion requests, as the CDN caches assets, only sending a fraction through the Cloudinary servers.
00:14:53.760 In 14 out of 15 cases, the CDN has the asset already cached, meaning most requests never touch a Cloudinary server. Of the remaining requests, around 150 million will go to the I/O layer and about 125 million will reach the CPU layer.
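The funnel the talk describes can be checked with back-of-the-envelope arithmetic using the numbers given: of roughly 15 billion daily requests, about 14 in 15 are served straight from the CDN cache, and only a sliver reaches the Rails layers.

```ruby
# Daily request funnel, from the figures quoted in the talk.
total     = 15_000_000_000
cdn_hits  = total * 14 / 15     # served directly from CDN cache
past_cdn  = total - cdn_hits    # ~1 billion reach Segat / cloud storage
io_layer  = 150_000_000         # reach the Rails I/O layer
cpu_layer = 125_000_000         # reach the CPU (transformation) layer

cdn_hit_rate   = cdn_hits.fdiv(total)   # ~0.933
rails_fraction = io_layer.fdiv(total)   # ~1% of all traffic touches Rails
```

So each layer peels away an order of magnitude or more of traffic before the next one has to do any work.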
00:15:21.020 In terms of technologies, the I/O and CPU layers are both Rails applications running on different servers for optimization. The Segat service is built in Go, providing efficient handling of most of the traffic.
00:15:40.740 The CDN is essential for us, especially when serving media, and we collaborate with CDN providers like Akamai and Fastly. By using their services, we can leverage best-in-class technology for our content delivery.
00:16:01.680 Our strategy reduces the workload on our systems, enhancing both reliability and performance when dealing with high volumes of requests.
00:16:24.180 We also maintain multi-CDN support, which allows us to switch between CDN partners seamlessly in the event of downtime on one end, ensuring reliability.
00:16:43.320 However, using multiple CDN providers comes with its challenges. We need to comply with their individual requirements, such as formatting rules and cache invalidation processes.
00:17:05.160 We also need to manage billing by parsing log files to track which customer used particular resources—an aspect that requires continual attention.
00:17:38.940 Moving on, let’s consider another aspect of our architecture: the I/O layer and its relation to the CPU layer. The lightweight Go service manages a vast number of requests efficiently, dealing with many connections.
00:17:58.560 Despite our efforts to optimize, balancing workloads can be challenging yet allows us to streamline different job requests effectively.
00:18:24.840 When outlining our operations for scaling, we also aim to look at how layers within our architecture can function independently. Each layer’s capacity can adjust as needs arise.
00:18:40.680 For example, if the CDN handles an increase in requests, Cloudinary doesn’t need to scale simultaneously. Each facet of our architecture can be optimized separately, ensuring that our resource distribution is efficient.
00:19:03.140 Security is also a priority; our CPU layer is safeguarded to mitigate potential vulnerabilities, ensuring that it operates effectively without direct internet access.
00:19:24.600 Next, let's dive into sharding, an essential topic for many Rails developers. This concept is crucial for databases as it enables the splitting of data across multiple databases; however, this comes with its own set of challenges.
00:19:39.600 Sharding can complicate operations, especially when designing how to partition the data and how to work with joins across multiple databases.
00:20:10.320 Our strategy at Cloudinary involves leveraging a central shard for critical active data while distributing workloads by assigning clouds to distinct shards.
00:20:41.560 When you sign up for Cloudinary, you might get a dedicated cloud, representing its own independent workspace with dedicated resources.
00:21:00.840 Requests coming in are routed to the right shard, allowing for efficient handling and a simpler backend architecture.
00:21:32.120 For our application, we employ specific helpers to identify the appropriate shard during requests. This simplifies scaling while managing the complexities of data distribution.
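The shard-helper idea can be sketched as follows, under the assumption the talk describes: each customer "cloud" is assigned to a shard, and requests are routed accordingly. The hashing default, constants, and helper names here are hypothetical, not Cloudinary's code; a real system would also persist explicit cloud-to-shard assignments so shards can be rebalanced.

```ruby
require "digest"

SHARD_COUNT = 8  # illustrative number of database shards

# Deterministically map a customer cloud to a shard.
def shard_for(cloud_name)
  Digest::MD5.hexdigest(cloud_name).to_i(16) % SHARD_COUNT
end

# Run a block in the context of the cloud's shard. In a Rails app this
# would wrap something like ActiveRecord's connected_to(shard: ...).
def with_shard(cloud_name)
  yield shard_for(cloud_name)
end

with_shard("demo") { |shard| shard }  # "demo" always routes to the same shard
```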
00:21:53.920 Despite this streamlined approach, working with shards comes with complexities. Code bases can become cluttered with shard references, complicating maintenance.
00:22:18.120 Many Rails developers are aware of the kinds of challenges that sharding can introduce—it's essential to be conscious of how this affects data evaluations and ensure queries operate within the correct shard context.
00:22:41.520 Sharding provides flexibility, allowing us to cater to individual customer needs by placing clients with large traffic spikes on their own shards or grouping them together.
00:23:01.500 The next factor in our architecture is location: the geographical distribution of our server network. This strategy ensures customers get the best performance possible.
00:23:26.660 Cloudinary maintains dedicated servers for different regions, allowing us to efficiently serve media closer to where the majority of users are located.
00:23:48.060 While we provide premium-level service to customers in the US, we also cater to clients across Europe and the Asia-Pacific.
00:24:05.880 We utilize dedicated databases across regions, which helps with performance issues. However, that introduces challenges regarding the primary database access for multi-region configurations.
00:24:30.960 In terms of updates, we implemented a near-real-time synchronization system—called NRT cache—where information is cached until updates are pulled from the primary database.
00:25:01.500 This setup leads to potential latency, but for most cases, updates synchronize quickly enough.
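A minimal sketch of the NRT-cache idea, assuming the behavior described above: regional reads are served from a local cache and refreshed from the primary only after a short staleness window. The class and method names are illustrative, not Cloudinary's.

```ruby
# Near-real-time read cache: serve local data, refresh from the primary
# once an entry is older than the TTL.
class NrtCache
  Entry = Struct.new(:value, :fetched_at)

  def initialize(ttl_seconds:, &fetch_from_primary)
    @ttl = ttl_seconds
    @fetch = fetch_from_primary
    @entries = {}
  end

  def read(key, now: Time.now)
    entry = @entries[key]
    if entry.nil? || now - entry.fetched_at > @ttl
      @entries[key] = Entry.new(@fetch.call(key), now)  # refresh from primary
    end
    @entries[key].value
  end
end

primary = { "asset:1" => "v1" }
cache = NrtCache.new(ttl_seconds: 5) { |k| primary[k] }
cache.read("asset:1")          # => "v1"
primary["asset:1"] = "v2"
cache.read("asset:1")          # still "v1" until the TTL expires
```

The trade-off is exactly the latency mentioned above: regional reads may be briefly stale, but they never wait on a cross-region round trip.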
00:25:30.720 Despite these setups, challenges can arise. For instance, we once saw a spike in errors while migrating data across clouds, because the simultaneous updates significantly increased traffic during the migration.
00:26:02.520 To manage these, we ensure thorough checks during code changes that may impact larger updates, aiming to find alternative approaches and minimize disruptions.
00:26:25.620 Ultimately, effective scaling hinges on managing individual customers' traffic without impacting service across the board. Rate limiting and throttling are key methods we employ.
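The talk names rate limiting and throttling as the key methods; one common way to implement a per-customer rate limit is a token bucket. This particular implementation is an illustrative assumption, not Cloudinary's code.

```ruby
# Per-customer token bucket: requests spend tokens, tokens refill over time,
# and a customer who bursts past their budget gets throttled.
class TokenBucket
  def initialize(capacity:, refill_per_second:)
    @capacity = capacity
    @refill = refill_per_second
    @tokens = capacity
    @last = 0.0
  end

  # Returns true if the request is allowed, false if it should be throttled.
  def allow?(now:)
    @tokens = [@capacity, @tokens + (now - @last) * @refill].min
    @last = now
    return false if @tokens < 1
    @tokens -= 1
    true
  end
end

bucket = TokenBucket.new(capacity: 2, refill_per_second: 1)
bucket.allow?(now: 0.0)  # => true
bucket.allow?(now: 0.0)  # => true
bucket.allow?(now: 0.0)  # => false (bucket empty)
bucket.allow?(now: 1.0)  # => true  (one token refilled)
```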
00:26:56.880 Another factor for handling traffic spikes involves the fair queuing system: each job is assigned a number of slots, enabling us to fairly allocate processing power.
00:27:23.560 The system’s design helps streamline traffic during surges and efficiently utilize our computing resources, particularly for larger jobs.
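The slot-based fair queuing just described can be sketched like this: each customer gets a fixed number of processing slots per round, so a burst from one customer cannot starve everyone else. The round-robin mechanics and class names here are an illustrative assumption.

```ruby
# Fair scheduling: take at most `slots_per_customer` jobs from each
# customer's queue per round, regardless of how deep any one queue is.
class FairQueue
  def initialize(slots_per_customer:)
    @slots = slots_per_customer
    @queues = Hash.new { |h, k| h[k] = [] }
  end

  def push(customer, job)
    @queues[customer] << job
  end

  # One scheduling round across all customers.
  def drain_round
    @queues.flat_map { |_customer, jobs| jobs.shift(@slots) }
  end
end

q = FairQueue.new(slots_per_customer: 2)
5.times { |i| q.push(:big_customer, "big-#{i}") }
q.push(:small_customer, "small-0")
q.drain_round  # => ["big-0", "big-1", "small-0"]
```

Note how the small customer's single job is processed in the first round even though the big customer had five jobs queued ahead of it.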
00:27:44.960 In addition, implementing processes to mitigate the burden during spikes, including database access queuing and maintaining overall resource availability, enhances system resilience.
00:28:11.280 As with many systems, there are challenges regarding reliability. Our locking system, known as Lobster, prevents conflicts over transformations by managing read and write locks thoughtfully.
00:28:33.800 This lock is essential during processing to avoid any resource competition from multiple requests associated with a single asset.
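The core deduplication idea behind such a lock can be sketched in-process. This is only a sketch: Lobster's actual read/write-lock semantics are more involved, and a real deployment would need a distributed lock, since requests for the same asset can land on different servers.

```ruby
# Run the expensive transformation for a given key at most once;
# later callers for the same key get the stored result instead of
# redoing the work.
class WorkDeduplicator
  def initialize
    @lock = Mutex.new
    @results = {}
  end

  def fetch(key)
    @lock.synchronize do
      @results[key] = yield unless @results.key?(key)
      @results[key]
    end
  end
end

dedup = WorkDeduplicator.new
runs = 0
3.times { dedup.fetch("video.mp4/w_300") { runs += 1; "derived-asset" } }
runs  # => 1: the transformation ran only once for three requests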
00:29:00.760 Despite initiatives to ensure high performance through meticulous planning and failure protocol systems, managing failures remains a concern.
00:29:21.660 Finally, let’s touch upon human factors. By fostering an informed partnership with clients, we not only help them utilize our platform more effectively, but we can also encourage them to let us know about anticipated spikes.
00:29:49.020 Encouraging proactive communication about anticipated traffic increases permits us to scale our systems effectively in preparation for their needs.
00:30:13.560 Now, transitioning to the relevance of Rails—we acknowledge that the bulk of our requests don’t necessarily touch Rails directly. The performance-oriented tasks often reside outside Rails.
00:30:44.820 However, despite any language-based shifts in performance-sensitive areas, we still find Rails to be a highly productive environment for our development needs.
00:31:14.700 Our company initially succeeded using Ruby on Rails as the core technology for the first years, showcasing how effective it can be for startups.
00:31:47.760 At the same time, we continue to explore polyglot microservices, aiming to enhance our tech stack by integrating pieces conducive to our team's and project’s growth.
00:32:09.480 The aim is to ensure a collaborative environment where our engineers can work in languages they feel most comfortable with and find efficiency.
00:32:27.240 In summary, while adapting to new technologies presents challenges, embracing both legacy and modern tools will help us address future scalability demands.
00:32:51.480 I want to thank everyone for being part of this session, for listening, and for engaging fully. Your contributions are appreciated.
00:33:00.420 Feel free to reach out if you have any questions, and enjoy the rest of the conference!