Determining Ruby Process Counts: Theory and Practice

00:00:00.709 Okay, it's 2:20, and our talk slots for this conference are very short, so I'm going to get started.

00:00:06.810 This talk is titled 'Determining Ruby Process Counts.' This talk is really geared at anyone who has control over their dyno slider at Heroku.

00:00:13.650 Have you ever wondered what to set that dyno slider to? Or maybe if you use AWS, you want to know how many EC2 instances you should be using. Today, we're going to talk about horizontal scaling and provisioning of Ruby applications.

00:00:30.570 I'm going to use a web application as an example, but this model is very generalizable to many different types of applications.

00:00:37.140 At the end, I'll discuss how to apply this method to other application types. Basically, any Ruby or other language application that takes requests and has to serve responses can use the process that I'm about to describe.

00:00:49.289 My name is Nate Berkopec. I am a consultant specializing in Rails and Ruby performance. I have a book called 'The Complete Guide to Rails Performance.' You might have seen my blog at speedshop.co, and I also teach workshops focused on Rails performance.

00:01:07.350 I live in New Mexico in the United States. I'm here for search and rescue there, and I spend a lot of time outdoors. I spent the whole last winter skiing.

00:01:20.250 In the future, I'm going to be living outside of the United States pretty soon, moving out in actually a couple of months. If you live outside the U.S. and you think I should live in your country for a while, I'd really appreciate talking to you for recommendations. England sounds great!

00:01:50.909 This talk fundamentally answers the question of how many servers you need. In my experience as a Rails consultant, I find that most people sort of guess at this number.

00:02:03.450 At some point in the past, someone just decided to increase the number of dynos or EC2 instances until the problem seemed to go away. Oftentimes, I come into shops that are heavily over-scaled, using far too many dynos or servers for their actual load, which costs them a lot of money.

00:02:28.870 So this talk is really about how we can help you spend less money or demonstrate where you might need to spend more. When I talk about Ruby performance or performance in general, I say there are two things we can do: we can improve the experience for the customer or reduce the cost to deploy the application.

00:02:53.260 Today, this talk focuses a little more on cost control. We will mostly talk about how to use less hardware to deploy your application. This question can result in significant savings—thousands of dollars per month—regardless of application size.

00:03:13.690 One benchmark I'd like to highlight, especially on Heroku, is that if you're spending more per month to host a web application with Ruby than you have requests per minute, then you're probably overpaying and using too much hardware.

00:03:44.739 For instance, if you have an application that processes 3,000 requests per minute and you’re spending $6,000 a month, you could likely reduce your costs to around $3,000 monthly. That's a rule of thumb to gauge whether you're over-scaled.

00:04:08.139 Now, the outline of this talk: First, we’ll discuss some math and queueing theory. It's very simple, not intimidating. Then, we’ll talk about what that math means for your Rails or Ruby application.

00:04:26.800 After that, we'll apply the math in the real world. I have some numbers pulled from previous talks by other engineers at big companies, and we’ll use the math I’m going to teach you to check the claims made in those talks.

00:04:44.169 Finally, I will discuss some practical knowledge about applying this math in the real world, including some factors that can prevent the math and theory from holding true.

00:05:03.520 Let's say we have a Ruby web application that takes 120 requests per minute, has a 250-millisecond response time, and a 95th percentile response time of one second. With just these three numbers, by the end of this talk, you'll be able to determine how many Ruby processes this application needs to adequately serve its load.

00:05:32.319 To clarify, you really only need two of these numbers to calculate, but we’ll talk more about that later. The key here is what I teach, which is called Little's Law, developed by John Little at MIT.

00:05:57.819 John Little discovered that in almost any system, the number of items in the system is equal to the arrival rate multiplied by the time each item spends in the system. This principle holds true for our web applications.

00:06:18.240 For example, think about a checkout counter at a supermarket. In many places, there is a single line but multiple checkout counters, where a staff member directs customers to an available counter.

00:06:43.759 Now, if we have ten customers entering the line every minute, that’s an arrival rate of ten customers per minute, and it takes one minute to check out each customer. If there are 20 checkout counters, Little's Law tells us that the average number of customers at checkout counters at any given time would be 10.

00:07:06.860 If the checkout process takes longer, for instance, three minutes, the average number of customers in the system would be 30, which exceeds the available checkout counters, causing an endlessly growing line.

00:07:41.546 Translating this to a web application, suppose we have 240 requests per minute and a 250-millisecond response time. We need to convert these figures into the same units to apply Little's Law effectively.

00:08:06.529 Now, if we convert requests per second, we find that 0.25 seconds average response time means that one request is being processed at any given time across the application.

00:08:29.750 Knowing how many requests are being processed allows us to estimate how many Ruby processes are needed to serve the load sufficiently. This isn't a fixed number; it's a starting point based on the average request processing information.

00:09:04.990 Little's Law gives us a practical foundation but it's important to understand that we may need to adjust our estimates for real-world applications. For example, response times can vary widely, and it's critical to build in some maneuvering room.

00:09:48.110 There are three main components for each request: the load balancer, the application server, and any background processing that's done in the server environment. This interaction can affect the way requests are queued.

00:10:16.450 For instance, if a load balancer routes requests to a server with no free workers, those requests have to wait in a queue until a new worker becomes available. Our goal is to minimize the time requests spend waiting, which happens when we scale up Ruby processes correctly.

00:10:39.000 This involves understanding that the time spent waiting for a free worker is what we can actively manage through scaling, rather than directly decreasing response times by simply adding more hardware.

00:10:59.170 Scaling should primarily focus on reducing that waiting time. Auto-scaling based on response times alone can create issues because traffic spikes in certain application areas can make response times appear slower, leading to unnecessary scaling.

00:11:31.469 In an ideal world, auto-scaling should be based on how long requests are stuck waiting for free worker processes. That means monitoring request queue time—this directly correlates to the efficiency of Ruby processes.

00:11:55.240 Real-world limitations arise when multiple services run on the same hardware or manage different types of workloads, like background jobs and web service requests. Keeping these workloads separate can prevent the issues associated with queueing.

00:12:15.720 To summarize, Little’s Law, which gives us the formula: work = arrival rate x latency, allows us to identify the amount of workload in progress in a Ruby application. High request queue times indicate that the system is under scaled.

00:12:54.740 Conversely, average response times can be unpredictable, necessitating extra headroom or a buffer for scaling applications effectively. One practical rule of thumb is to aim for an operating capacity of about 25% for good request response performance.

00:13:24.900 Factoring these adjustments allows smoother scaling transitions and prevents the uncertainty of requests piling up in a queue, helping to maintain the balance between under-scaled and over-scaled services.

00:14:04.500 To provide a real-world example from Twitter: back in 2008, they reported processing 600 requests per second with 180 application processes, indicating a 100% theoretical capacity.

00:14:38.910 Shopify, during a 2013 presentation, highlighted 833 requests per second with 72 milliseconds average response time across 53 servers running 1172 application processes, thus demonstrating how different infrastructures accommodate distinct load capacities.

00:15:02.310 Think about Envato's 2013 metrics where processing 115 requests per second with 147 milliseconds response time on 45 processes reflects manageable overhead resulting in a balanced operational state.

00:15:27.830 Overall, as a general guideline, maintaining about 25% of your theoretical process capacity tends to yield the best results, and determining exact needs through Little's Law calculations provides concrete metrics.

00:15:56.960 Incorporating factors like core availability can further optimize setups. For instance, ideally, we want two processes per available core on a server to better handle situations where requests come in faster than they can be processed.

00:16:39.800 This discussion leads us to thread-related considerations. Ruby's limitations with threading can create unpredictability in environments utilizing it. Threads in JRuby however, operate under true parallelism, increasing overall throughput.

00:17:03.550 As we look towards the future of Ruby and its evolutions, guilds and autoscaling initiatives may offer even more streamlined provisioning options without increasing complexity in management.

00:17:47.960 To wrap it up, Little's Law ties together arrival rates and response times to provide a basis for determining processing needs, allowing practical insights into reducing costs while maintaining efficient service across applications.

00:18:29.390 Thank you very much for attending this discussion. More information is available in my book, and I’d love to answer any questions you have in the last few minutes.

00:19:12.000 Audience member: How does Little's Law compare against other more telephony-oriented formulas such as Erlang?

00:19:56.200 Nate: I'm not familiar with those formulas, so I can't provide a direct comparison.

00:20:43.570 Audience member: How do you account for the availability trade-off when scaling?

00:21:36.170 Nate: The availability factor shows up in many real-world scenarios—often clients hesitate to scale down to one server due to fears of downtime. My experience shows that as long as the architecture is sound, these concerns can usually be alleviated through adjusting resources and measuring their impact.

00:22:39.920 Audience member: So would you suggest maintaining that 75% process availability?

00:23:16.400 Nate: Yes, that's a good approach but always adjust it based on your actual measurements in production.

00:23:52.780 Audience member: Could you elaborate on the statistical foundations behind your factors?

00:24:28.440 Nate: The rough factor of four is based on empirical results, where I've seen clients operating around that range without impacting response times in a negative way. It's always good practice to monitor and adjust accordingly.

00:25:12.350 Nate: Thank you everyone! Now we can conclude, and if you'd like, we can head into a break.