Site Availability is for Everybody

by Stella Cotton

The video titled 'Site Availability is for Everybody,' presented by Stella Cotton at RailsConf 2016, discusses the importance of being prepared for site outages, especially in critical moments such as high-traffic events or DDoS attacks. Through engaging storytelling and practical advice, Cotton emphasizes the necessity of load testing to enhance website resilience against unexpected traffic spikes. Key points include:

  • Awareness of Site Outages: Cotton outlines a scenario where a sudden site downtime occurs late at night, highlighting the need for everyone in the engineering team to be prepared for emergencies.
  • Importance of Load Testing: She advocates for load testing as a proactive measure that simulates traffic to identify and mitigate potential bottlenecks before they manifest in a crisis.
  • Technical Guidance on Load Testing: Cotton provides instructions on utilizing Apache Bench, a tool for load testing, explaining its setup, command usage, and how to analyze responses from tests effectively.
  • Common Pitfalls and Best Practices: She discusses common issues when conducting load tests, such as misconfigured tests, understanding server response codes, and observing latency results to ensure the test reflects real-world scenarios.
  • Impact of Queuing Theory: The presentation elaborates on Little's Law to explain how additional requests can increase response times if server resources aren't scalable. Cotton discusses the importance of resource allocation within the web application architecture.
  • Monitoring and Tools for Optimization: She suggests monitoring server performance under load and highlights the role of application performance monitoring (APM) tools to uncover hidden bottlenecks in the application stack.
  • Conclusion: In summary, Cotton emphasizes that preparation through load testing fosters an understanding of how applications behave under stress, ultimately leading to better decisions in crisis situations. She encourages developers to explore and learn about load testing and provide their teams with the knowledge required to handle site availability challenges effectively. Her final advice underscores the need for curiosity and continuous learning in web application scalability.

The presentation not only serves as a technical guide but also instills a mindset of readiness and resilience in developers facing site performance challenges.

00:00:09.740 Everybody, welcome to my keynote. My name is Stella Cotton. I may have destroyed my voice a little bit last night at Ruby karaoke, so I’ll probably be drinking more water than usual. I decided to completely rip off a gimmick that my friend Lily did in her talk on Wednesday. Basically, any time I take a sip of water, I would like for you to cheer and go crazy. So let's practice. Yes, you know what you're doing. Okay, cool.
00:00:34.410 We're going to get a little interactive, which is going to be funny because I won't be able to see your hands. Here's the scenario: the phone rings; it’s the middle of the night, and the site is down. Every single person on your engineering team is either out of cell range or they’re at RailsConf, and it’s just you. Raise your hand if you feel like you know exactly what to do or where to start. A couple of veterans in this audience. All right, now close your eyes, please. I can’t prove that you’re doing this, but nobody else can judge you. Try again: raise your hand if you feel like you know what to do and where to start. Everybody's very honest in this audience.
00:01:18.840 My hope is that by the end of this talk, those of you who raised your hands will have some ideas for how to share what you know with your team. For those of you who aren't as comfortable, I hope you'll find ways to understand site availability better and to get comfortable yourselves.
00:01:43.770 One of the big challenges with site availability is that it often catches us off guard. While we might practice refactoring and testing regularly, a site outage can happen for many random reasons, and this randomness is a little scary. I’m going to start off by sharing my scary story. It’s July 2015, and I’m working as a Ruby developer at a company called IndieGoGo, a crowdfunding website where people come to fund things that matter to them.
00:02:09.010 We had a lot of successful campaigns, including an Australian beekeeping campaign that raised around 12 million dollars and a campaign to fund the movie Super Troopers 2. In July 2015, news broke that Greece was the first developed country in the entire world to fail to make an IMF loan repayment. Through a strange set of events, this situation managed to take down our entire website. It was the middle of the night in California, and Europe was waking up to this news, and they also heard an incredible story about a British guy with a wild scheme to end the Greek financial crisis.
00:02:50.330 He launched a 1.6 billion euro campaign to bail out Greece, reasoning that if everybody in Europe threw in three euros, they would meet their goal and be able to bail out Greece. Traffic started building, and people began contributing small amounts of money at super high rates. Eventually, the IndieGoGo website went completely down, and it didn’t fully recover until we were able to put a static page up front to handle the load. For me, this was so unlike my day-to-day coding, deploying, or even triaging and investigating 500 errors.
00:03:26.720 Honestly, I felt unprepared and afraid. I wondered afterwards how I could have been more prepared. Load testing is a way you can programmatically simulate many users making simultaneous requests to your website. It acts as a sort of low-stress simulator for really high-stress situations. You can play around, build your confidence, and create your own site availability playbook before disasters occur. As an added benefit, you can also identify some bottlenecks in your application that could be dangerous in the future and measure any performance benefits of changes you make along the way.
00:03:55.500 When I started with load testing, the downside was that I don't come from a DevOps background; I'm just a regular Ruby developer. I found a lot of high-level instructions that gave commands for kicking off load tests, and a lot of deeply technical material about site performance, but not much that bridged the two. This led to a lot of trial and error and frustrated Googling.
00:04:07.620 In this talk, I want to discuss how to get started with load testing, how you can increase the volume of your load testing to really add stress to your site, and how to use a few tools to explore the results you obtain. So how do we get started? We’ll begin by preparing our load testing tool. We'll talk about Apache Bench because it’s pre-installed on many Linux boxes, and it's a really simple tool to get started with. This is the command that starts Apache Bench, and it contains everything you need to kick off your first load test.
00:04:54.690 To break it down a little further, you want to choose an endpoint to which you'll send the simulated traffic. A good idea for a starting point is actually a simple static page that doesn’t make any database calls; it gives you a baseline. Once you’re confident that your load testing tool is configured correctly, you want to start choosing pages that will experience the highest traffic. For IndieGoGo, those would be actual campaign pages. In your case, it might be your homepage or something else.
00:05:29.520 You can begin testing on localhost if you're just playing around, but the load test itself consumes resources on your computer. That takes resources away from your web server, which will significantly skew your results as the load increases. Conversely, running a load test against a production website can hurt the user experience or even bring your website down. It's best to direct your tests at a staging server, or at a production server that doesn't serve any external traffic, unless you're specifically stress testing your production system.
00:06:00.150 If you want to investigate without affecting your live site, do not point the test at your live website. At least one person, Lily, was thinking it. In the same Apache Bench command we saw earlier, you'll want to configure the traffic parameters for your tests. To complete the basic command, you need to specify two things: the number of requests you want to execute concurrently (the -c flag) and the total number of requests over the lifespan of the load test (the -n flag). For example, we can start with a concurrency of one and enough requests to allow the system to warm up, which is important.
00:06:49.860 This means you'll execute one request at a time, a thousand times. Just make sure the load test runs for a few minutes. To define our terms a little bit, when I talk about requests, I’m not referring to a single visitor to your web page. Depending on the number of assets your page loads or the asynchronous client requests your front-end application makes to your server, a single unique visitor can generate many requests during one visit. On the other hand, browser caching of assets means a return visitor might perform fewer requests than a new visitor.
00:07:26.370 Another key point to keep in mind is that Apache Bench and other server-side benchmarking tools won’t render HTML or execute your JavaScript, so the latency times you're observing are only part of your user experience; they're just the baseline. There will be additional delays for your users beyond what you're measuring. Let's look at an example of an Apache Bench output. Here's a snapshot of the full results; we’ll zoom in a little.
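The command itself is not reproduced in this transcript. As a minimal sketch, the basic invocation described above can be assembled like this, assuming Apache Bench is installed as `ab`; the staging URL and the helper name `build_ab_command` are placeholders for illustration, not something from the talk.

```python
import shlex


def build_ab_command(url, total_requests=1000, concurrency=1):
    """Return the argument list for a basic Apache Bench run.

    -n is the total number of requests, -c is how many run concurrently.
    """
    if concurrency > total_requests:
        raise ValueError("concurrency cannot exceed the total number of requests")
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]


# Hypothetical staging endpoint; ab expects a URL that includes a path.
command = build_ab_command("http://staging.example.com/")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the test; printing it first makes it easy to confirm the target is not a live site.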
00:07:57.500 We can see that Apache Bench will show us the percentage of requests served within a specific time frame. When analyzing the initial results, you want to validate that this latency aligns with the expected latency you would see from a request in real life. Load testing can resemble a black box; if you simply start plugging in random numbers without understanding your system, you might receive really impressive results, leading you to think, 'Yes, my site is amazing!' But those results might not reflect reality.
00:08:35.530 You want to ensure you have a hypothesis for how you expect the system to perform. If you have any data regarding how your production server performs, it will give you a ballpark for your expected request time. For example, if you look at the line stating the 99th percentile latency, it indicates that 99% of the one thousand requests we made were served in less than 693 milliseconds. If you have a graph from your production response times showing the same 99th percentile and it indicates, for example, 650 milliseconds, you're probably on the right track.
00:09:02.550 But if you're showing 100 milliseconds, you should look for an issue with your load-testing setup. A common cause of unusually good load-testing results that differ from production is testing an error page, especially when you use a staging server for your load testing. For instance, if the server uses basic authentication, you’ll need to supply the credentials in your Apache Bench command with the -A flag; otherwise, you’re just testing how well your server returns a 401 error.
00:09:47.780 Another common issue is hitting a 500 page or a redirect. Apache Bench won't follow redirects, so it logs them as non-200 requests. The easiest way to spot these load-testing error pages is to check the count of non-200 requests in your Apache Bench output. If that number is not zero, even with no significant load on your site, you probably have one of these issues. You should also watch your server logs while running a load test; they will show any errors the server is logging.
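Both sanity checks described so far (no unexpected non-200 responses, and a 99th percentile in line with production) can be automated. The sketch below assumes the report text has been captured from `ab`; the sample report is made up for illustration, with only the 693 ms figure taken from the talk, and the 25% tolerance is an arbitrary assumption.

```python
import re

# Made-up excerpt in the shape of an Apache Bench report.
SAMPLE_REPORT = """\
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000

Percentage of the requests served within a certain time (ms)
  50%    310
  99%    693
 100%    812 (longest request)
"""


def read_counter(report, label):
    """Return the integer after 'label:' in the report, or 0 if the line is absent."""
    match = re.search(rf"^{re.escape(label)}:[ \t]+(\d+)", report, re.MULTILINE)
    return int(match.group(1)) if match else 0


def read_percentile_ms(report, percentile):
    """Return the latency in ms listed for the given percentile, or None."""
    match = re.search(rf"^[ \t]*{percentile}%[ \t]+(\d+)", report, re.MULTILINE)
    return int(match.group(1)) if match else None


def sanity_check(report, production_p99_ms, tolerance=0.25):
    """Collect warnings about results that probably do not reflect real pages."""
    warnings = []
    non_2xx = read_counter(report, "Non-2xx responses")
    if non_2xx:
        warnings.append(f"{non_2xx} non-2xx responses: check auth, 500s, or redirects")
    p99 = read_percentile_ms(report, 99)
    if p99 is not None and abs(p99 - production_p99_ms) / production_p99_ms > tolerance:
        warnings.append(f"p99 of {p99} ms is far from production's {production_p99_ms} ms")
    return warnings


for warning in sanity_check(SAMPLE_REPORT, production_p99_ms=650):
    print(warning)
```

In this sample the latency looks plausible, but every request came back non-2xx, which is the pattern you would expect when the test is hitting an authentication error or a redirect rather than the real page.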
00:10:31.590 I appreciate the enthusiasm, but there's a bit of a strange situation where you need to differentiate between non-200 requests and failed requests. Apache Bench will remember the content length of your very first request, and if it changes in subsequent requests, it will register these in a mostly unhelpful failed requests section. Just make sure your logs are indicating that you're rendering the correct pages, and you can ignore those.
00:10:57.680 You can also add the -l flag in later versions of Apache Bench to accept variable document lengths. Once your low-concurrency load test runs without errors, you can start ramping up the volume and observe how your application responds to increased load. Let’s discuss how queuing may impact user experiences. As we increase the load, we will begin to notice the average response time for the entire site gradually increase as well.
00:11:30.410 This phenomenon is explained by a concept called Little's Law, which is fundamental to queuing theory. Little's Law states that the average number of customers in a stable system (L) is equal to the average effective arrival rate (λ) multiplied by the average time customers spend (W) in the system. While it may seem ridiculous or abstract, it becomes intuitive through real-world examples. Imagine a store with one cashier checking out customers; the average number of customers in line is the rate at which they arrive multiplied by the average time each one spends waiting.
00:12:19.360 If a new cashier takes over who is twice as slow as the previous cashier, and customers continue arriving at the same rate, the line is going to get longer, and it will take longer for people to check out. This logic helps explain why response times increase as more requests are added. Rearranging Little's Law, the average response time is the mean number of requests in the system divided by the throughput. If your server takes 500 milliseconds to process a request and you add more requests without increasing server resources, the average response time will rise.
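To make the arithmetic concrete, here is a small worked example of Little's Law rearranged as W = L / λ. The 500 millisecond service time comes from the talk; the worker count of four is an assumption for illustration, and the model only applies once the workers are saturated.

```python
def average_response_time(requests_in_system, throughput_per_second):
    """Little's Law rearranged: W = L / lambda, in seconds."""
    if throughput_per_second <= 0:
        raise ValueError("throughput must be positive")
    return requests_in_system / throughput_per_second


workers = 4            # assumed number of workers
service_time = 0.5     # seconds per request, as in the talk
max_throughput = workers / service_time  # requests per second at saturation

for in_flight in (4, 8, 16):
    seconds = average_response_time(in_flight, max_throughput)
    print(f"{in_flight} requests in the system: {seconds:.1f} s average response time")
```

With throughput fixed by the number of workers, doubling the number of requests in the system doubles the average response time.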
00:12:58.760 A typical web application operates like a giant queue that processes requests. For instance, let’s consider a web stack that includes a proxy server, application server, and database. The proxy server sits behind your firewall and communicates back and forth with your clients or users and your web server. Common examples include HAProxy or Nginx. Then there’s the application server, which manages requests needing processing and will communicate with your database. In our scenario, let’s talk about a single-threaded server like Unicorn.
00:13:43.800 Unicorn operates with a master process that has a configurable number of worker processes to handle incoming requests. So, even though there’s one server on one machine, it can manage multiple requests simultaneously, akin to having multiple cashiers at a grocery store. Other web servers, such as Puma, may utilize multiple threads instead of multiple processes, but the concept is similar. In this simple stack, we only have one database, meaning all the cashiers in your grocery store make requests to the same repository.
00:14:48.400 Increased requests lead to greater average response times, and if you continue adding requests without adding more cashiers to process them, the wait time will grow, leading to timeouts for users. The proxy server will allow a client to wait for a preconfigured duration before ultimately signaling that it can no longer assist. As you incrementally increase the load on your system, you might consider increasing your application server's queue size, but there's a risk that your requests may remain queued at the application level, even long after the proxy server has returned its timeout response.
00:15:25.450 This means your application server will continue processing those requests, but no one will be around to experience the rendered pages. Furthermore, this contradicts queuing theory recommendations, which suggest that single queues for multiple workers are more efficient when job durations are inconsistent. If you have two web workers available and one is processing a large file while the other processes a small file, any requests that come in while the large file is processing will be queued behind that long request, creating a scenario where the smaller request takes longer than necessary.
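The difference between one shared queue and a separate queue per worker can be seen in a tiny simulation. This is a simplified sketch, not a model of any particular server: it assumes all jobs arrive at time zero, that per-worker queues are filled round-robin, and that job durations (one long, three short) are made up.

```python
import heapq


def shared_queue_finish_times(durations, workers):
    """One queue: each job goes to whichever worker becomes free first."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = []
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + duration
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish


def per_worker_queue_finish_times(durations, workers):
    """One queue per worker: jobs are dealt out round-robin on arrival."""
    free_at = [0.0] * workers
    finish = []
    for index, duration in enumerate(durations):
        worker = index % workers
        free_at[worker] += duration      # job waits behind everything already queued
        finish.append(free_at[worker])
    return finish


jobs = [10.0, 1.0, 1.0, 1.0]  # seconds; the first job is the "large file"

shared = shared_queue_finish_times(jobs, workers=2)
separate = per_worker_queue_finish_times(jobs, workers=2)
print("shared queue:", shared, "mean", sum(shared) / len(shared))
print("per-worker queues:", separate, "mean", sum(separate) / len(separate))
```

In the per-worker case, the third job is stuck behind the long one and finishes at 11 seconds even though the other worker is idle by then; with a shared queue it finishes at 2 seconds.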
00:16:36.600 Another instinct might be to keep increasing the timeout threshold on your proxy server to reduce the error rate. However, a user who doesn't see a web page load after a minute or two will have a reaction that’s likely equal to or worse than seeing a timeout error from your proxy server. It’s important to note that it’s not just your application that gets affected by load; you’ll start noticing effects on the operating system that hosts your application.
00:17:12.740 Some of these configurations might be set on your production machines, but especially when bringing up new staging servers or local machines, you may find you need to make adjustments. A proxy server must track each incoming client request, usually by maintaining the request’s IP address and port numbers. Since every request consumes a Linux file handle, you could end up seeing errors like 'too many open files.' Ensure that your operating system isn’t arbitrarily capping the number of file handles or descriptors your proxy server can access; for example, 1024 is a common default limit.
00:18:34.840 Since your proxy server will use one handle for each incoming request and one for the matching outgoing connection, you could quickly reach that limit. To review your system limits, you can use the `ulimit` command on Linux. A rule of thumb from the 2.6 kernel documentation recommends not allocating more than ten percent of your available memory to file handles. This works out to approximately 100 file descriptors per megabyte of RAM, so you can start from there and see if your issues resolve.
00:19:39.640 You need to check two levels: the system level, by editing the limits configuration for the global defaults, and the user level, since the user limit is what your proxy server will actually run into. Set your soft limit and hard limit below the maximum you've set for the operating system. After saving your changes, you will need to reload them using the appropriate command. Finally, if your proxy server has its own file limit configuration, adjust that as well. This is an example specific to Nginx, but it will vary based on your proxy server.
00:20:43.430 Another issue that you might face is TCP/IP port exhaustion. There’s a finite number of ports available on your machine, and only a subset of them, known as ephemeral ports, may be used by your application. These ports are used for web requests and are released back to the system for reuse once the request is complete. You can tweak settings to increase the number of available ports. You can also decrease the time-wait period so that ports recycle back into the system more quickly; that delay exists to prevent stray packets from leaking across requests.
00:21:49.940 In your load tests, be sure to leave a few minutes between each test, allowing ports to recycle after use. The Unicorn documentation has excellent suggestions for operating system tuning. Beyond the operating system, remember that your application is unique; you need to think about how it behaves in real-world scenarios and how that will affect performance, which isn’t fully accounted for in your controlled testing environment.
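As a rough sketch of the numbers involved, the snippet below reads the open-file limits of the current process and applies the two rules of thumb mentioned above. It assumes a Unix-like system (the `resource` module is not available on Windows); the client count and RAM size are made-up inputs.

```python
import resource


def current_fd_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def handles_needed(concurrent_clients, handles_per_client=2):
    """A proxy uses one handle for the client and one for the upstream connection."""
    return concurrent_clients * handles_per_client


def rule_of_thumb_fd_budget(ram_megabytes, per_megabyte=100):
    """Roughly 100 file descriptors per megabyte of RAM, per the talk."""
    return ram_megabytes * per_megabyte


soft, hard = current_fd_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
print("handles for 600 concurrent clients:", handles_needed(600))
print("rule-of-thumb budget for 4096 MB of RAM:", rule_of_thumb_fd_budget(4096))
```

With a soft limit of 1024, about 600 concurrent clients would already need more handles than the process is allowed to open.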
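To see how many ephemeral ports a Linux machine has to work with, you can read the configured range from `/proc`. This is a minimal sketch that assumes the Linux path `/proc/sys/net/ipv4/ip_local_port_range`; the sample range in the code is a value commonly seen on Linux, used here only for illustration.

```python
import os

PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"  # Linux only


def ephemeral_port_count(port_range_text):
    """Count the ports in a 'low high' range, inclusive of both ends."""
    low, high = (int(part) for part in port_range_text.split())
    if low > high:
        raise ValueError("range is reversed")
    return high - low + 1


# Sample range for illustration.
print(ephemeral_port_count("32768\t60999\n"))

# On a Linux host, read the live setting instead.
if os.path.exists(PORT_RANGE_PATH):
    with open(PORT_RANGE_PATH) as handle:
        print("this machine:", ephemeral_port_count(handle.read()))
```

If a load test opens connections faster than ports are released, this count is the ceiling you will run into.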
00:22:50.310 One crucial relationship to observe is how user actions, caching, and database queries interact. For instance, if you're testing an endpoint that retrieves comments from the database, consider how many rows and the complexity of the query you're issuing against that database. If there are not many comments populated in your testing environment, don’t expect to see significant load behavior. In high-load scenarios, user interactions might generate a lot of comments continuously, creating a very different context than what you see in load testing.
00:23:57.940 Alternatively, if you've decided to display a million comments in your load testing environment, you’ll still notice that the query will be cached after the first request. If your actual user behavior involves a high volume of comments coming in simultaneously, you can run additional scripts alongside your load tests to simulate true comment creation under load.
00:24:56.820 Another factor to evaluate is blocking external requests. If your workers are overwhelmed, any slow blocking HTTP requests (like those to payment processors) will increase latency for everything queued behind them. You should be comfortable understanding the lifecycle of web requests in your stack and which logs to review for errors. And you should feel confident when you're conducting load tests that you're truly putting your infrastructure to the test instead of merely hitting the limits of your testing setup.
00:25:43.790 Once you reach this point, you can employ additional tools to identify the actual bottlenecks you’re encountering. Begin by investigating the limits of the machines hosting your web server. During load tests, tools like 'top' are useful for monitoring CPU usage and memory consumption and for spotting the processes responsible. It’s essential to remember that the percentage displayed is relative to a single CPU, so on multi-core systems, you may see values exceeding 100%.
00:26:13.370 A tool I really enjoy is 'htop,' which might not come pre-installed but provides a fantastic visual representation of CPU usage across cores. Resource utilization on the machine that hosts your proxy server, web server, and potentially your database is a zero-sum game. For instance, the master process for Unicorn manages a configurable number of worker processes that handle requests. While more workers allow more requests to be processed concurrently, they also consume resources on your physical host machine.
00:27:05.700 You want to be careful not to over-provision web servers or workers, which causes your system to run low on memory and fall back on slower swap space on the hard drive. Monitor the average memory consumption of your workers on your machine. If you don’t have memory leaks, usage should be stable, which lets you determine the number of workers your system can reasonably handle.
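A quick way to put a reading above 100% in context is to divide it by the number of cores. The sketch below uses a made-up reading of 250% on a four-core machine; the helper name is hypothetical.

```python
import os


def share_of_machine(top_cpu_percent, cores):
    """Convert a per-process CPU reading (100 = one full core) to a 0..1 share of the machine."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return top_cpu_percent / (100.0 * cores)


# Hypothetical reading: a process showing 250% on a four-core box.
print(share_of_machine(250.0, cores=4))

cores_here = os.cpu_count() or 1
print(f"this machine reports {cores_here} core(s)")
```

A process at 250% on four cores is using a little under two thirds of the machine's CPU capacity.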
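Once worker memory is stable, the worker count is simple arithmetic. This is a back-of-the-envelope sketch; every number in it (total memory, the amount reserved for the operating system and other processes, and per-worker usage) is a hypothetical input you would replace with your own measurements.

```python
def max_workers(total_mb, reserved_mb, worker_mb):
    """How many workers fit in memory after reserving room for everything else."""
    if worker_mb <= 0:
        raise ValueError("worker memory must be positive")
    usable = total_mb - reserved_mb
    return max(usable // worker_mb, 0)


# Hypothetical host: 4 GB of RAM, 1 GB reserved, workers averaging 300 MB each.
print(max_workers(total_mb=4096, reserved_mb=1024, worker_mb=300))
```

Leaving some headroom below this number helps keep the machine out of swap when worker memory grows under load.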
00:27:42.620 If you're also running other applications on the same host, they will limit the resources available to your Ruby application. For instance, if your database resides on the same machine, you may exhaust CPU resources while increasing web workers before stress-testing the database itself. In real life, when site availability is compromised, it may seem easy to increase the number of workers to handle more traffic, but requests might become blocked if they're all accessing the same database and waiting, causing significant issues.
00:29:14.780 To investigate your database under load, you might point your application at an external database by changing the Rails database YAML file to use an external address instead of localhost. Don’t forget to configure your firewall to allow external connections. A gentle reminder: never use your production database for this, as it could cause downtime for your website.
00:30:03.640 A highly effective way to investigate performance issues is through application performance monitoring tools, particularly transaction tracing, which collects data on slowest HTTP requests. Third-party tools like New Relic and Skylight are effective, but ideally, you should implement them on your load testing server, mirroring your production environment, to accurately identify real-world issues.
00:30:57.500 You can also use tools like the 'rack-mini-profiler' gem in production or during load tests; just be sure to run your staging server in production mode so that the profiling reflects real requests. Additionally, you should enable the slow query log in MySQL to keep track of any SQL statements that take longer than a specified time to execute. You can configure this threshold, for example anywhere from one second up to ten.
00:31:44.110 If you notice a query is running too frequently, even if it isn't costly enough to appear in the slow query log, review the current running queries with 'SHOW PROCESSLIST.' If a particular query is running suspiciously often, it could indicate a bottleneck or a performance regression you didn’t anticipate, particularly if it’s one of those queries that consistently breaks caching under load.
00:32:27.560 Apache Bench is simple and probably available to you right now, but it isn't always the best tool. Other options exist. Siege, for example, lets you use configuration files and hit multiple endpoints at once, and Bombard allows for incremental load ramp-up for your applications. Bees with Machine Guns is a cool open-source tool created by the news application team at the Chicago Tribune that lets you quickly spin up a lot of micro EC2 instances for load testing.
00:33:31.360 If you find that running from one box doesn’t generate enough load, it could be a sign that your site is fast. Flood IO is a paid service that helps maintain Ruby JMeter, a Ruby domain-specific language for creating test plans for JMeter, which is a heavy-duty load testing tool. Your application might differ significantly from what I’ve covered today. Perhaps it's hosted on Heroku, or uses Puma in place of Unicorn, or is a complex fault-tolerant distributed system, which introduces different issues.
00:34:34.600 But what’s remarkable about load testing is its potential for fostering curiosity, providing tools to illuminate the often dark and intimidating areas of your application. Thank you, everyone! I will tweet out a link to my slides shortly if you're interested in taking a look. You can find me on Twitter at @PracticeCactus. Please feel free to come up afterward if you’d like to ask me any questions.