Talks

Web Server Concurrency Architecture

http://rubykaigi.org/2016/presentations/wyhaines.html

Ruby has many different web server options, covering a gamut of possible concurrency architectures. We will look at what those concurrency options are, and at what their respective theoretical costs and benefits are.
We will then look at a reference ruby web server implementation that can have each of these different concurrency architectures plugged into it, and examine how its performance under load varies with each of those architectures.
We'll wrap it all up with a summary of the results, and a look at which Ruby web servers fall into which categories of concurrency architecture.

Kirk Haines, @wyhaines
I started using Ruby back in the Ruby 1.6 days, and I have used it in my daily professional life ever since. I'm fascinated by issues of application design, distributed architecture, and making hard things easy. I was also the last Ruby 1.8.6 maintainer. For entertainment, I read studies on exercise and nutrition physiology, and I like to run and bicycle long distances. This year will include my first 50K and 50-mile running races, and at least one 75-mile gravel bike race.

RubyKaigi 2016

00:00:03.290 Alright, so my talk today is on concurrency in general, focusing on web servers. Most of what I'm discussing regarding web servers can be applied to any kind of server you might be interested in writing.
00:00:11.730 Let me share a bit about myself. I've been using Ruby since 2001, with my first installation being version 1.6.6. I have been developing web applications in Ruby since 2002 and have worked on countless sites, applications, and servers since then.
00:00:25.920 I have worked with Engine Yard since 2008, and as was mentioned in my introduction, I am the former maintainer of Ruby 1.8.6. The last time I attended RubyKaigi was in 2011, when I announced the end of life for version 1.8.6.
00:00:37.770 Speaking of web servers, let me preface this by saying that a web server is essentially just a server that deals in HTTP. It accepts HTTP requests and returns HTTP responses. There's nothing particularly fancy about it. There's a snippet of code from one of the first web pages at CERN that's worth mentioning. Ruby didn't exist back then, so I modified the code to use Ruby instead of Perl. In combination with a little bit of netcat, it makes a working web server that lets you access static files from your browser.
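The original was only a few lines; a Ruby analogue in the same spirit might look like the sketch below, driven by a netcat build that supports -e (the script name and invocation are illustrative, not the CERN original):

    #!/usr/bin/env ruby
    # serve.rb -- read an HTTP request line from stdin, write the named file to stdout
    path = STDIN.gets.to_s.split[1].to_s          # e.g. "/index.html"
    file = "." + path
    if File.file?(file)
      STDOUT.write "HTTP/1.0 200 OK\r\n\r\n" + File.read(file)
    else
      STDOUT.write "HTTP/1.0 404 Not Found\r\n\r\n"
    end

Run in a shell loop so each connection gets the script's stdin and stdout wired to the socket:

    while true; do nc -l -p 8080 -e ./serve.rb; done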
00:01:11.100 This may sound frightening, but I think it's pretty cool. On that same page, there was a quote that made me smile, and if it isn’t an endorsement for us writing web servers in Ruby, then I don't know what is.
00:01:22.950 When discussing web servers, remember that there’s nothing really special about a web server compared to other types of servers. Typically, web servers listen on a network socket, accept HTTP requests, and return HTTP responses. Ruby provides a nice set of networking libraries that make it fairly straightforward to put together simple servers. The code on the right doesn't actually accomplish much, but it demonstrates the basics of a viable server in Ruby.
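A minimal sketch of that kind of server (not the exact code from the slide) fits in a few lines using Ruby's standard library:

    require "socket"

    server = TCPServer.new(8080)      # listen on port 8080
    loop do
      client = server.accept          # block until a connection arrives
      client.gets                     # read (and ignore) the request line
      client.write "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
      client.close
    end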
00:01:41.639 Now, as a quick tangent, there's a one-liner in Ruby that you can run to start up a working web server. How many of you know what that is? Not many? Well, that's it right there! This will start a working web server from any standard Ruby installation. It's kind of cheating, because what the one-liner does is load the standard un library, which provides an httpd command that sets up WEBrick, the web server distributed with Ruby, for you. It doesn't necessarily say anything about concurrency, but I thought it was interesting because many people don't know how easily you can start a web server from the command line with Ruby. This command serves documents out of your current directory on port 8080.
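For reference, the one-liner in question is almost certainly the standard un trick:

    ruby -run -e httpd . -p 8080

The -run flag loads the un library, whose httpd command wraps WEBrick and serves the given directory (here, the current one) on the given port.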
00:03:19.340 Returning to the main topic of server architecture and concurrency, all servers have a similar setup. They parse command line arguments, process configuration, listen on some socket or sockets for communications, and then enter a loop. Inside that loop, they accept connections and handle them. This loop is where all of the concurrency options that affect how a server performs come into play. In Ruby, with every server I’m going to discuss, networking is handled in one of two ways: it either uses the native libraries built into Ruby, or it utilizes a library called EventMachine, which provides an event framework for writing servers in Ruby.
00:04:30.180 Ruby has a really rich set of tools for writing servers. As a demonstration, I created a basic web server called Scrawls that features a pluggable architecture. This allows you to specify whether it will be a multithreaded server, a multiprocessing server, or an event-driven server while keeping all other configurations the same. This design provides a useful tool for comparing how different concurrency options impact server performance. While I haven’t focused on varying the HTTP parsers used in this presentation, Scrawls does include that feature. You can find Scrawls on GitHub—it’s fairly incomplete right now, but it works for the purposes of this presentation.
00:05:40.090 I’ve implemented four different IO engines with their respective concurrency methods within Scrawls. One is a simple blocking single-threaded engine, while another supports multiprocessing. The third is multithreading, and the fourth supports event-based concurrency. The first option we’ll discuss is the simplest one, which forms the foundation for everything else: the single-threaded blocking server. The code example shown demonstrates this basic unit for a server. It creates a server and enters the main loop, where it accepts a connection from the kernel before passing it off to another method for handling.
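A hedged sketch of that basic unit (illustrative, not Scrawls' actual engine) looks like this; note that handle runs to completion before the loop can accept again:

    require "socket"

    # Naive handler: read the request line and serve the named file from CWD.
    def handle(client)
      path = client.gets.to_s.split[1].to_s     # e.g. "GET /test.txt HTTP/1.1"
      body = File.read("." + path)
      client.write "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n"
      client.write body
    rescue StandardError
      client.write "HTTP/1.1 404 Not Found\r\n\r\n"
    ensure
      client.close
    end

    server = TCPServer.new(8080)
    loop { handle(server.accept) }              # strictly one request at a time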
00:07:00.000 However, this server cannot respond to another connection while it is handling one. Additional connections queue up in the kernel's accept backlog until the server finishes the current request and accepts the next. For all my tests, I ran them on an eight-core VM running Ubuntu 16.04 with Ruby 2.3.1, using Apache Bench as my benchmarking tool. There are a variety of benchmarking tools available, including httperf, but I chose Apache Bench for simplicity and ease of understanding.
00:08:32.360 In this first example, I ran Scrawls with the single-threaded IO engine, which is a single thread of execution in a single process. While that thread is handling a request, it cannot service any others. I set it up and ran 100,000 requests against it, one request at a time, with a concurrency level of one. The test used a plain text file about 1k long, yielding a performance of 3,614 requests per second. While this seems decent, there are important details to consider.
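The Apache Bench invocation for a run like that looks roughly like this (the host, port, and file name are illustrative):

    ab -n 100000 -c 1 http://127.0.0.1:8080/test.txt

Here -n is the total number of requests and -c is how many are issued concurrently.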
00:09:28.960 The benchmark was run locally, so network latency played little to no role in the results. We were really testing the throughput of the server itself. However, in a real-world scenario, server utilization would generally include much more than just returning static files. There would be various actions happening, such as querying databases and generating content from other sources, which takes time. When dealing with real-world networks, there's significant latency involved, which would affect server performance considerably.
00:10:55.790 Taking that single-threaded example in a real-world context would definitely change the results. To illustrate, I created a simple Rack app. Scrawls provides basic Rack support, which was sufficient for my needs. This app simply waits for one second before responding to a request. With this setup, the throughput on a single-threaded blocking server drops drastically to just one request per second, making it far less useful for actual work.
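A minimal config.ru in that spirit (a sketch of the idea, not the exact app used in the talk):

    # config.ru -- every request sleeps for one second before responding
    run lambda { |env|
      sleep 1
      [200, { "Content-Type" => "text/plain" }, ["slow response\n"]]
    }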
00:12:19.470 This situation highlights the problem with single-threading. The analogy here is akin to standing in a long line, waiting for one cashier to serve each customer before the next can go through. Concurrency is how we address this problem, and there are three principal approaches: multi-processing, multi-threading, and event-based concurrency.
00:13:25.810 The book definition of concurrency is about decomposing a problem into independent or partially ordered units. In simpler terms, it’s about breaking down tasks into smaller bits that can run mostly independently without concern for the order of execution. When discussing concurrency with performance in mind, it's best if these independent tasks can occur as simultaneously as possible—this is where parallelism enters the picture.
00:14:28.430 For our first option in handling performance and concurrency, let's examine multi-processing. Multiprocessing means having multiple processes manage the workload instead of relying on a single process. Essentially, there’s an entity that load-balances requests among those individual servers to ensure responses reach the correct client. This method is easy to implement and can yield good performance, provided sufficient resources are available.
00:15:55.900 However, it comes with management complexities, particularly if you need to adjust to varying loads. For instance, if you’ve tuned your deployment for a specific load level and find it insufficient, handling more processes can add overhead, especially as processes are heavyweight units. Consequently, resource management can become costly. Simple implementations of multiprocessing can use straightforward forking techniques.
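A minimal pre-forking sketch (illustrative, not Scrawls' multi-process engine): the parent opens the listening socket, forks workers that all call accept on it, and the kernel spreads incoming connections across them.

    require "socket"

    server = TCPServer.new(8080)        # parent opens the shared listener

    8.times do
      fork do                           # each child inherits the socket
        loop do
          client = server.accept        # kernel hands the connection to one worker
          client.gets
          client.write "HTTP/1.1 200 OK\r\n\r\nok\r\n"
          client.close
        end
      end
    end

    Process.waitall                     # parent just supervises its children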
00:17:57.550 Let’s take a look at how multi-processing performs. I ran the multi-process IO engine with eight processes on the VM, the same setup as before. For the first test, I used a small static file, which processed 18,000 requests per second. Once again, this doesn't account for network latency, but showcases how the multi-process approach efficiently utilizes resources, as the server sends requests to different processes, allowing all cores to work concurrently.
00:19:22.600 Things change, however, when we run slow requests, where throughput drops to essentially one per second per process. Despite processes being able to handle more requests, each is still a single-threaded blocking server. The performance profile remains similar to that of the original single-threaded server. Yet we can improve throughput in a multi-processing environment by simply creating more processes to manage the load.
00:20:54.490 If I created 32 processes, we could achieve roughly 32 requests per second, lifting our performance. This concurrency model represents typical queue management where multiple checkout aisles can significantly speed up service. A drawback to this approach is resource consumption—as you introduce more processes, you increase RAM usage.
00:22:54.569 Another factor is that pre-2.0 versions of Ruby were not friendly to forking: the garbage collector wrote to every object while marking, which defeated the operating system's copy-on-write sharing and drove RAM consumption up. Modern Rubies improved significantly in this regard with copy-on-write-friendly garbage collection, allowing many forked processes to share memory effectively. My testing bore this out: on Ruby 2.0+ you can run many processes without running into severe RAM limitations.
00:24:27.060 Let's move on to multi-threading, which is another common concurrency strategy. Threads are smaller units of instruction that can be managed independently within the same process. The advantages of multi-threading are that threads are easier to manage, lightweight, and can be performant. However, implementations can vary widely across different Ruby versions, particularly between MRI, which has a Global Interpreter Lock (GIL), and JRuby, which does not.
00:25:55.780 The challenge with multi-threading is that it can be complicated, as discussed in previous presentations. Scrawls implements an IO engine that creates a new thread for each request. If we run the fast test on this multi-threaded implementation, we find that while it's still fast, it’s actually slower than the single-threaded approach. This slowdown is due to the overhead incurred from creating threads for each request.
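The thread-per-request engine boils down to something like this sketch (illustrative, not Scrawls' source); the Thread.new on every connection is exactly where the per-request overhead comes from:

    require "socket"

    server = TCPServer.new(8080)
    loop do
      client = server.accept
      Thread.new(client) do |c|     # a brand-new thread per connection
        c.gets
        c.write "HTTP/1.1 200 OK\r\n\r\nok\r\n"
        c.close
      end
    end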
00:27:48.790 When multi-threading incorporates a thread pool, where a limited set of threads is reused instead of a new one being created for each request, much of that overhead goes away. Even so, around 3,000 requests per second is the expected throughput for short requests in this case. Where the implementation shines is under slow request conditions, achieving up to 100 requests per second in a single process.
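A pooled version might look like the following sketch (the pool size and handler are illustrative); the workers are created once and fed from a thread-safe queue:

    require "socket"

    POOL_SIZE = 16                      # illustrative pool size
    queue  = Queue.new                  # thread-safe queue from the stdlib
    server = TCPServer.new(8080)

    POOL_SIZE.times do
      Thread.new do                     # workers are created once, then reused
        while client = queue.pop
          client.gets
          client.write "HTTP/1.1 200 OK\r\n\r\nok\r\n"
          client.close
        end
      end
    end

    loop { queue.push(server.accept) }  # the acceptor just feeds the pool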
00:29:36.750 Exploring hybrid models, you can combine multi-threading with multi-processing. When we run a multi-threaded server across eight processes, it mitigates a lot of the threading-related performance penalties, achieving around 9,500 requests per second. Still, many penalty issues could arise from the Global Interpreter Lock.
00:30:48.510 Moving on to event-driven concurrency, when we think of event-driven design in Ruby, it often involves a reactor pattern. In this setup, asynchronous events—like data-ready sockets—trigger synchronous callbacks that the reactor processes. This pattern excels with I/O latencies since, during waiting periods, tasks can interleave for efficiency.
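At its core, a reactor is a select loop like the sketch below (EventMachine is a far more capable implementation of the same idea); note that everything inside the loop must return quickly, because nothing else runs while a callback does:

    require "socket"

    server  = TCPServer.new(8080)
    clients = []

    loop do
      readable, = IO.select([server] + clients)  # wait until any socket is ready
      readable.each do |io|
        if io == server
          clients << server.accept               # new connection: watch it too
        else
          begin
            io.read_nonblock(4096)               # consume the request (naively)
            io.write "HTTP/1.1 200 OK\r\n\r\nok\r\n"
          rescue EOFError
          end
          clients.delete(io)
          io.close
        end
      end
    end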
00:32:20.110 However, slow callbacks can block other operations within the reactor. The main benefits of such a system include speed and low resource consumption, as a single process can handle a significant number of concurrent connections. Nevertheless, slow operations can cause bottlenecks, complicating the debugging process.
00:33:37.560 In Scrawls, I use a simple reactor implementation to handle requests. A single process can turn over a large number of requests very quickly, and it held up in the tests with simulated real-world latency. Comparing runs across various request sizes, throughput does drop as concurrency rises, but it generally stays reasonable even under heavy load.
00:34:59.740 Hybridization, combining an event-driven architecture with multi-threading, lets you scale performance further. When threaded Scrawls handles many concurrent slow requests, the threads absorb the per-request delays instead of blocking the reactor, which shows how effective this hybrid approach can be.
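With EventMachine, the usual way to get that hybrid is EM.defer, which runs slow work on an internal thread pool and hands the result back to the reactor thread (the handler below is an illustrative sketch, not Scrawls' code):

    require "eventmachine"            # gem install eventmachine

    module SlowHandler
      def receive_data(_request)
        work = proc { sleep 1; "HTTP/1.1 200 OK\r\n\r\nok\r\n" }  # slow part, off the reactor
        done = proc { |resp| send_data(resp); close_connection_after_writing }
        EM.defer(work, done)          # thread pool runs work, reactor runs done
      end
    end

    EM.run do
      EM.start_server("0.0.0.0", 8080, SlowHandler)
    end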
00:36:32.840 Now, I want to quickly survey popular Ruby web servers and their concurrency designs. WEBrick, built into Ruby, is thread-based but notoriously slow. Mongrel, the ancestor of most current Ruby web servers, is also thread-based but uses a fast C extension for HTTP parsing.
00:37:44.990 Other notable servers include Thin, which is event-driven and tends to perform well, and Puma, which uses a thread pool model for managing connections effectively and is optimized for concurrency.
00:39:01.420 Passenger is another widely used server; it features a multi-processing design but can also work in a multi-threaded fashion. Unicorn employs a multi-processing design that's ideal for applications with slow requests. Finally, there's ServerEngine, which supports multi-processing and is designed to work even on platforms without fork, such as Windows and JRuby.
00:40:06.000 I have about three minutes left, so if anyone has questions, feel free to ask. I also have t-shirts available for those who ask questions!
00:40:43.780 No questions? Alright, please feel free to come find me afterward if you have any inquiries or if you'd like a t-shirt!