00:00:03.290
Alright, so my talk today is on concurrency in general, focusing on web servers. Most of what I'm discussing regarding web servers can be applied to any kind of server you might be interested in writing.
00:00:11.730
Let me share a bit about myself. I've been using Ruby since 2001, with my first installation being version 1.6.6. I have been developing web applications in Ruby since 2002 and have worked on countless sites, applications, and servers since then.
00:00:25.920
I have worked with Engine Yard since 2008, and as Jounin mentioned, I am the former maintainer of Ruby 1.8.6. The last time I attended RubyKaigi was in 2011 when I announced the end of life for version 1.8.6.
00:00:37.770
Speaking of web servers, let me preface this by saying that a web server is essentially just a server that deals in HTTP. It accepts HTTP requests and returns HTTP responses. There's nothing particularly fancy about it. There’s a snippet of code from one of CERN's first web pages that’s worth mentioning. Although Ruby didn’t exist then, I modified the code to use Ruby instead of Perl. In combination with a little bit of netcat, it creates a working web server that allows you to access static files using your web browser.
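A minimal Ruby sketch in that spirit, assuming a netcat variant that can hand each connection to a script (the file handling and netcat flags here are illustrative, not the slide's actual code):

    #!/usr/bin/env ruby
    # Reads an HTTP request line on stdin, writes a response on stdout.
    # Run it behind netcat, for example:
    #   while true; do nc -l -p 8080 -e ./respond.rb; done
    request_line = $stdin.gets.to_s
    path = request_line.split(" ")[1].to_s.sub(%r{\A/+}, "")
    path = "index.html" if path.empty?
    if File.file?(path)
      print "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
      print File.read(path)
    else
      print "HTTP/1.0 404 Not Found\r\n\r\nNot Found"
    end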
00:01:11.100
This may sound frightening, but I think it's pretty cool. On that same page, there was a quote that made me smile, and if it isn’t an endorsement for us writing web servers in Ruby, then I don't know what is.
00:01:22.950
When discussing web servers, remember that there’s nothing really special about a web server compared to other types of servers. Typically, web servers listen on a network socket, accept HTTP requests, and return HTTP responses. Ruby provides a nice set of networking libraries that make it fairly straightforward to put together simple servers. The code on the right doesn't actually accomplish much, but it demonstrates the basics of a viable server in Ruby.
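A server of that shape, using nothing but Ruby's standard socket library, might look something like this (a sketch, not the slide's exact code):

    require "socket"

    server = TCPServer.new(8080)    # listen on a network socket
    loop do
      client = server.accept        # accept a connection...
      client.gets                   # ...read (and ignore) the request line...
      client.write "HTTP/1.0 200 OK\r\n\r\nHello"  # ...return an HTTP response
      client.close
    end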
00:01:41.639
Now, as a quick tangent, there’s a one-liner in Ruby that you can run to start up a working web server. How many of you know what that is? Not many? Well, that’s it right there! This will start a working web server in Ruby from any standard installation. It’s kind of cheating, because what this one-liner does is leverage the un library, which wraps a handful of small command-line tools, including an httpd command that uses WEBrick, the web server distributed with Ruby, and sets it all up for you. It does not focus on concurrency necessarily, but I thought it was interesting because many people may not know how easily you can start a web server from the command line with Ruby. This command serves your documents out of your current directory on port 8080.
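For the record, the standard-library one-liner is:

    ruby -run -e httpd . -p 8080

The "-run" is really "-r un": it requires the un library, whose httpd command starts WEBrick serving the given directory.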
00:03:19.340
Returning to the main topic of server architecture and concurrency, all servers have a similar setup. They parse command line arguments, process configuration, listen on some socket or sockets for communications, and then enter a loop. Inside that loop, they accept connections and handle them. This loop is where all of the concurrency options that affect how a server performs come into play. In Ruby, with every server I’m going to discuss, networking is handled in one of two ways: it either uses the native libraries built into Ruby, or it utilizes a library called EventMachine, which provides an event framework for writing servers in Ruby.
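Sketched in outline, with the four phases marked (assuming a plain TCPServer and OptionParser, not any particular server's internals):

    require "socket"
    require "optparse"

    port = 8080
    OptionParser.new do |opts|                      # 1. parse command-line arguments
      opts.on("-p", "--port PORT", Integer) { |p| port = p }
    end.parse!                                      # 2. process configuration

    listener = TCPServer.new(port)                  # 3. listen on a socket
    loop do                                         # 4. the accept-and-handle loop:
      connection = listener.accept                  #    every concurrency decision
      connection.write "HTTP/1.0 200 OK\r\n\r\nok"  #    lives inside this loop
      connection.close
    end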
00:04:30.180
Ruby has a really rich set of tools for writing servers. As a demonstration, I created a basic web server called Scrawls that features a pluggable architecture. This allows you to specify whether it will be a multithreaded server, a multiprocessing server, or an event-driven server while keeping all other configurations the same. This design provides a useful tool for comparing how different concurrency options impact server performance. While I haven’t focused on varying the HTTP parsers used in this presentation, Scrawls does include that feature. You can find Scrawls on GitHub—it’s fairly incomplete right now, but it works for the purposes of this presentation.
00:05:40.090
I’ve implemented four different IO engines with their respective concurrency methods within Scrawls. One is a simple blocking single-threaded engine, while another supports multiprocessing. The third is multithreading, and the fourth supports event-based concurrency. The first option we’ll discuss is the simplest one, which forms the foundation for everything else: the single-threaded blocking server. The code example shown demonstrates this basic unit for a server. It creates a server and enters the main loop, where it accepts a connection from the kernel before passing it off to another method for handling.
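A minimal version of that basic unit might look like this (my sketch of the pattern, not Scrawls' actual engine):

    require "socket"

    # Serve static files from the current directory, one request at a time.
    def handle(client)
      path = client.gets.to_s.split(" ")[1].to_s.sub(%r{\A/+}, "")
      if File.file?(path)
        client.write "HTTP/1.0 200 OK\r\n\r\n#{File.read(path)}"
      else
        client.write "HTTP/1.0 404 Not Found\r\n\r\n"
      end
    ensure
      client.close
    end

    server = TCPServer.new(8080)
    loop do
      handle(server.accept)   # blocks; no other connection is served meanwhile
    end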
00:07:00.000
However, this server cannot respond to another connection while it is busy handling one. If multiple connections come in, they queue up in the kernel, and each has to wait for the server to finish with the current request before it can be accepted. For all my tests, I ran them on an eight-core VM running Ubuntu 16.04 with Ruby 2.3.1, using Apache Bench as my benchmarking tool. There are a variety of benchmarking tools available, including httperf, but I chose Apache Bench for simplicity and ease of understanding.
00:08:32.360
In this first example, I ran Scrawls with the single-threaded IO engine, which implements a single thread of execution in a single process. During this time, whatever that thread is handling cannot manage any other requests. I set it up and ran 100,000 requests against it, executing one request at a time with a concurrency level of one. The test used a plain text file that is about 1k long, yielding a performance of 3,614 requests per second. While this seems decent, there are important details to consider.
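The exact invocation isn't shown in the transcript, but with Apache Bench that test would look something like this (the file name is assumed):

    ab -n 100000 -c 1 http://127.0.0.1:8080/test.txt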
00:09:28.960
The benchmark was run locally, so network latency played little to no role in the results. We were really testing the throughput of the server itself. However, in a real-world scenario, server utilization would generally include much more than just returning static files. There would be various actions happening, such as querying databases and generating content from other sources, which takes time. When dealing with real-world networks, there's significant latency involved, which would affect server performance considerably.
00:10:55.790
Taking that single-threaded example in a real-world context would definitely change the results. To illustrate, I created a simple Rack app. Scrawls provides basic Rack support, which was sufficient for my needs. This app simply waits for one second before responding to a request. With this setup, the throughput on a single-threaded blocking server drops drastically to just one request per second, making it far less useful for actual work.
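The app itself isn't reproduced in the transcript, but a Rack app matching that description fits in a few lines (a sketch):

    # config.ru -- run with any Rack-capable server
    class SlowApp
      def call(env)
        sleep 1   # simulate a slow database query or upstream call
        [200, { "Content-Type" => "text/plain" }, ["done\n"]]
      end
    end

    run SlowApp.new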
00:12:19.470
This situation highlights the problem with single-threading. The analogy here is akin to standing in a long line, waiting for one cashier to serve each customer before the next can go through. Concurrency is how we address this problem, and there are three principal approaches: multi-processing, multi-threading, and event-based concurrency.
00:13:25.810
The book definition of concurrency is about decomposing a problem into independent or partially ordered units. In simpler terms, it’s about breaking down tasks into smaller bits that can run mostly independently without concern for the order of execution. When discussing concurrency with performance in mind, it's best if these independent tasks can occur as simultaneously as possible—this is where parallelism enters the picture.
00:14:28.430
For our first option in handling performance and concurrency, let's examine multi-processing. Multiprocessing means having multiple processes manage the workload instead of relying on a single process. Essentially, there’s an entity that load-balances requests among those individual servers to ensure responses reach the correct client. This method is easy to implement and can yield good performance, provided sufficient resources are available.
00:15:55.900
However, it comes with management complexities, particularly if you need to adjust to varying loads. For instance, if you’ve tuned your deployment for a specific load level and find it insufficient, handling more processes can add overhead, especially as processes are heavyweight units. Consequently, resource management can become costly. Simple implementations of multiprocessing can use straightforward forking techniques.
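A bare-bones pre-forking sketch, assuming a Unix-like system where fork is available:

    require "socket"

    server = TCPServer.new(8080)   # open the socket before forking so
    8.times do                     # all children share the same listener
      fork do
        loop do
          client = server.accept   # the kernel spreads accepts across processes
          client.gets
          client.write "HTTP/1.0 200 OK\r\n\r\nok"
          client.close
        end
      end
    end
    Process.waitall                # the parent just supervises its children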
00:17:57.550
Let’s take a look at how multi-processing performs. I ran the multi-process IO engine with eight processes on the VM, the same setup as before. For the first test, I used a small static file, which processed 18,000 requests per second. Once again, this doesn't account for network latency, but it showcases how the multi-process approach utilizes resources efficiently: incoming requests are spread across the processes, allowing all cores to work concurrently.
00:19:22.600
Things change, however, when we run slow requests, where throughput drops to essentially one per second per process. Despite processes being able to handle more requests, each is still a single-threaded blocking server. The performance profile remains similar to that of the original single-threaded server. Yet we can improve throughput in a multi-processing environment by simply creating more processes to manage the load.
00:20:54.490
If I created 32 processes, we could achieve roughly 32 requests per second, lifting our performance. In the checkout-line analogy, this is opening more aisles: more cashiers working in parallel means faster service overall. A drawback to this approach is resource consumption: as you introduce more processes, you increase RAM usage.
00:22:54.569
Another factor is that pre-2.0 versions of Ruby were not friendly to forking, which led to high RAM consumption. The modern Rubies improved significantly in this regard through copy-on-write friendly garbage collection, allowing many processes to share resources effectively. The performance testing results indicated that Ruby 2.0+ versions can run multiple processes without severe RAM limitations.
00:24:27.060
Let's move on to multi-threading, which is another common concurrency strategy. Threads are smaller units of instruction that can be managed independently within the same process. The advantages of multi-threading are that threads are easier to manage, lightweight, and can be performant. However, implementations can vary widely across different Ruby versions, particularly between MRI, which has a Global Interpreter Lock (GIL), and JRuby, which does not.
00:25:55.780
The challenge with multi-threading is that it can be complicated, as discussed in previous presentations. Scrawls implements an IO engine that creates a new thread for each request. If we run the fast test on this multi-threaded implementation, we find that while it's still fast, it’s actually slower than the single-threaded approach. This slowdown is due to the overhead incurred from creating threads for each request.
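Thread-per-request is only a few lines, which is part of its appeal (a sketch of the pattern, not Scrawls' code):

    require "socket"

    server = TCPServer.new(8080)
    loop do
      client = server.accept
      Thread.new(client) do |c|   # spawn a fresh thread per connection; the
        c.gets                    # thread-creation cost is paid on every request
        c.write "HTTP/1.0 200 OK\r\n\r\nok"
        c.close
      end
    end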
00:27:48.790
When multi-threading incorporates thread pools, where a limited set of threads is reused instead of creating a new one for each request, much of that overhead can be bypassed. Even so, around 3,000 requests per second is the expected throughput for short requests in this case. Where the implementation shines is under slow-request conditions, achieving up to 100 requests per second in a single process.
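A thread-pool variant, sketched with Ruby's thread-safe Queue:

    require "socket"

    server = TCPServer.new(8080)
    jobs   = Queue.new              # thread-safe handoff from acceptor to pool

    16.times do                     # fixed pool: thread creation happens once
      Thread.new do
        while (client = jobs.pop)   # pop blocks until work arrives
          client.gets
          client.write "HTTP/1.0 200 OK\r\n\r\nok"
          client.close
        end
      end
    end

    loop { jobs << server.accept }  # the main thread only accepts and enqueues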
00:29:36.750
Exploring hybrid models, you can combine multi-threading with multi-processing. When we run a multi-threaded server across eight processes, it mitigates a lot of the threading-related performance penalties, achieving around 9,500 requests per second. Some penalty remains, though, because of the Global Interpreter Lock.
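Combining the two earlier sketches gives the hybrid shape: several forked processes, each threading its requests (again Unix-only):

    require "socket"

    server = TCPServer.new(8080)
    8.times do
      fork do                            # eight processes use all the cores...
        loop do
          client = server.accept
          Thread.new(client) do |c|      # ...while threads absorb slow requests
            c.gets
            c.write "HTTP/1.0 200 OK\r\n\r\nok"
            c.close
          end
        end
      end
    end
    Process.waitall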
00:30:48.510
Moving on to event-driven concurrency, when we think of event-driven design in Ruby, it often involves a reactor pattern. In this setup, asynchronous events—like data-ready sockets—trigger synchronous callbacks that the reactor processes. This pattern excels with I/O latencies since, during waiting periods, tasks can interleave for efficiency.
00:32:20.110
However, slow callbacks can block other operations within the reactor. The main benefits of such a system include speed and low resource consumption, as a single process can handle a significant number of concurrent connections. Nevertheless, slow operations can cause bottlenecks, complicating the debugging process.
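A toy reactor built on IO.select shows the shape of the pattern (real reactors, Scrawls' included, are more involved):

    require "socket"

    server  = TCPServer.new(8080)
    clients = []

    loop do
      readable, = IO.select([server] + clients)  # block until something is ready
      readable.each do |io|
        if io == server
          clients << server.accept               # new connection: start watching it
        else
          io.gets                                # data ready: run the callback now;
          io.write "HTTP/1.0 200 OK\r\n\r\nok"   # a slow callback stalls the loop
          clients.delete(io)
          io.close
        end
      end
    end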
00:33:37.560
In Scrawls, I use a simple reactor implementation to handle requests. It can handle a large number of requests quickly, and it holds up well in the latency-heavy, real-world-style tests. I've run tests comparing various request sizes, and while higher concurrency levels can reduce throughput, the server generally maintains reasonable performance even under heavy load.
00:34:59.740
Hybridization, combining event-driven architecture with multi-threading, is another way to scale performance. For example, when threaded Scrawls handles multiple concurrent requests, the threads absorb the per-request delays, showing how effective this hybrid method can be.
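With EventMachine, mentioned earlier, one common hybrid shape keeps IO on the reactor and pushes slow work onto EM.defer's thread pool. This is a sketch of that idea, not Scrawls' implementation, and the response handling is oversimplified:

    require "eventmachine"   # gem install eventmachine

    class SlowHandler < EM::Connection
      def receive_data(_request)
        work = proc { sleep 1; "HTTP/1.0 200 OK\r\n\r\ndone" }  # slow work off the loop
        done = proc { |resp| send_data(resp); close_connection_after_writing }
        EM.defer(work, done)   # run work on EM's thread pool, then done on the reactor
      end
    end

    EM.run { EM.start_server("0.0.0.0", 8080, SlowHandler) }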
00:36:32.840
Now, I want to quickly survey popular Ruby web servers and their concurrency designs. WEBrick, built into Ruby, is thread-based but notoriously slow. Mongrel, the predecessor of many current Ruby web servers, is also thread-based, but uses a fast C extension for HTTP parsing.
00:37:44.990
Other notable servers include Thin, which is event-driven and tends to perform well, and Puma, which uses a thread pool model for managing connections effectively and is optimized for concurrency.
00:39:01.420
Passenger is another widely used server, featuring a multi-processing design but can also work in a multi-threaded fashion. Unicorn employs a multi-processing design that’s ideal for applications with slow requests. Finally, there’s ServerEngine, which supports multi-processing but is designed to function effectively on Windows and JRuby.
00:40:06.000
I have about three minutes left, so if anyone has questions, feel free to ask. I also have t-shirts available for those who ask questions!
00:40:43.780
No questions? Alright, please feel free to come find me afterward if you have any inquiries or if you'd like a t-shirt!