Unleashing the Power of Asynchronous HTTP with Ruby

00:00:00.480 Hello everyone.

00:00:07.740 It's such a pleasure to be here and have such a great time with friendly faces and potential friends. My name is Samuel Williams, and today we'll be unleashing the power of asynchronous HTTP with Ruby.

00:00:18.660 All the source code shown in this presentation is available online, so you can try it out for yourself. Also, note that the example code is for demonstration purposes only; it is not a full or complete implementation.

00:00:40.399 So, what is HTTP and why is it important? To answer this question, let's go back in time to 1990—33 years ago. The Hubble Space Telescope was launched by NASA, providing unprecedented views of the universe. Windows 3.0 was just released, which would go on to shape the future of personal computing. The first version of Adobe Photoshop revolutionized the creation of digital artwork.

00:01:03.780 And Sir Tim Berners-Lee created the first web server, web browser, and website. It was the birth of the World Wide Web. He created the first web server called CERN httpd, which was used to serve the first websites. The last version of the source code, which was released in 1996, is still available online. He also created the first website, which was hosted on a NeXT computer at CERN, providing information about the World Wide Web project itself using hypertext markup language. HTML still looks similar today to how it did 33 years ago.

00:01:42.600 Finally, he created the first web browser called World Wide Web, which also included an editor. By today's standards, it looks a bit cluttered, but it was a revolutionary new approach for generating and sharing content. Out of these technologies, what was the most important? The Hypertext Transfer Protocol, or HTTP, is the foundation on which web browsers, web servers, and websites are built. So, let's take a look at the evolution of HTTP to understand how it works and why it's important.

00:02:28.020 The very first release was named HTTP 0.9. It was developed in 1990 as a simple protocol for transferring hypertext documents. The official specification, which is about 700 words long, was released on January 1, 1991. It only supported a single GET method and had no support for request or response headers. Because of that, it could not support other types of content like images, audio, or video. As a prototype, it was primarily used for academic purposes and experimentation.

00:03:17.220 Let's take a look at the implementation of an HTTP 0.9 client. As you can see, it's short; this is the whole implementation in Ruby. We're using async for the networking. We start with an async block to create an event loop. We specify the endpoint we want our client to connect to. Then, for each argument on the command line, we connect to the server. We create a buffered stream, we write the request to the stream which includes the GET method and the path, we flush the output, and then we wait for the response body and read it all.
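The client steps just described can be sketched roughly as follows. This is a minimal sketch, not the code shown in the talk: it uses Ruby's standard-library sockets in place of async so it runs without any gems, and the method names `http09_fetch` and `http09_get` are illustrative assumptions.

```ruby
require "socket"

# Send an HTTP/0.9 request on a connected stream and return the response body.
# HTTP/0.9 has no version, headers, or status: the request is just "GET <path>".
def http09_fetch(stream, path)
  stream.write("GET #{path}\r\n")
  stream.flush
  # The end of the body is signalled by the server closing the connection.
  stream.read
end

# Hypothetical helper: connect to host:port and fetch a single path.
def http09_get(host, port, path)
  TCPSocket.open(host, port) { |socket| http09_fetch(socket, path) }
end
```

Because the body is read until end-of-file, each call uses up the whole connection, which is the one-request-per-connection limitation discussed later.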

00:04:02.159 Now, let's look at an HTTP 0.9 server. It's a little more complex than the client. Again, we are implementing it using async. We also have our application, a small file server. We create a top-level event loop, we specify the endpoint we want to bind to, we accept an incoming connection, and we create a buffered stream for that connection. We read the request line, which includes the method and the path, then we ask the file system for the file at that path. We read the file and write it back to the client, and finally, we close the connection to indicate a completed response.
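A matching server can be sketched in the same spirit. Again this assumes plain standard-library sockets in place of async, handles one connection at a time, and uses illustrative names (`handle_http09`, `serve_http09`) that are not from the talk's source code.

```ruby
require "socket"

# Handle one HTTP/0.9 connection: read the request line, send the file, close.
def handle_http09(connection, root)
  request_line = connection.gets
  return unless request_line

  _method, path = request_line.split
  root = File.expand_path(root)
  file = File.expand_path(File.join(root, path.to_s))

  # Only serve regular files inside root. HTTP/0.9 has no status code, so a
  # missing file simply produces an empty body.
  if file.start_with?(root + File::SEPARATOR) && File.file?(file)
    connection.write(File.read(file))
  end
ensure
  # Closing the connection is the only end-of-response signal in HTTP/0.9.
  connection.close
end

# Hypothetical helper: accept connections one at a time, with no concurrency.
def serve_http09(port, root)
  server = TCPServer.new("127.0.0.1", port)
  loop { handle_http09(server.accept, root) }
end
```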

00:05:03.060 So, at a high level, the client connects, writes a request line, the server reads that request line, fetches the file, then writes the file to the client, which reads everything until the connection is closed. A limitation of this design is that only one request can be processed per connection; closing the connection is used to indicate the end of the response body. In addition, there is no way to detect errors—for example, if the file can't be found or if the network connection fails, the response body may be missing or truncated, and this cannot be detected.

00:05:29.460 The first official release, HTTP 1.0, was a significant improvement and aimed to fix some of these limitations. The main RFC, released in 1996, is about 18,000 words long. It added support for new HTTP methods like POST, PUT, and DELETE for handling HTML forms and file uploads. It also included a response line with a status code to indicate the nature of the response, for example, 200 OK, which is used when the request is successful, or 404 Not Found if the requested path doesn't exist. It even included 418 I'm a teapot if you accidentally send your request to the wrong kitchen appliance.

00:06:02.600 HTTP 1.0 introduced support for request and response headers, allowing for more complex semantics like content type and content length. With these new headers, supporting other formats of content became possible using MIME types, so websites could have embedded images, audio, and video. It was also the first version of HTTP to gain significant adoption and use in the industry. So, let's take a look at the changes required to make an HTTP 1.0 client.

00:06:45.960 Here is the updated client code. When we make a request, we include the protocol version. Also, different request methods could be used here. We can also include a list of headers; these are name-value pairs which indicate something about the request—in this case, we are telling the server that we'd accept a text/html response format. After submitting the request, the client must wait for a response line from the server. Most importantly, this line includes the status of the response.

00:07:03.060 Next, the client must read any headers generated as part of the response. Finally, the client reads the response body until the connection is closed. Now let's take a look at the changes required for the server. Here is the updated server code. We extract the method, path, and version of the client request. Then, we read any headers the client sends. After reading the client requests, we can generate a response. Firstly, we write our response line with a status code, then we write the response header with a content type that indicates the document is HTML.

00:07:40.740 Then we write the response body, the same as before. Alternatively, if we couldn't find the requested file, we can return a 404 status code. Finally, we close the connection. So at a high level, this is how the HTTP 0.9 client worked, and these are the changes introduced by HTTP 1.0: request headers, response status, and response headers.
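The HTTP 1.0 additions on both sides (request headers, a response status line, and response headers) can be sketched as follows. This is an illustrative sketch using standard-library sockets in place of async; the method names are assumptions, and error handling is deliberately minimal.

```ruby
require "socket"

# Client side: request line with protocol version, then headers, then a blank line.
def write_request(stream, method, path, headers = {})
  stream.write("#{method} #{path} HTTP/1.0\r\n")
  headers.each { |name, value| stream.write("#{name}: #{value}\r\n") }
  stream.write("\r\n")
  stream.flush
end

# Read "name: value" lines up to the blank line that ends the header block.
def read_headers(stream)
  headers = {}
  while (line = stream.gets) && !line.chomp.empty?
    name, value = line.chomp.split(/:\s*/, 2)
    headers[name.downcase] = value
  end
  headers
end

# Client side: response line, headers, then the body until the connection closes.
def read_response(stream)
  _version, status, _reason = stream.gets.chomp.split(" ", 3)
  [Integer(status), read_headers(stream), stream.read]
end

# Server side: parse the request, then reply with a status line and headers.
def handle_http10(connection, root)
  _method, path, _version = connection.gets.to_s.split
  read_headers(connection)
  root = File.expand_path(root)
  file = File.expand_path(File.join(root, path.to_s))

  if file.start_with?(root + File::SEPARATOR) && File.file?(file)
    connection.write("HTTP/1.0 200 OK\r\ncontent-type: text/html\r\n\r\n")
    connection.write(File.read(file))
  else
    connection.write("HTTP/1.0 404 Not Found\r\ncontent-type: text/plain\r\n\r\nNot Found\n")
  end
ensure
  # HTTP/1.0 still ends the response body by closing the connection.
  connection.close
end
```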

00:08:25.879 However, the official specification still only allowed for one request per connection, which, as we mentioned, causes unnecessary network overhead. The next release, HTTP 1.1, tried to address these issues. The specification was released in 1999 and updated the network protocol to enable significant performance improvements. Notably, it introduced support for persistent connections, allowing multiple requests and responses over a single connection. It also added support for chunked transfer encoding, which allows for more efficient transfer of large files. HTTP 1.1 was widely used and remains in use today.

00:09:18.060 So let's take a look at the changes required to support persistent connections. Our client code is largely the same as before, with a few small changes. Instead of making one connection per path, we make a single connection and then make several requests—one for each path—using that same connection. When reading a response from the server, the server can indicate the content length of the response body using a response header. If the length is known on the client, the client can read exactly that amount and then can make another request and reuse the same connection.
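A rough sketch of the persistent-connection client is shown here. It assumes the server always sends a content-length header (chunked transfer encoding is not handled), uses standard-library sockets in place of async, and the name `fetch_paths` is illustrative.

```ruby
require "socket"

# Make several HTTP/1.1 requests over one connected stream, reusing it each time.
def fetch_paths(stream, host, paths)
  paths.map do |path|
    stream.write("GET #{path} HTTP/1.1\r\nhost: #{host}\r\n\r\n")
    stream.flush

    _version, status, _reason = stream.gets.chomp.split(" ", 3)
    headers = {}
    while (line = stream.gets) && !line.chomp.empty?
      name, value = line.chomp.split(/:\s*/, 2)
      headers[name.downcase] = value
    end

    # With a known content-length we read exactly that many bytes, which leaves
    # the connection positioned at the start of the next response.
    length = Integer(headers.fetch("content-length"))
    [path, Integer(status), stream.read(length)]
  end
end
```

A caller would open one `TCPSocket` and pass it in with the full list of paths, instead of opening one socket per path as in the earlier versions.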

00:09:54.779 The server has very few changes; it's mostly the same as before. However, when we serve the file, we read the contents of the file into a buffer and then we add the content length header to the response. After that, we write the body, and if the connection is not closed, we can read another request. So, at a high level, HTTP 1.1 extends the previous HTTP 1.0 implementation: if the content length is known, closing the connection is no longer needed to indicate the end of the response body. The connection remains open and can be reused for another request.
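The server side of persistent connections might look something like this. It is a simplified sketch using standard-library sockets in place of async: request headers are read and ignored, and the name `serve_http11` is an assumption.

```ruby
require "socket"

# Serve requests on one connection until the client closes it.
def serve_http11(connection, root)
  root = File.expand_path(root)

  while (request_line = connection.gets)
    _method, path, _version = request_line.split

    # Skip the request headers; this sketch does not use them.
    loop do
      line = connection.gets
      break if line.nil? || line.chomp.empty?
    end

    file = File.expand_path(File.join(root, path.to_s))
    if file.start_with?(root + File::SEPARATOR) && File.file?(file)
      status, body = "200 OK", File.binread(file)
    else
      status, body = "404 Not Found", "Not Found\n"
    end

    # content-length tells the client where this body ends, so the connection
    # can stay open for the next request.
    connection.write("HTTP/1.1 #{status}\r\ncontent-length: #{body.bytesize}\r\n\r\n")
    connection.write(body)
    connection.flush
  end
ensure
  connection.close
end
```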

00:10:37.800 However, one big limitation still remains: we can only have one request active at a time; they cannot overlap. So why is this a problem? On modern web pages, you have an initial document which often refers to several other resources. Because you can only send one request at a time, subsequent requests have to wait for their turn to use the network connection. To avoid this kind of waiting, HTTP 2 was introduced and was a significant departure from HTTP 1.

00:11:03.600 The HTTP 2 specification was released in 2015 with a focus on improving performance and reducing latency. The main RFC is about 25,000 words long. It introduced a new binary format that allows for more efficient parsing of requests and responses. It also supports concurrent multiplexing of requests and responses on a single TCP connection, allowing for more efficient use of network resources. In other words, you can have several requests active at the same time. It also introduced HPACK, a specification for header compression, which significantly reduces the overhead of transmitting commonly used headers.

00:12:12.240 While HTTP 2 introduced significant improvements to the protocol, it was designed with the same semantics as HTTP 1 so it could work seamlessly with existing applications and infrastructure. Now, let's take a look at how to implement an HTTP 2 client. The implementation of HTTP 2 is actually fairly complex, so we're going to use a Ruby gem called protocol-http2, which implements the binary framing and semantics of the connection handling.

00:13:30.000 HTTP 2 includes specific settings for negotiating details of the connection such as frame size and flow control. As before, the client makes a connection using TCP. We wrap this connection in a framer which can read and write the binary frames. Then, we create a client that will process those frames and manage the connection state. To start the connection, the client must send a connection preface, which includes the settings.
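To give a feel for what the framer and the connection preface put on the wire, here is a small standard-library sketch of the HTTP 2 frame header and a client preface carrying a SETTINGS frame. It follows the frame layout in the HTTP 2 RFC but is not the API of the gem used in the talk; the method names are illustrative.

```ruby
# The fixed 24-byte string every HTTP/2 client sends first.
CONNECTION_PREFACE = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".b

SETTINGS_FRAME = 0x4
SETTINGS_MAX_CONCURRENT_STREAMS = 0x3
SETTINGS_MAX_FRAME_SIZE = 0x5

# Frame layout: 24-bit payload length, 8-bit type, 8-bit flags,
# 1 reserved bit plus a 31-bit stream identifier, then the payload.
def pack_frame(type, flags, stream_id, payload = "".b)
  length = payload.bytesize
  header = [length >> 16, length & 0xFFFF, type, flags, stream_id & 0x7FFFFFFF].pack("CnCCN")
  header + payload
end

# Decode the 9-byte frame header back into its fields.
def unpack_frame_header(bytes)
  high, low, type, flags, stream_id = bytes.unpack("CnCCN")
  {length: (high << 16) | low, type: type, flags: flags, stream_id: stream_id & 0x7FFFFFFF}
end

# Each setting is a 16-bit identifier followed by a 32-bit value.
# SETTINGS frames always travel on stream 0, the connection itself.
def settings_frame(settings)
  payload = settings.map { |id, value| [id, value].pack("nN") }.join
  pack_frame(SETTINGS_FRAME, 0, 0, payload)
end

# What a client writes immediately after connecting.
def client_preface(settings)
  CONNECTION_PREFACE + settings_frame(settings)
end
```

For example, `client_preface({SETTINGS_MAX_CONCURRENT_STREAMS => 100})` yields the 24-byte preface followed by a 9-byte frame header and a 6-byte settings payload.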

00:14:05.160 Then, for each path we want to request, we create a stream. HTTP 2 no longer has a request line; all those fields, including scheme, method, authority, and path, are now included in the headers. These special request headers start with a colon and are referred to as pseudo headers. We encode those headers into a binary frame using HPACK and send it on the stream, which starts the request. In this example, we add a callback for the response headers and for the response body data.
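As a small illustration of the pseudo headers, the following sketch builds the header list for a request from a URL. It stops before HPACK encoding, which a real client would apply next, and the method name `request_pseudo_headers` is an assumption for illustration.

```ruby
require "uri"

# Build the HTTP/2 request header list for a URL. The fields that used to
# live in the HTTP/1 request line become pseudo headers starting with ":".
def request_pseudo_headers(url, method: "GET")
  uri = URI(url)
  # The authority includes the port only when it is not the scheme's default.
  authority = uri.port == uri.default_port ? uri.host : "#{uri.host}:#{uri.port}"

  [
    [":scheme", uri.scheme],
    [":method", method],
    [":authority", authority],
    [":path", uri.request_uri],
  ]
end
```

Ordinary headers such as `accept` would be appended after the pseudo headers, since pseudo headers must come first in the header block.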

00:14:20.280 Finally, we read frames until the stream is closed, which means the response has been received completely. Now, let's take a look at the server implementation. It's totally different from the HTTP 1 implementation, but the actual request-response semantics are very similar. Like the client, the server also specifies settings like the maximum number of concurrent streams. As before, we accept incoming connections and create a framer wrapping the connection.

00:14:53.280 Then we create a server instance which manages the state of the connection. The server must read the incoming connection preface and also send its settings to the client. After that, we add an accept stream callback, and to each accepted stream we add a process headers callback, which is invoked when the client stream sends the initial request headers. We extract the path pseudo header, we read the file as before, and then we begin sending the response. Note that we include the status code as a pseudo header; there is no response line like there is in HTTP 1.

00:15:40.680 Then we send the response body data. So let's compare the high-level behavior: this was the HTTP 1.1 client and server. HTTP 2 moves all of the request-response handling into concurrent multiplexed streams. Each stream represents a single request and response. Because those streams operate independently over a shared connection, we can now have multiple requests in flight at the same time.

00:16:10.500 So previously we had to send requests one at a time with HTTP 1, but in HTTP 2, we can start all those requests independently on the same connection. However, because they all share the same TCP connection, if that connection experiences disruption or packet loss, all of the streams will be blocked until the TCP connection recovers or reconnects. As an example, consider a cell phone moving between two different networks. The original TCP connection will be lost and a new TCP connection must be established.

00:16:43.560 For things like streaming audio and video, this can be a problem as the data stream will be interrupted. HTTP 3 is the latest specification and was designed to solve this problem. It was released in 2022 with a focus on improving performance and security over unreliable networks. HTTP 2 was based on TCP, which, as we discussed, can perform poorly when encountering packet loss or connection failure.

00:17:09.540 To address this, the QUIC transport protocol was developed, which uses UDP instead. HTTP 3 can be broken down into four core areas: HTTP semantics, streaming, encryption, and network framing. HTTP 3 actually is comprised of two specifications: QUIC, which provides robust, encrypted, multiplexed streams, and HTTP 3, mapping the abstract HTTP semantics to QUIC streams.

00:17:39.780 The main RFCs for QUIC and HTTP 3 are 57,000 words and 21,000 words long, respectively. By developing a new network protocol, priority could be given to implementation choices that reduce latency and improve the reliability of data transfer over unreliable networks. HTTP 3 is still in the early stages of adoption but represents a significant improvement over HTTP 2, both in terms of design and performance, and it is expected to gain wider use over time.

00:18:01.800 So, I'm a little bit disappointed because I can't present the Ruby HTTP 3 client or server today. I've been working on it for several months, but despite my best efforts, it's not ready yet. I'll share my progress updates online, so please follow me if you're interested in learning more about that.

00:18:34.440 So we can compare the high-level behavior of HTTP 2 and HTTP 3 to understand the improvements and why they're important. Let's investigate what happens when we have several concurrent streams. In this example, we have four streams, each sending a request and reading a response over HTTP 2. However, if the TCP connection experiences packet loss, at least one frame will be interrupted. Unfortunately, since all streams are sharing the same TCP connection, all subsequent frames are now blocked until TCP recovers the missing data. This is called head-of-line blocking.

00:19:16.260 Once TCP recovers, all the streams can continue. Even though the streams themselves are mostly independent of each other, because they share a single connection, if that connection is interrupted, all the streams will be interrupted as well. In contrast, when a QUIC stream experiences packet loss, because the streams are sending and receiving packets independently using UDP, only the subsequent frames on that specific stream are blocked. The other streams can continue to send and receive UDP packets.
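The difference can be illustrated with a toy model. This is a deliberately simplified sketch, not a protocol implementation: each packet carries one frame for one stream, exactly one packet is lost, and the names `Packet`, `blocked_on_tcp`, and `blocked_on_quic` are made up for illustration.

```ruby
# Each packet carries a frame for one stream; seq is its position on the wire.
Packet = Struct.new(:seq, :stream)

# TCP delivers bytes strictly in order, so everything sent after the lost
# packet waits for the retransmission, whichever stream it belongs to.
def blocked_on_tcp(packets, lost_seq)
  packets.select { |packet| packet.seq > lost_seq }
end

# QUIC orders data per stream, so only later packets on the same stream wait.
def blocked_on_quic(packets, lost_seq)
  lost = packets.find { |packet| packet.seq == lost_seq }
  packets.select { |packet| packet.seq > lost_seq && packet.stream == lost.stream }
end

# Eight packets sent round-robin across four streams.
packets = (1..8).map { |seq| Packet.new(seq, (seq - 1) % 4 + 1) }
tcp_blocked = blocked_on_tcp(packets, 2)
quic_blocked = blocked_on_quic(packets, 2)
```

With packet 2 (on stream 2) lost, the TCP model holds back the six later packets across all four streams, while the QUIC model holds back only packet 6, the next packet on stream 2.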

00:19:48.060 In comparison to before, while a TCP connection is not capable of migrating between networks, QUIC connections use UDP packets with a more robust mechanism for identifying the client and server. So, when you move your mobile device to a different network, the connection will continue without interruption.

00:20:15.000 So maybe you're wondering why connection recovery is important. As of 2023, according to this data, sixty percent of website traffic is generated by mobile phones. Mobile phones often move between networks, so this is a very useful feature of QUIC.

00:20:57.000 So we have talked about the evolution of HTTP, but where does asynchronous Ruby fit into all of this? Maybe we can ask a different question: are existing Ruby HTTP adapters good enough? Well, it turns out very few Ruby HTTP clients or servers support HTTP/2 or later. This limits the ability of Ruby applications to take advantage of these evolutionary improvements to HTTP.

00:21:23.400 Multiplexing requests requires a concurrency model. To solve this problem, I introduced the fiber scheduler, which provides a concurrency model well suited for handling multiple concurrent streams. In addition, full bi-directional streaming is hard to implement without an explicit concurrency model. While it might not matter for more traditional request-response style systems, an increasing number of interesting real-time web services are becoming available, such as services that operate on a real-time data stream.

00:21:44.940 So, to address these issues, I created async HTTP. It supports HTTP/1 and HTTP/2 today, and we have planned support for HTTP/3 by the end of the year. It provides the client and server interfaces and hides all of the complexity of the protocol. It is a core part of the Falcon web server, so you can host Rack-compatible applications in Falcon, but it's using async HTTP internally. It has full support for bi-directional streaming, including websockets, and it is built on top of async, providing fully asynchronous request and response handling.

00:22:19.200 It uses an internal protocol with a connection pool which provides persistent connections where possible. So let's take a look at how to write an async HTTP client. This is the entire source code for the client which supports HTTP/1, HTTP/2, and in the future, HTTP/3. We need to include the async HTTP library, and we specify the endpoint we want to connect to.

00:22:41.640 Then we create the client, which handles all of the internal protocol details and functions as a connection pool for persistent connections. Then we make the GET request and we can read the response body. It's very easy to use.

00:23:04.920 The server is equally simple. Here is the code which runs the server application. It is the same as before; it serves files. We specify the endpoint we want to run the server on, we create the server, and run it.

00:23:23.640 Async HTTP hides 33 years of complexity in just ten lines of code, providing all the advanced features that enable new and exciting applications.

00:23:36.840 So let's look at three best practices for working with async HTTP. My first advice is to use a shared instance that enables persistent connections across your whole application. Since most clients are still using TCP and TLS, making a connection can be slow, so reusing existing connections will improve your performance.

00:23:52.560 The simplest way to do this is to use the shared instance. This instance must be used in an asynchronous context; when it goes out of scope, any open connections will be automatically closed. When you make the request, give the full URL; it will automatically figure out the best way to connect and perform the request, and if there is an existing connection, it will be reused.

00:24:20.760 My second piece of advice is to use fan-out concurrency where possible. If you are performing several web requests, running them concurrently will reduce the total latency either by using several connections or multiplexing several requests over a single connection. This code is similar to before but it creates a separate asynchronous task for each request. For each URL, we make the request and read the response in a child task. This allows each request to run concurrently, and then after we have created all the requests, we wait for them to complete so we can have all the responses available for processing.

00:24:50.760 My final piece of advice is to embed async into existing systems. Just because you are not using an async-aware server like Falcon doesn't mean you can't embed async into your controllers or workers to gain the benefits of concurrent execution. Here is an example of a worker which could be running on a job server such as Sidekiq which isn't specifically aware of async.

00:25:24.840 This is the same fan-out code as before, but we embed it in a top-level sync block. This creates an event loop if required and runs the given block. This works equally well for Puma, Unicorn, or even a command-line script.

00:25:50.640 So now I've given you the tools to build asynchronous HTTP in Ruby. Please show me how you're going to unleash the power of asynchronous HTTP. I look forward to seeing your results.

00:26:00.840 Thank you for coming to my presentation. If you have any questions, please feel free to contact me.