The Little Server That Could

by Stella Cotton

The video titled 'The Little Server That Could', presented by Stella Cotton at RubyConf 2016, looks at how web servers work by building a simple one in Ruby. The talk begins with a light-hearted instruction to the audience to clap whenever the speaker takes a sip of water, engaging them right from the start. Cotton makes clear that the tiny server built in the talk is not suitable for running production websites: it is slow, insecure in some cases, and limited to Unix-like systems.

The key points addressed in the talk include:
- Understanding Abstractions: Cotton emphasizes the importance of recognizing when to rely on abstractions in programming, explaining how they can streamline development and debugging processes.
- The nature of a web server: She defines what differentiates a web server from typical application code, highlighting its role in communication and its adherence to a specific API, the HTTP specification in RFC 2616.
- Core Components of a Server: The presentation breaks down the fundamental aspects of how web servers operate, including the use of sockets for communication, and the significance of the Internet socket versus Unix sockets.
- Communication Protocols: Explaining the TCP and UDP protocols, Cotton illustrates the mechanics of data transfer in a web server context, depicting the complexities and requirements of request handling.
- Utilization of Rack: The talk also covers how to integrate a Rack interface with a small web server, facilitating compatibility with frameworks like Sinatra and Rails.
- Concurrency: Addressing potential performance bottlenecks from blocking requests, she introduces forking processes for handling requests concurrently rather than sequentially, minimizing response times for clients.
- Garbage Collection and Memory Management: Cotton discusses how Ruby's garbage collector manages memory in relation to parallel processes, and how forking can generally induce memory overhead if not handled cautiously.
- Threading Challenges: The speaker highlights the intricacies of Ruby's threading model, especially regarding the Global Interpreter Lock (GIL), suggesting that while multithreading can reduce memory consumption, it requires careful coding practices to avoid race conditions.
- Handling Server Signals: The talk concludes by explaining signal trapping basics, such as intercepting control signals to manage server termination gracefully.

Overall, Cotton stresses the foundational knowledge required to understand production servers and the ability to navigate their complexities. The session not only untangles common misconceptions about web servers but also equips developers with the necessary insights about the Ruby ecosystem and server management.

In addition, she invites further engagement and connectivity with her audience post-presentation, offering additional resources via slides on Twitter and inviting conversations at the Heroku booth.

00:00:14.880 I'm going to give you a quick set of instructions first. A friend of mine, Lily Chilean, came up with this idea and it's quite genius. So, anytime during this talk that I'm going to walk over to this table, which is quite far away, and grab my glass of water to take a sip, I want everybody in here to start clapping and cheering. Otherwise, I'm going to do this weird dance where I'm like, 'Please don't watch me drink water.' It's going to be great! So, let's practice that now. Thank you!
00:00:48.930 I'm Stella Cotton, and I'm an engineer on the tools team at Heroku. Before we get started, I wanted to give you a heads up that I'm going to tweet out a link to the slides right after my talk. If you want to take a closer look at any of the code or some of the links I’m going to include at the bottom as references, you’ll be able to do that. I lured you all here under the false pretense that we are going to learn how to write a small web server in Ruby, the little server that could. But the reality is that this is the little server that can’t.
00:01:16.869 If you're going to use this little server that we're going to talk about here today to run a production website, you're going to find that it's slow, not secure in some cases, and also a little bit limited. It's limited to Unix-like systems, so it won't run on a Windows machine or server. But there are some reasons to care; it can do some cool things. To talk about what this dinky little server can do, we'll first discuss abstractions. As engineers, a super-powerful skill is knowing when to dig into an abstraction and when to just accept the constraints of the abstraction.
00:01:58.750 If every time you wanted to use a third-party API, you felt the need to understand exactly how it works underneath, you would be wasting your time. You'd be sacrificing the value of an API, which is to abstract away something you don't need to know about. Abstractions can make your code much stronger when you practice and implement them in your code base, helping you manage your time efficiently. You might have worked alongside someone who follows every rabbit hole when debugging a problem, wasting hours because they can't accept that some parts just work.
00:02:32.380 I find that I have to fight this instinct to write off abstractions as magic. Good abstractions should feel like magic, but we need to remind ourselves they're just tools. Servers, specifically, are tools we use every single day to help web developers do their jobs. You run 'rails s,' and you don’t really need to care about what’s happening underneath to start building a web application. The server just starts up; it's powerful, easy to use, and feels like magic. But servers are not magic.
00:03:02.410 If you dig past the abstraction layer, you’ll find just code. Tonight, you can go home, visit GitHub, and check out Ruby server repos like Puma or Unicorn; you’ll find familiar concepts inside that mystery. Our server will help us understand what's happening in production servers. What else can the little server do? The pieces that make up web servers are fundamental and will help you build a foundation to understand all sorts of fascinating things happening within the Ruby community.
00:03:48.340 For example, why was garbage collection so important in Ruby 2? Why do we care about the concurrency model with 'green threads' that he talked about earlier? These fundamentals will also help you explore outside the Ruby community and venture into operations or systems communities, like watching Kelsey Hightower live-code a talk on containers, or reading Julia Evans's posts on race conditions. Now, let’s start off by talking about what a server really is. Today, we are going to discuss specifically a web server.
00:04:58.820 A web server lives on a physical computer, which is confusingly also called a server. Right now, you're probably using one of these servers: Unicorn, Puma, or Webrick—pretty common. In a lot of ways, a server is just like any other program you run on your computer. It has code, it lives in a file, and you run it from the command line. So how is a server different from your web development code that lives inside Sinatra or Rails?
00:05:27.980 Firstly, it communicates with the outside world and leverages the operating system's power to do that. Secondly, it conforms to a very specific API to communicate. This might not sound like a big deal, but consider the web today has over 4.68 billion indexed web pages, served up by servers globally. These web pages are viewed by typically five different browsers across various devices. The fact that everyone communicates in the same language is incredible.
00:06:04.880 In my experience, if you ask five developers to solve a problem individually, you're likely to get six different answers. The magic lies in a standards body known as the W3C, the World Wide Web Consortium, formed in 1994, which creates standards for the open web. They established a document called RFC 2616, which is 175 pages long, outlining the entire API that web developers use every day.
00:06:43.190 You might look at this wall of plain text and get overwhelmed. It's important to remember that it's not API documentation as you would typically think of it. Regular API documentation, like Twitter's, shows you how to use an API, and it is pretty good and concise. This document instead explains the structure of web requests. We can form these web requests through a few methods. Here, I will use the term 'client' as an umbrella term for them, be it a browser, Telnet, or Curl.
00:07:49.960 However, why is this RFC more complicated? Think of it as Twitter giving you all the specifications needed to implement your own Twitter API. It outlines every possible way someone might call that API and how Twitter might respond. It’s essentially a giant contract for structuring web requests and responses, enabling global communication. If you think of it in a Ruby way, it's like a giant list of tests you need to write to create a web server.
00:08:14.960 It's impossible to talk about writing a web server without mentioning this RFC, as it’s incredibly important. When I was initially trying to understand how web servers work, I found information leading back to this RFC. I often felt frustrated at the thought of having to read all 175 pages and implement it myself. I felt like I wasn’t a real developer because I believed I had to do that to understand web servers. The reality is that while it's important and beneficial to look at, it won't directly help you understand how production web servers actually function.
00:09:03.810 It's more of an instruction manual for respecting the conventions of the open web. If you want to build a production web server, you'll need to respect that contract to ensure you can respond to requests properly. Today, we will focus on digging deeper into some fundamentals. We talked about what a server is generally, so let's study three fundamental building blocks of a simple web server and how it facilitates communication because that’s what servers are all about.
00:09:54.590 First, let's discuss how a server communicates with the outside world. At its core, it's just a program running on a machine that communicates in defined ways. When you start this program on your machine, UNIX creates a little environment for it to run in, called a process. This allows you to assign variables and change state inside your program without corrupting the global state in the operating system. Like any other process, it shows up when you use 'ps' to see what's running.
00:10:41.410 This process is special because it uses the operating system's power to communicate with the outside world. The Ruby standard library provides a small web server you can run called WEBrick, and it also provides wrappers around common Unix system calls that help you build your own web server. So, how do we do this? First, we open a socket. A socket allows processes to communicate with each other, either locally on the same system or remotely across machines.
00:11:24.320 Everything in UNIX is treated as a file, and sockets are no different. They are simply a specific type of file that server and client processes can read from and write to. To see the sockets currently running on your machine, on a Mac, you can use 'netstat'. The operating system identifies these files with a number called a file descriptor or file handle. Ruby gives us a higher-level abstraction, so we don't need to manually deal with these numbers in our system calls, but underneath, that’s how the operating system references the specific files we are interacting with.
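To make the file descriptor idea concrete, here is a small sketch using Ruby's standard socket library. The address 127.0.0.1 and port 0 (which asks the operating system for any free port) are choices made for this illustration, not something from the talk.

```ruby
require 'socket'

# Port 0 asks the operating system to pick any free port.
server = TCPServer.new('127.0.0.1', 0)

# The integer the operating system uses to refer to this socket.
socket_fd = server.fileno
puts "listening socket uses file descriptor #{socket_fd}"

# An ordinary file gets its descriptor from the same per-process table.
file_fd = File.open(File::NULL) { |file| file.fileno }
puts "a regular file opened alongside it used descriptor #{file_fd}"

server.close
```

Ruby hides these numbers behind `TCPServer` and `File` objects, but `fileno` shows the value the underlying system calls operate on.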
00:12:30.880 We will not just open any socket; we need to inform our operating system that we're opening a specific type of socket capable of accepting web traffic. To create this socket, we first choose our addressing format or communications domain. In simpler terms, this determines how you'll talk to your socket. The two most common formats are Unix and Internet sockets. A Unix socket allows communication between processes on the same machine, while an Internet socket lets the outside world talk to the process running on your machine.
00:13:04.720 In our case, we want to use an Internet socket for external communication. After deciding on our address format, we define the specific type of socket we wish to create. Generally, you'll hear about two types of sockets: stream sockets and datagram sockets. Stream sockets function like two-way telephones, facilitating bi-directional communication using the TCP protocol. The TCP three-way handshake ensures both client and server are connected before exchanging critical information.
00:13:37.230 The client may introduce themselves, and the server will acknowledge. Stream sockets ensure that all transmitted data arrives in order. If the order is incorrect, your web page might render improperly or not at all. The server continues to send data with a sequence number, while the client is responsible for keeping the data organized, even if packets arrive out of order. Therefore, the TCP protocol with stream sockets is terrific for delivering HTML pages in the correct sequence, but there's a trade-off in connection time.
00:14:05.190 Datagram sockets, on the other hand, function unidirectionally, like a megaphone. There's no handshake because they don't care about establishing a mutual connection. They are loud, allowing for quick transmission of information but do not guarantee that the information will arrive in the same order it was sent. Datagram sockets utilize the UDP protocol rather than TCP. Common real-world examples utilizing this method include multiplayer games or streaming audio.
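For contrast with stream sockets, this is a minimal sketch of the "megaphone" style using `UDPSocket` from Ruby's standard library. Both ends run in one script on the loopback address, and the message text is made up for the example.

```ruby
require 'socket'

# The receiver binds to a port so datagrams have somewhere to land.
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0) # port 0: let the operating system choose
port = receiver.addr[1]

# The sender never connects and never shakes hands; it just shouts.
sender = UDPSocket.new
sender.send('meow', 0, '127.0.0.1', port)

# recvfrom returns the payload plus information about who sent it.
message, _sender_info = receiver.recvfrom(1024)
puts message

sender.close
receiver.close
```

Nothing here confirms delivery or ordering. On a real network a datagram can be lost or arrive out of order, which is the trade-off described above.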
00:14:49.790 For our web server, we will set up a socket in the Internet communication domain with the stream type, so it will communicate via TCP. Once we have our socket, we can create the address to bind it to: a unique identifier that allows external clients to connect to our server.
00:15:03.550 We will bind the server to this address so it knows where to listen for incoming information. Finally, we will instruct our socket to listen for incoming connection attempts. After that, we wait and listen for requests. Next, we will create a new method that loops, continuously listening for requests. If someone dials our number, meaning they write to our socket with a command similar to this curl command, it will create a new pair of sockets so we can communicate back and forth with the client.
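The create, bind, listen, and accept steps can be sketched with the lower-level `Socket` class. This is an illustration under assumptions rather than the code from the slides: it binds to 127.0.0.1 on an OS-chosen port, and a `TCPSocket` in the same script stands in for the curl client.

```ruby
require 'socket'

# Internet domain (IPv4) plus stream type gives us a TCP socket.
listener = Socket.new(:INET, :STREAM)

# Build the address and bind the socket to it (port 0: any free port).
listener.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))

# Start queueing incoming connection attempts (up to 5 waiting at once).
listener.listen(5)
port = listener.local_address.ip_port

# Stand-in for an external client dialing our number.
client = TCPSocket.new('127.0.0.1', port)

# accept hands back a NEW socket dedicated to this one client.
connection, _client_address = listener.accept
listener_fd = listener.fileno
connection_fd = connection.fileno
puts "listener fd=#{listener_fd}, connection fd=#{connection_fd}"

connection.close
client.close
listener.close
```

The two descriptors differ because the listening socket keeps its single job of accepting connections, while each accepted connection gets its own socket.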
00:15:58.150 Using the previous socket is not an option because that one has the single task of accepting incoming connections. Therefore, we need a different socket for communicating back and forth with the client to avoid any jumbled data. Once we receive the requests from the socket, we are good to go! What are we actually doing when we handle those requests? For now, let's say this is where we will run some application code. It runs and returns an HTTP response that looks similar to the simplest response outlined in the RFC we mentioned earlier.
00:17:05.980 The response will include a header and a body—it's essential to write that response back to our socket so the client can read it. At the end, when we decide to shut down our server, we will close our socket using the 'socket.close' command. This entire process is how our tiny little server communicates with the outside world, repeating the operation for each request sent by a client. It will continue returning this familiar phrase, 'Hello, world!'
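Putting the request and response steps together, here is a sketch of one full round trip. The helper names `hello_response` and `handle_connection` are invented for this example, and the client is simulated in the same script so the code runs to completion instead of looping forever.

```ruby
require 'socket'

# A minimal HTTP response: status line, headers, blank line, body.
def hello_response
  body = "Hello, world!\n"
  "HTTP/1.1 200 OK\r\n" \
    "Content-Type: text/plain\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "Connection: close\r\n" \
    "\r\n" \
    "#{body}"
end

def handle_connection(connection)
  # Read the request line and headers up to the blank line, then ignore them:
  # every client gets the same answer.
  while (line = connection.gets)
    break if line == "\r\n"
  end
  connection.write(hello_response)
ensure
  connection.close
end

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Simulated client: connect and send a request, as curl would.
client = TCPSocket.new('127.0.0.1', port)
client.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")

# A real server would call this inside a loop, once per client.
handle_connection(server.accept)

reply = client.read
client.close
server.close
puts reply
```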
00:18:03.550 Now that we've discussed how the server interacts with the outside world, let’s examine how it communicates with the application code living underneath. We'll start by discussing what a parser does. Earlier, we built this tiny web server and saw that we received a request but didn’t actually process it.
00:18:21.460 A parser’s job is to take in that request (the one we ignored) and use RFC 2616’s guidance to break it into manageable pieces for your server to act upon. A parser extracts components such as the headers, body, and URL. Speed and accuracy are paramount here. Although you can find production web servers written in Ruby, the actual parser is typically written in C for performance.
00:19:09.640 One noteworthy parser was created by Zed Shaw for the Mongrel web server, and it has been ported into many Ruby web servers in use today. It uses a domain-specific language (DSL) called Ragel to specify what a valid HTTP request looks like, employing state machines for safe and accurate parsing. While beyond the scope of this talk, knowing about it is beneficial if you begin exploring Ruby web server code.
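To show the kind of pieces a parser hands to the server, here is a deliberately naive pure-Ruby sketch. It is not the Mongrel parser and does not validate against the RFC; the method name `parse_request` and the sample request are made up for illustration.

```ruby
# Splits a raw HTTP request into method, path, query, headers, and body.
# No validation is attempted, unlike a production parser.
def parse_request(raw)
  head, body = raw.split("\r\n\r\n", 2)
  request_line, *header_lines = head.split("\r\n")

  http_method, target, version = request_line.split(' ', 3)
  path, query = target.split('?', 2)

  headers = header_lines.each_with_object({}) do |line, result|
    name, value = line.split(':', 2)
    result[name.strip.downcase] = value.to_s.strip
  end

  { method: http_method, path: path, query: query,
    version: version, headers: headers, body: body.to_s }
end

raw = "POST /cats?limit=1 HTTP/1.1\r\n" \
      "Host: example.com\r\n" \
      "Content-Length: 4\r\n" \
      "\r\n" \
      "meow"

request = parse_request(raw)
puts request[:method]          # POST
puts request[:path]            # /cats
puts request[:headers]['host'] # example.com
```

String splitting like this is easy to read but fragile against malformed input, which is one reason production servers rely on a generated state machine instead.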
00:19:40.950 As we build our little server that can't, we will skip the parser construction for now and instead operate under the assumption that, regardless of the requests we receive, everybody gets the same response. Having addressed parsing, let’s now consider what happens when the server communicates with our application rather than returning a hardcoded 'Hello, world.' How can we modify this server to plug in any standard Rails or Sinatra web app?
00:20:38.690 We can accomplish this by leveraging the power of Rack. There's a common interface in Ruby that all servers and applications can communicate through called Rack. Both the Sinatra and Rails frameworks utilize this interface, allowing for easy substitution of web servers without a lot of configuration changes. The basic implementation on the application side will be a Ruby object that responds to a method call, takes one argument, and returns a response.
00:21:23.590 Here's an example of a super lightweight Rack app: a Ruby object that responds to 'call', takes one argument (the request environment), and returns a status, headers, and a body. In our server, instead of constructing a hardcoded response string, we will invoke 'app.call' and build the response from what it returns, without worrying about what the application does internally.
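A sketch of both sides of that interface follows. It does not require the rack gem: the app is a plain lambda, and `http_response` is a hypothetical helper standing in for the server side. The environment hash contains only two of the keys a real server would supply.

```ruby
# Application side: anything that responds to call(env) and returns
# [status, headers, body], where body responds to each.
hello_app = lambda do |env|
  body = "Hello, world! You asked for #{env['PATH_INFO']}\n"
  headers = { 'content-type' => 'text/plain',
              'content-length' => body.bytesize.to_s }
  [200, headers, [body]]
end

# Server side: call the app and turn the triple into an HTTP response.
# The reason phrase lookup is a tiny illustrative subset.
def http_response(app, env)
  status, headers, body = app.call(env)
  reason = { 200 => 'OK', 404 => 'Not Found' }.fetch(status, '')

  lines = ["HTTP/1.1 #{status} #{reason}"]
  headers.each { |name, value| lines << "#{name}: #{value}" }

  response = lines.join("\r\n") + "\r\n\r\n"
  body.each { |chunk| response << chunk }
  response
end

env = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/cats' }
response = http_response(hello_app, env)
puts response
```

Because the server only depends on `call`, swapping the lambda for a Sinatra or Rails application object would not require changing `http_response`.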
00:22:10.590 Imagine now that within our Rack application we have changed the function to call an external API instead of merely returning 'Hello, world.' Suppose that API happens to fetch a new cat gif every time someone visits our homepage; we can illustrate this scenario by adding a sleep function to simulate a slow blocking request to an external server, which may take about five seconds to return a response.
00:23:06.640 What happens here is that if one user waits five seconds for their cat gif, but at nearly the same time, another user visits the page, they might end up waiting an extra five seconds. Even short external API calls can culminate in long wait times for users as traffic to your site increases. You can visualize this scenario using a grocery store analogy, where the store has only one cashier available for checkout.
00:23:43.190 Imagine you are in a grocery store with multiple shoppers lined up at a single cashier. Each request has to wait until the one in front of it is completed. In our analogy, the cat gif being requested is akin to a new cashier who moves significantly slower than the seasoned ones. As more customers come in during this busy time, they will face longer waiting times. So, how can we speed things up?
00:24:31.150 We can add new cashiers, much like forking a process on our server to create a new subprocess to fetch that cat gif. Previously, we were receiving client requests and handling them sequentially. Now, we can wrap that method in a 'fork' command, creating a new process to handle the request for the external cat gif, which allows the parent process to continue accepting requests without delay. This way, our clients won't have to face the staggered five-second waits.
00:25:40.960 It's crucial to remember that even after forking, you must close the parent connection to that socket. When you invoke 'fork', the child duplicates the parent process's connection to the socket, including all references to its file descriptor. The operating system keeps track of how many references exist to that file descriptor, and if the parent socket stays open while the child closes theirs, you may run into an error eventually, as the total count of open files is restricted.
00:26:28.720 If you forget to close that parent socket, it stays open for as long as the program is running, and the open file descriptors accumulate until you eventually get an error indicating too many open files. Just as a side note, it’s also possible for orphaned processes to linger in the background and cause conflicts when the server restarts, because they are still holding on to previous connections.
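Here is a sketch of the fork-per-request pattern, including the parent closing its copy of each connection. It assumes a Unix-like system. The `handle` method, the 0.2 second sleep (standing in for the slow cat gif API), and the three simulated clients are all invented for this example.

```ruby
require 'socket'

# Stand-in for application code that blocks on a slow external API.
def handle(connection)
  sleep 0.2
  connection.write("HTTP/1.1 200 OK\r\nContent-Length: 5\r\n" \
                   "Connection: close\r\n\r\nmeow\n")
ensure
  connection.close
end

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Three simulated clients arriving at nearly the same time.
clients = Array.new(3) { TCPSocket.new('127.0.0.1', port) }

child_pids = clients.map do
  connection = server.accept
  pid = fork do
    server.close       # the child never accepts, so it drops the listener
    handle(connection)
    exit!(0)           # leave without running exit hooks copied from the parent
  end
  connection.close     # the parent closes ITS copy, or descriptors pile up
  pid
end

replies = clients.map(&:read)

# Reap the children so they do not linger as zombies.
child_pids.each { |pid| Process.wait(pid) }

clients.each(&:close)
server.close
puts replies.first
```

Each child sleeps in its own process, so the three clients wait roughly one sleep in total rather than three in a row.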
00:27:06.210 Every child process is a replica of the parent process, but they operate with their own separate memory. If any of these processes wish to communicate, they will need to utilize a Unix socket. When we fork a process, we create a copy of the entire application. However, the memory duplication is optimized through a mechanism called copy-on-write, where only memory segments that are modified will be duplicated.
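The separate memory of a child process, and the use of a Unix socket to talk across that boundary, can be sketched as follows. It assumes a Unix-like system; the `counter` variable and the message format are made up for the example.

```ruby
require 'socket'

# A connected pair of Unix sockets: one end for each process.
parent_end, child_end = UNIXSocket.pair

counter = 0

pid = fork do
  parent_end.close
  counter += 1 # changes only the child's copy of the variable
  child_end.puts("child counter=#{counter}")
  child_end.close
  exit!(0)     # leave without running exit hooks copied from the parent
end

child_end.close
message = parent_end.gets.chomp
Process.wait(pid)
parent_end.close

puts message                      # child counter=1
puts "parent counter=#{counter}"  # parent counter=0
```

The child's increment never reaches the parent. The only thing that crosses between the two processes is the line written to the socket.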
00:27:37.540 Until Ruby 2, however, memory was not managed efficiently during this forking process, and Ruby was perceived as a memory hog because of the way garbage collection worked. Garbage collection allows us, as Ruby developers, to allocate memory freely without worrying about reclaiming it. For example, if we allocate a variable for the cat gif, we don't need to release it manually; the garbage collector is in charge of reclaiming unused memory.
00:28:27.050 Ruby 2.0 changed how the garbage collector marks objects, so that a garbage collection pass no longer writes to them and forces copies of shared memory. This helped Ruby take advantage of shared memory and reduced the memory footprint of forked processes, relieving previous concerns. However, if many clients connect simultaneously and you fork a new process for every incoming request, your server will ultimately exhaust its available memory.
00:29:40.410 Perhaps you then wonder about using threads, as they are said to be less memory intensive. A thread is the smallest unit of execution that an operating system schedules. In a multithreaded system, threads share all of their process's resources, in particular memory. Since threads use the same memory space, no copy-on-write is necessary. One major advantage of threads is that they consume less memory. They also die when the process they are running in exits, which avoids creating zombie processes.
00:30:31.450 Furthermore, communication between threads is faster because there are no sockets to manage; nonetheless, threads come with their own complications. The core version of Ruby many use, MRI, does not offer true parallelism through multithreading due to something called the Global Interpreter Lock (GIL). The GIL stipulates that within a multithreaded program, only one thread can execute Ruby code at a given moment, which undercuts much of the benefit of running several threads.
00:31:12.500 You might initially believe that the GIL makes Ruby code inherently safe from the race conditions that come with parallel execution. However, this is not entirely true. A race can still arise when two threads check an instance variable's value at nearly the same time: Ruby can switch threads between the check and the update, so both threads act on the same stale value, causing unexpected behavior.
00:31:47.490 In a multi-threaded Ruby environment, it's critical to ensure your code is thread safe, as race conditions can lead to bugs that are challenging to identify and rectify. This is especially important if you decide to use JRuby or Rubinius, as they provide true parallel multithreading without a GIL.
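One common way to make a check-then-set on an instance variable thread safe is to guard it with a `Mutex`. The `CatCache` class below is a hypothetical example, not code from the talk; the short sleep simulates a slow call during which Ruby is free to switch threads.

```ruby
# Caches one value that is slow to fetch. The check ("is @gif set?") and
# the update ("set @gif") happen under one lock, so no thread can slip in
# between them.
class CatCache
  attr_reader :fetches

  def initialize
    @gif = nil
    @fetches = 0
    @lock = Mutex.new
  end

  def gif
    @lock.synchronize { @gif ||= fetch_gif }
  end

  private

  def fetch_gif
    @fetches += 1
    sleep 0.05 # slow call: a natural point for a thread switch
    'cat.gif'
  end
end

cache = CatCache.new
threads = Array.new(5) { Thread.new { cache.gif } }
results = threads.map(&:value)

puts results.uniq.inspect
puts "fetches: #{cache.fetches}"
```

Without the `synchronize` call, several threads could see `@gif` as nil before any of them assigns it, and each would perform its own fetch.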
00:32:10.340 Using a multi-threaded server, such as Puma, in scenarios where your application frequently waits on many third-party API calls can significantly enhance performance. However, you must ensure not only that your code is thread-safe but also that all gems in use respect thread safety.
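A thread-per-connection version of the little server can be sketched like this. It is an illustration only and does not show how Puma is implemented; the `handle` method, the 0.2 second sleep standing in for a third-party API call, and the simulated clients are assumptions for the example.

```ruby
require 'socket'

# Stand-in for application code that waits on a third-party API.
def handle(connection)
  sleep 0.2 # while one thread waits, the others are free to run
  connection.write("HTTP/1.1 200 OK\r\nContent-Length: 5\r\n" \
                   "Connection: close\r\n\r\nmeow\n")
ensure
  connection.close
end

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Three simulated clients arriving at nearly the same time.
clients = Array.new(3) { TCPSocket.new('127.0.0.1', port) }

# One thread per accepted connection; all of them share this process's memory.
workers = clients.map do
  connection = server.accept
  Thread.new(connection) { |conn| handle(conn) }
end

replies = clients.map(&:read)
workers.each(&:join)

clients.each(&:close)
server.close
puts replies.first
```

Nothing is forked, so there is no second copy of the connection for a parent to close, and the workers disappear when the process exits.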
00:32:52.230 In summary, our server now communicates with the outside world through Internet sockets while interacting with applications using a parser and the Rack interface. Moreover, we have completed the implementation of concurrent client handling. Let's finish by explaining how you as a server administrator can communicate with your server using signals and traps.
00:33:23.280 Signals allow interaction with processes that run on your machine, with 'Ctrl+C' being one of the most common, sending an interrupt signal prompting the program to shut down. We can implement this with a trap in our server that captures this signal and executes some code before the overall program terminates. For example, we might print a 'Terminating' notification letting the user know we are shutting down gracefully.
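A trap for the interrupt signal might look like the sketch below. To keep it runnable without a keyboard, the script sends SIGINT to itself in place of a real Ctrl+C; the `shutting_down` flag and the two-second deadline are choices made for this example.

```ruby
shutting_down = false

# Only set a flag inside the handler; do the real work in the main flow.
Signal.trap('INT') { shutting_down = true }

# Stand-in for pressing Ctrl+C: send SIGINT to this very process.
Process.kill('INT', Process.pid)

# A real server would check the flag in its accept loop. Here we just wait,
# for at most two seconds, until the handler has run.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 2
until shutting_down || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  sleep 0.01
end

puts 'Terminating' if shutting_down

# Put the default Ctrl+C behavior back.
Signal.trap('INT', 'DEFAULT')
```

Keeping the handler down to a single assignment is a cautious design choice, since a handler can interrupt the program at almost any point.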
00:34:31.030 Lastly, one crucial signal cannot be trapped: SIGKILL (signal 9), which immediately terminates any process that receives it without giving a handler the chance to run. Today, we explored what a server is and how it communicates. We wrote a little server that, while limited, has taught us valuable UNIX tricks and will help us as we navigate the complexities of production servers in the future. I will tweet a link to my slides, and if you'd like to connect or ask me questions, I’ll be around afterwards. Additionally, I will be at the Heroku booth tomorrow from 11 to 2 p.m. if anyone wants to chat or collect a T-shirt or stickers. Thank you, everyone!