Let's build a simple HTTP server with Ruby

00:00:13.889 Hi everyone, I'm so excited to be here!
00:00:18.940 This is my very first time attending a conference, and I’m thrilled to be a speaker at this one.
00:00:29.380 The last couple of days have been really amazing, even more than I imagined.
00:00:36.340 So, welcome to my talk titled 'Let's Build a Simple HTTP Server with Ruby'.
00:00:43.420 The Ruby community has a few popular Rack-compatible web servers, such as WEBrick, Puma, Thin, Unicorn, and Passenger.
00:00:50.950 These servers are battle-tested, so you usually don't need to roll your own.
00:00:58.230 However, I believe building your own server is a great learning experience if you want to understand how HTTP works.
00:01:06.340 At the end of this talk, we will have built a very simplistic, non-compliant HTTP server and client.
00:01:11.380 Before I proceed, a little bit about me: my name is Esther Olatunde.
00:01:17.319 I write code that works (sometimes) on computers, and I’m based in Lagos, Nigeria.
00:01:25.590 I work as a developer at Legsoo, which is a legal tech marketplace based in the UK.
00:01:32.590 Now, let's dig in!
00:01:39.639 How many of us here know what happens when you visit a URL in a browser?
00:01:46.029 This might seem a bit redundant for most of us, but I think we should start from here to connect the dots.
00:01:52.599 The Internet is a massive, distributed client-server information system.
00:02:00.989 Many applications run over the Internet, such as web browsing, email, file transfer, audio and video streaming, and e-commerce.
00:02:06.819 For proper communication to take place between clients (like your browsers) and servers, they must agree on specific application-level protocols.
00:02:13.910 Examples of these protocols are HTTP for the web, FTP for file transfer, and SMTP for email.
00:02:25.550 The protocol that makes the web work is the Hypertext Transfer Protocol, or HTTP.
00:02:34.100 It is the protocol that web browsers and web servers use to communicate with each other over the Internet.
00:02:40.940 HTTP is perhaps the most popular application protocol used over the Internet.
00:02:46.160 Let’s look briefly at how it works.
00:02:54.740 As I mentioned before, HTTP is a request-response protocol.
00:02:59.389 It describes how web servers exchange data with HTTP clients such as browsers.
00:03:02.570 So when you type in a URL in a browser, here's what happens.
00:03:08.260 The web browser connects to the web server and sends an HTTP request via the TCP protocol stack for the desired webpage.
00:03:13.570 The web server receives the request and checks if the webpage is available.
00:03:21.410 If the page exists, it sends it back to the client.
00:03:27.290 If the requested page does not exist, the server sends back an error message.
00:03:36.320 This request-response cycle is essentially how HTTP and the web operate.
00:03:39.639 The HTTP specification is maintained today by the IETF, which developed the earlier versions together with the World Wide Web Consortium.
00:03:46.190 The original version of HTTP, HTTP 0.9, was released in 1991 by Tim Berners-Lee.
00:03:54.830 Since then, different versions have been developed.
00:04:00.070 There is HTTP 1.0, HTTP 1.1, and HTTP 2, which was standardized in 2015.
00:04:06.699 Currently, there is a proposal for HTTP 3 that is in draft.
00:04:13.669 If you are curious, there's a link where you can read all the different specs.
00:04:19.789 Also, you can join discussions about HTTP 3.
00:04:25.520 Next, let’s talk about URLs.
00:04:35.270 A URL (Uniform Resource Locator) is used to uniquely identify a resource on the web.
00:04:43.009 The syntax of a typical URL looks like this.
00:04:48.229 It contains the protocol, the hostname, and optionally the port number.
00:04:53.270 For web traffic, the port is usually 80, and you don’t always need to specify it.
00:05:00.580 Next, the URL contains the path to the resource.
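
As a rough illustration of the URL parts just described, Ruby's standard `uri` library can split a URL into protocol (scheme), hostname, port, and path. The URL below is a made-up example, not the one from the talk's slides.

```ruby
require 'uri'

# Hypothetical example URL; the URL shown in the talk is not in the transcript.
url = URI.parse('http://example.com:80/docs/index.html?lang=en')

parts = {
  scheme: url.scheme, # protocol
  host:   url.host,   # hostname
  port:   url.port,   # port number (80 is the default for http)
  path:   url.path,   # path to the resource
  query:  url.query   # optional query string
}

parts.each { |name, value| puts "#{name}: #{value}" }
```
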
00:05:12.110 For instance, when you visit that URL, the browser issues an HTTP request.
00:05:20.330 It opens a connection to the host on port 80.
00:05:28.580 The server accepts the connection, and upon connection,
00:05:33.740 the HTTP client (your browser) turns the URL into a request message.
00:05:41.360 The server will interpret this request and respond accordingly.
00:05:47.810 The first line of an HTTP request contains the request line, which includes the HTTP method, the request URI, and the HTTP version.
00:05:54.500 Following the request line, there are request headers that are key-value pairs.
00:06:01.009 The request may also include a request body.
00:06:06.050 When you visit that particular URL in a web browser, the HTTP request will look like this.
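
The slide with the actual request is not part of this transcript, so the following is only a sketch of what such a message typically looks like. The path, host, and header values are assumptions chosen for illustration.

```ruby
# A hypothetical GET request written out as the raw text a client sends.
# Each line ends with CRLF, and a blank line marks the end of the headers.
request = [
  'GET /index.html HTTP/1.1', # request line: method, request URI, HTTP version
  'Host: localhost:5000',     # headers are "Name: value" pairs
  'User-Agent: curl/8.0',     # assumed client, for illustration only
  'Accept: */*',
  '',                         # blank line: end of headers
  ''                          # no body for this GET request
].join("\r\n")

http_method, uri, version = request.lines.first.chomp.split(' ')
puts "#{http_method} | #{uri} | #{version}"
```
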
00:06:12.110 The server receives the request, parses it, and interprets it.
00:06:20.330 In addition, the server needs to build a response.
00:06:28.580 When your request hits the server, it performs the appropriate actions and returns the response.
00:06:35.470 For example, if you request a page that does not exist, you may get a '404 Not Found' error.
00:06:43.680 In the case of a valid request, the server will return a content body which is rendered in the browser.
00:06:51.490 This flow provides the basic implementation of how HTTP servers operate.
00:06:59.400 The server should be able to accept a request and send back a response.
00:07:06.469 This is the most common interaction that is happening on web servers today.
00:07:12.490 Now, let's implement this behavior using Ruby, without using any external gems.
00:07:16.490 To achieve this, we need a tool that can listen for bidirectional communication between clients and servers.
00:07:23.670 This brings us to sockets and socket programming.
00:07:29.580 A socket is an endpoint for two-way communication between two programs running on a network.
00:07:36.320 A socket binds to a port so that the TCP transport layer can identify the application that the data is sent to.
00:07:44.340 The server listens on the socket while the client connects to it.
00:07:51.920 Fortunately, the Ruby standard library has already implemented sockets for us.
00:07:58.780 This brings us to the Ruby Socket class.
00:08:06.210 The Ruby Socket class provides access to the underlying operating system socket implementation.
00:08:12.220 It contains specific classes for handling common transport protocols, as well as a generic interface.
00:08:18.120 All functionality in the socket library is accessible through a single extension library, which you load with require 'socket'.
00:08:24.400 You can refer to the documentation to explore the classes and methods available.
00:08:30.300 Now that we know about sockets, let's proceed to build our HTTP server.
00:08:35.490 Before we dive in, let's identify three main features our server will focus on.
00:08:42.670 First, it needs to listen for connections.
00:08:48.890 Second, it needs to parse the request.
00:08:55.920 Lastly, it must be able to build and send a response back to the client.
00:09:03.460 To handle the listening for connections, we require the socket library from the standard library.
00:09:09.860 Next, we need to define our server by initializing the TCP server class and having it listen for incoming connections.
00:09:16.970 I've chosen to bind the server to port 5000, but you can choose any free port up to 65535; ports below 1024 usually require elevated privileges.
00:09:24.030 Next, we want to loop infinitely so we can process our incoming connections one at a time.
00:09:30.400 The server will wait until a client connects.
00:09:34.030 When a client connects, it returns a TCP socket that can be used like other Ruby IO objects.
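
A minimal sketch of the listening step, assuming the method names `handle_next_client` and `run` (they are illustrative, not from the talk). `TCPServer#accept` blocks until a client connects and returns a `TCPSocket`.

```ruby
require 'socket'

# Accept a single connection and hand the resulting TCPSocket to the block.
def handle_next_client(server)
  client = server.accept # blocks until a client connects
  yield client
ensure
  client&.close
end

# Hypothetical entry point: listens on port 5000, as in the talk, and
# processes connections one at a time. It blocks forever once called.
def run(port = 5000)
  server = TCPServer.new(port)
  loop do
    handle_next_client(server) { |client| puts "Connected: #{client.peeraddr.last}" }
  end
end
```

Nothing starts listening until `run` is called, which keeps the sketch safe to load in an interactive session.
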
00:09:40.840 Once we've connected to the client, we need to read the request.
00:09:46.950 We can use the .gets method, which reads the first line of the request.
00:09:53.070 The .gets method reads the next line from the I/O stream.
00:09:58.960 Since we've initialized our server to accept connections, the first line we read will be the HTTP request line.
00:10:05.250 We can print that out to the console.
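
Reading just the first line might look like the sketch below. The helper name `read_request_line` is an assumption; `gets` itself is the standard IO method and works on the socket returned by `accept`.

```ruby
require 'socket'

# Read only the first line the client sent, which for HTTP is the request line.
# Returns nil if the client closed the connection without sending anything.
def read_request_line(client)
  line = client.gets # reads up to and including the next newline
  line && line.chomp
end

# Usage inside the accept loop (not run here):
#   client = server.accept
#   puts read_request_line(client)
```
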
00:10:11.570 Now that we have our server listening on port 5000, it can accept connections and print the first line of requests.
00:10:18.440 However, a real server should read everything sent to the I/O stream.
00:10:24.890 We can achieve this using the readpartial method, which reads up to a given number of bytes.
00:10:33.539 For instance, we can set it to read up to 2048 bytes.
00:10:39.700 This way, we can read most requests sent to our server.
00:10:45.850 When we run that, we receive the full HTTP request that has hit the server.
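
A sketch of reading the whole request in one call, assuming the constant and method names below. `IO#readpartial` returns whatever data is available, up to the given limit, and raises `EOFError` at end of stream.

```ruby
require 'socket'

MAX_REQUEST_BYTES = 2048 # the size mentioned in the talk

# Read whatever the client has sent so far, up to MAX_REQUEST_BYTES.
def read_request(client)
  client.readpartial(MAX_REQUEST_BYTES)
rescue EOFError
  '' # the client closed the connection without sending more data
end
```

Because `readpartial` can return fewer bytes than the client intends to send, this is only adequate for small requests; a fuller server would keep reading until the headers (and any declared body) are complete.
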
00:10:52.970 Now we need to parse this string, as the server needs to understand it.
00:10:59.720 We can create a function that extracts the method, path, and version from the request string.
00:11:06.970 The first line of the request string will be split into these three components.
00:11:13.780 This function will return a hash containing the request information.
00:11:20.500 We will also parse the headers from the subsequent lines.
00:11:26.870 Next, we can take the remaining header lines and split each one into a key-value pair.
00:11:34.750 This allows us to have a structured request that our server can work with.
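
The parsing step described above could be sketched like this. The function name `parse_request` and the hash keys are assumptions; the talk's actual code is not in the transcript.

```ruby
# Parse a raw HTTP request string into a hash with the method, path,
# version, headers, and body.
def parse_request(raw)
  head, body = raw.split("\r\n\r\n", 2) # headers end at the first blank line
  lines = head.to_s.split("\r\n")
  http_method, path, version = lines.first.to_s.split(' ', 3)

  # Every line after the request line is a "Name: value" header.
  headers = lines.drop(1).each_with_object({}) do |line, acc|
    name, value = line.split(':', 2)
    acc[name.strip] = value.to_s.strip
  end

  { method: http_method, path: path, version: version, headers: headers, body: body.to_s }
end
```

Splitting each header on the first colon only (`split(':', 2)`) keeps values such as `localhost:5000` intact.
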
00:11:43.520 Once we have parsed the request, we need to build a response to send back to the client.
00:11:50.430 Let's implement a method that assigns a path and checks if it's the home directory.
00:11:56.790 If the path is the home directory, we will respond with the server root path and index.html.
00:12:04.080 If not, the server should look for the requested path within the home directory.
00:12:14.470 The response method will check if the file exists at that path.
00:12:22.310 If it exists, we return an 'OK' response and read the file.
00:12:30.490 If the file does not exist, we return a '404 Not Found' response.
00:12:38.030 This is a basic overview of how we handle responses.
00:12:45.570 The 'OK' response structure is based on the HTTP specification.
00:12:51.890 We're essentially building a response string based on the request we received.
00:12:56.790 Once we have all that, we can send the response back to the client.
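
One way to sketch the response step, assuming a document root called `public` and the helper names below (all hypothetical). A response is a status line, headers, a blank line, and then the body.

```ruby
SERVER_ROOT = File.expand_path('public', Dir.pwd) # hypothetical document root

# Map "/" to index.html; anything else is looked up under the root.
def resolve_path(request_path, root = SERVER_ROOT)
  request_path == '/' ? File.join(root, 'index.html') : File.join(root, request_path)
end

# Assemble the response text: status line, headers, blank line, body.
def build_response(status, body, content_type = 'text/html')
  [
    "HTTP/1.1 #{status}",
    "Content-Type: #{content_type}",
    "Content-Length: #{body.bytesize}",
    'Connection: close',
    '',
    body
  ].join("\r\n")
end

# Return 200 with the file contents if it exists, otherwise 404.
def response_for(request_path, root = SERVER_ROOT)
  file = resolve_path(request_path, root)
  if File.file?(file)
    build_response('200 OK', File.read(file))
  else
    build_response('404 Not Found', '<h1>404 Not Found</h1>')
  end
end
```

This sketch does no validation of the requested path, so it shares the path traversal weakness discussed later in the talk.
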
00:13:05.040 Now, let me show you the complete code.
00:13:10.730 We've required the socket from the standard library and extracted the HTTP request parsing and response into their own classes.
00:13:18.920 Next, we initialize our server on port 5000, listening for connections.
00:13:25.130 Once a connection is made, we read the request, create a usable request, and send it to the response class.
00:13:32.200 The response class processes the request and sends the appropriate response back to the client.
00:13:38.590 We can close the client connection afterward.
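
The complete code from the slides is not in the transcript. The following is a compact sketch of the same flow under stated assumptions: the class names `Request` and `Response`, the method `serve_once`, and serving files from the current directory are all illustrative choices.

```ruby
require 'socket'

# Holds the three parts of the request line.
class Request
  attr_reader :http_method, :path, :version

  def initialize(raw)
    @http_method, @path, @version = raw.to_s.lines.first.to_s.split(' ')
  end
end

# Builds the response text for a request, serving files under root.
class Response
  def initialize(request, root)
    @request = request
    @root = root
  end

  def to_s
    name = @request.path == '/' ? 'index.html' : @request.path.to_s
    file = File.join(@root, name)
    File.file?(file) ? message('200 OK', File.read(file)) : message('404 Not Found', 'Not Found')
  end

  private

  def message(status, body)
    "HTTP/1.1 #{status}\r\nContent-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n#{body}"
  end
end

# Handle one connection: accept, read, parse, respond, close.
def serve_once(server, root)
  client = server.accept
  request = Request.new(client.readpartial(2048))
  client.write(Response.new(request, root).to_s)
rescue EOFError
  nil # the client disconnected before sending a request
ensure
  client&.close
end

# To run it as in the talk (blocks forever):
#   server = TCPServer.new(5000)
#   loop { serve_once(server, Dir.pwd) }
```
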
00:13:46.160 So this is the simple flow of how to implement an HTTP server.
00:13:52.500 Let me run this code to show you what the implementation looks like.
00:13:56.410 Now, when we start our server, it binds to port 5000 and waits for connections.
00:14:04.620 Let's make a request.
00:14:10.250 I will use curl to request localhost on port 5000.
00:14:15.230 You will see the HTTP request at the top and the response beneath it.
00:14:22.150 It shows the headers from our index page.
00:14:30.810 Now, if I refresh the browser, you will see the hello world response.
00:14:37.000 I have another executable Ruby file that returns evaluated output.
00:14:43.390 For example, a Ruby file named 'nine_plus_two.rb' evaluates the result of nine plus two.
00:14:51.750 If we request a file that doesn’t exist, we will receive a '404 Not Found' error.
00:14:58.480 We can also have our server execute Ruby files if they are executable.
00:15:06.190 Now, let's modify our server to handle query strings.
00:15:13.850 As Ruby developers, we can modify the query method to support query parsing.
00:15:21.860 Using the question mark '?' will indicate that the path has a query.
00:15:29.920 Later, we can process the query and return it appropriately in our response.
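
A possible sketch of the query handling, assuming a helper named `split_query`. It splits the request target at the first `?` and decodes the rest with `URI.decode_www_form` from the standard library.

```ruby
require 'uri'

# Split a request target such as "/search?q=ruby" into the path and a
# hash of query parameters. Returns an empty hash when there is no query.
def split_query(request_target)
  path, query = request_target.split('?', 2)
  params = query ? URI.decode_www_form(query).to_h : {}
  [path, params]
end

path, params = split_query('/search?q=ruby&page=2')
puts path
puts params.inspect
```

Converting to a hash keeps only the last value for a repeated key, which is a simplification compared with what frameworks usually do.
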
00:15:37.580 I covered basic modifications, but you can explore this further.
00:15:44.520 This is our server implementation so far.
00:15:52.470 Now let’s discuss security considerations.
00:15:59.040 The current implementation is vulnerable to path traversal attacks.
00:16:06.940 This happens if someone accesses directories on the server using '../'.
00:16:12.490 It's important to ensure that sensitive directories are not accessible.
00:16:18.570 There are algorithms to prevent this, but I haven't covered them here.
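
The talk does not show a fix, so this is just one common approach, sketched under assumptions: normalize the requested path with `File.expand_path` and refuse anything that ends up outside the server root. The name `safe_path` and the `/srv/www` root are hypothetical.

```ruby
# Resolve request_path under root. Returns the absolute path, or nil if
# the normalized result would fall outside root (for example via "../").
def safe_path(root, request_path)
  root = File.expand_path(root)
  candidate = File.expand_path(File.join(root, request_path))
  candidate if candidate == root || candidate.start_with?(root + File::SEPARATOR)
end

puts safe_path('/srv/www', '/index.html').inspect
puts safe_path('/srv/www', '/../../etc/passwd').inspect
```

`File.expand_path` works on the path text only and does not follow symbolic links, so a stricter check might also compare against `File.realpath` for files that exist.
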
00:16:26.679 As Ruby developers, we should be aware of these vulnerabilities.
00:16:33.979 So, where does Rack fit into this discussion?
00:16:41.250 Rack provides a minimal interface between web servers and Ruby applications.
00:16:48.350 Most popular servers, like WEBrick and Puma, support the Rack interface.
00:16:54.790 Rack handles many features we are missing, such as query parsing and security mechanisms.
00:17:02.010 If you're considering building an accessible server, I recommend using Rack.
00:17:09.350 All the features Rack offers ensure a more robust and production-ready server.
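
For comparison, here is a sketch of what a Rack application looks like. It needs no gem to define or call by hand: a Rack app is any object that responds to `call(env)` and returns a status, a headers hash, and a body that responds to `each`. The paths and messages are made up for illustration.

```ruby
# A minimal Rack-style application as a lambda.
app = lambda do |env|
  if env['PATH_INFO'] == '/'
    [200, { 'content-type' => 'text/plain' }, ['Hello, world!']]
  else
    [404, { 'content-type' => 'text/plain' }, ['Not Found']]
  end
end

# Calling it directly with a hand-built env; a real server supplies many more keys.
status, headers, body = app.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' })
puts status
body.each { |chunk| puts chunk }
```

In a `config.ru` file, `run app` would hand an object like this to a Rack-compatible server, which then takes care of the socket handling and request parsing covered earlier.
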
00:17:18.110 To read more, check out the HTTP spec, the Socket library, and the Rack documentation.
00:17:26.470 You can also explore how the underlying functions work together.
00:17:34.400 That's all for my presentation. Thank you!