RubyConf 2019

Let's build a simple HTTP server with Ruby

by Esther Olatunde

In this presentation titled 'Let's Build a Simple HTTP Server with Ruby', Esther Olatunde explores the fundamentals of HTTP and demonstrates how to create a simple HTTP server and client using Ruby without relying on any external libraries or gems.

The key points discussed throughout the talk include:

  • Introduction to HTTP: Esther begins with an overview of HTTP, explaining that it is a request-response protocol essential for communication between web browsers and servers. She outlines the history and evolution of HTTP from its early versions to the current proposals for HTTP 3.
  • Understanding URLs: The importance of URLs is highlighted, explaining how they uniquely identify resources on the web and the structure of a typical URL.
  • Request-Response Cycle: Esther describes how a browser sends HTTP requests to a server, which then responds accordingly, either serving the requested resource or relaying error messages when necessary.
  • Socket Programming in Ruby: A significant portion of the talk is dedicated to utilizing Ruby's Socket class, which allows for bidirectional communication. Esther explains the server's functionality to listen for connections, parse incoming requests, and send responses back to clients.
  • Building a Simple HTTP Server: Esther guides the audience through building a basic HTTP server by implementing listening capabilities, request parsing, and response generation. Key features discussed include:
    • Initializing a TCP server and binding it to a port (e.g., port 5000).
    • Reading and parsing incoming HTTP requests to extract the method, path, and headers.
    • Constructing appropriate responses based on request validity, including handling of 404 errors for non-existent paths.
    • Demonstrating the server's functionality using curl to show the request and the response.
  • Security Considerations: The presenter briefly touches on vulnerabilities in the server implementation, specifically regarding path traversal attacks, and emphasizes the importance of security measures.
  • Conclusion and Recommendations: Esther concludes by suggesting the consideration of Rack for building more robust and production-ready servers in Ruby, acknowledging that Rack manages many complexities that the simplistic server lacks. She encourages the audience to further explore the HTTP specification, the Socket library, and documentation related to Rack.

Overall, the presentation serves as a valuable resource for developers interested in understanding HTTP's mechanics and gaining hands-on experience with building a server using Ruby.

00:00:13.889 Hi everyone, I'm so excited to be here!
00:00:18.940 This is my very first time attending a conference, and I’m thrilled to be a speaker at this one.
00:00:29.380 The last couple of days have been really amazing, even more than I imagined.
00:00:36.340 So, welcome to my talk titled 'Let's Build a Simple HTTP Server with Ruby'.
00:00:43.420 The Ruby community has a few popular Rack-compatible web servers, such as WEBrick, Puma, Thin, Unicorn, and Passenger.
00:00:50.950 These servers are battle-tested, so you usually don't need to roll your own.
00:00:58.230 However, I believe building your own server is a great learning experience if you want to understand how HTTP works.
00:01:06.340 At the end of this talk, we will have built a very simplistic, non-compliant HTTP server and client.
00:01:11.380 Before I proceed, a little bit about me: my name is Esther Olatunde.
00:01:17.319 I write code that works (sometimes) on computers, and I’m based in Lagos, Nigeria.
00:01:25.590 I work as a developer at Lexoo, which is a legal tech marketplace based in the UK.
00:01:32.590 Now, let's dig in!
00:01:39.639 How many of us here know what happens when you visit a URL in a browser?
00:01:46.029 This might seem a bit redundant for most of us, but I think we should start from here to connect the dots.
00:01:52.599 The Internet is a massive, distributed client-server information system.
00:02:00.989 Many applications are running over the web, such as browsers, email, file transfer, audio, video streaming, and e-commerce.
00:02:06.819 For proper communication to take place between clients (like your browsers) and servers, they must agree on specific application-level protocols.
00:02:13.910 Examples of these protocols are HTTP for the web, FTP for file transfer, and SMTP for email.
00:02:25.550 The protocol that makes the web work is the Hypertext Transfer Protocol, or HTTP.
00:02:34.100 It is the protocol that web browsers and web servers use to communicate with each other over the Internet.
00:02:40.940 HTTP is perhaps the most popular application protocol used over the Internet.
00:02:46.160 Let’s look briefly at how it works.
00:02:54.740 As I mentioned before, HTTP is a request-response protocol.
00:02:59.389 It describes how web servers exchange data with HTTP clients such as browsers.
00:03:02.570 So when you type in a URL in a browser, here's what happens.
00:03:08.260 The web browser connects to the web server and sends an HTTP request via the TCP protocol stack for the desired webpage.
00:03:13.570 The web server receives the request and checks if the webpage is available.
00:03:21.410 If the page exists, it sends it back to the client.
00:03:27.290 If the requested page does not exist, the server sends back an error message.
00:03:36.320 This request-response cycle is essentially how HTTP and the web operate.
00:03:39.639 The HTTP specification is maintained by the IETF, which originally developed it together with the World Wide Web Consortium.
00:03:46.190 The original version of HTTP, HTTP 0.9, was released in 1991 by Tim Berners-Lee.
00:03:54.830 Since then, different versions have been developed.
00:04:00.070 There is HTTP 1.0, HTTP 1.1, and HTTP 2, which was standardized in 2015.
00:04:06.699 Currently, there is a proposal for HTTP 3 that is in draft.
00:04:13.669 If you are curious, there's a link where you can read all the different specs.
00:04:19.789 Also, you can join discussions about HTTP 3.
00:04:25.520 Next, let’s talk about URLs.
00:04:35.270 A URL (Uniform Resource Locator) is used to uniquely identify a resource on the web.
00:04:43.009 The syntax of a typical URL looks like this.
00:04:48.229 It contains the protocol, the hostname, and optionally the port number.
00:04:53.270 For web traffic, the port is usually 80, and you don’t always need to specify it.
00:05:00.580 Next, the URL contains the path to the resource.
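The URL parts just described can be pulled apart with the `URI` module from Ruby's standard library. This is only an illustrative sketch, not code from the talk; the helper name `url_parts` and the example URL are assumptions made for the example.

```ruby
require "uri"

# Break a URL into the parts described above.
# `url_parts` is a hypothetical helper name used only for this sketch.
def url_parts(url)
  uri = URI.parse(url)
  {
    scheme: uri.scheme, # the protocol, e.g. "http"
    host: uri.host,     # the hostname
    port: uri.port,     # falls back to 80 for http when the URL omits it
    path: uri.path      # the path to the resource
  }
end

p url_parts("http://example.com/index.html")
```

Because `URI` fills in the default port for the scheme, the port can be left out of the URL, which matches the point that you don't always need to specify it.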
00:05:12.110 For instance, when you visit example.com/lagers, the browser issues an HTTP request.
00:05:20.330 It opens a connection to example.com on port 80.
00:05:28.580 The server accepts the connection, and upon connection,
00:05:33.740 the HTTP client (your browser) turns the URL into a request message.
00:05:41.360 The server will interpret this request and respond accordingly.
00:05:47.810 The first line of an HTTP request contains the request line, which includes the HTTP method, the request URI, and the HTTP version.
00:05:54.500 Following the request line, there are request headers that are key-value pairs.
00:06:01.009 The request may also include a request body.
00:06:06.050 When you visit that particular URL in a web browser, the HTTP request will look like this.
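Since the slide is not part of the transcript, here is a rough sketch of what such a request message looks like when built as a Ruby string. The helper name `build_get_request` and the specific headers are assumptions; a real browser sends many more headers.

```ruby
# Build the text of a minimal HTTP/1.1 GET request.
# `build_get_request` is a hypothetical helper, not code from the talk.
def build_get_request(host, path)
  [
    "GET #{path} HTTP/1.1", # request line: method, request URI, HTTP version
    "Host: #{host}",        # request headers are "Name: value" pairs
    "Accept: */*",
    "Connection: close",
    "",                     # an empty line ends the header section
    ""                      # no request body for a simple GET
  ].join("\r\n")
end

print build_get_request("example.com", "/index.html")
```

Lines in an HTTP/1.1 message are separated by `\r\n`, and the blank line after the headers is how the server knows the header section is finished.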
00:06:12.110 The server receives the request, parses it, and interprets it.
00:06:20.330 In addition, the server needs to build a response.
00:06:28.580 When your request hits the server, it performs the appropriate actions and returns the response.
00:06:35.470 For example, if you request a page that does not exist, you may get a '404 Not Found' error.
00:06:43.680 In the case of a valid request, the server will return a content body which is rendered in the browser.
00:06:51.490 This flow provides the basic implementation of how HTTP servers operate.
00:06:59.400 The server should be able to accept a request and send back a response.
00:07:06.469 This is the most common interaction that is happening on web servers today.
00:07:12.490 Now, let's implement this behavior using Ruby, without using any external gems.
00:07:16.490 To achieve this, we need a tool that can listen for bidirectional communication between clients and servers.
00:07:23.670 This brings us to sockets and socket programming.
00:07:29.580 A socket is an endpoint for two-way communication between two programs running on a network.
00:07:36.320 A socket binds to a port so that the TCP transport layer can identify the application that the data is sent to.
00:07:44.340 The server forms the listener while the client connects to this socket.
00:07:51.920 Fortunately, the Ruby standard library has already implemented sockets for us.
00:07:58.780 This brings us to the Ruby Socket class.
00:08:06.210 The Ruby Socket class provides access to the underlying operating system socket implementation.
00:08:12.220 It contains specific classes for handling common transport protocols, as well as a generic interface.
00:08:18.120 All functionality in the socket library is accessible through a single extension library, which you load with require 'socket'.
00:08:24.400 You can refer to the documentation to explore the classes and methods available.
00:08:30.300 Now that we have our socket set up, let's proceed to build our HTTP server.
00:08:35.490 Before we dive in, let's identify three main features our server will focus on.
00:08:42.670 First, it needs to listen for connections.
00:08:48.890 Second, it needs to parse the request.
00:08:55.920 Lastly, it must be able to build and send a response back to the client.
00:09:03.460 To handle the listening for connections, we require the socket library from the standard library.
00:09:09.860 Next, we need to define our server by initializing the TCPServer class and having it listen for incoming connections.
00:09:16.970 I've chosen to bind the server to port 5000, but you can choose any free port from 1024 up to 65535 (lower ports usually require elevated privileges).
00:09:24.030 Next, we want to loop infinitely so we can process our incoming connections one at a time.
00:09:30.400 The server will wait until a client connects.
00:09:34.030 When a client connects, accept returns a TCPSocket that can be used like other Ruby IO objects.
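The listening step might be sketched roughly as follows. This is not the code from the slides; the method name `each_client` and its `limit` parameter are assumptions, and `limit` exists only so the sketch can stop when used in an experiment.

```ruby
require "socket"

# Accept connections one at a time and yield each client socket.
# `each_client` and `limit` are hypothetical names for this sketch;
# a real server would simply loop forever.
def each_client(server, limit: Float::INFINITY)
  handled = 0
  while handled < limit
    client = server.accept # blocks until a client connects, returns a TCPSocket
    begin
      yield client
    ensure
      client.close         # always release the connection
    end
    handled += 1
  end
end

# Possible usage (left commented out because it would block forever):
#   server = TCPServer.new(5000)
#   each_client(server) { |client| client.write("hello\n") }
```

Handling one connection at a time keeps the loop simple, at the cost that a slow client holds up everyone behind it.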
00:09:40.840 Once we've connected to the client, we need to read the request.
00:09:46.950 We can use the .gets method, which reads the first line of the request.
00:09:53.070 The .gets method reads the next line from the I/O stream.
00:09:58.960 Since we've initialized our server to accept connections, the first line we read will be the HTTP request line.
00:10:05.250 We can print that out to the console.
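Reading just the first line could look something like this. The helper name `read_request_line` is an assumption, and a `StringIO` stands in for the client socket so the sketch runs on its own.

```ruby
require "stringio"

# Read only the request line (the first line) from an IO-like object.
# Returns nil if the peer closed the connection without sending anything.
def read_request_line(io)
  line = io.gets # reads up to and including the next newline
  line&.chomp    # strip the trailing "\r\n"
end

# StringIO plays the role of the TCPSocket returned by accept.
fake_client = StringIO.new("GET /index.html HTTP/1.1\r\nHost: localhost:5000\r\n\r\n")
puts read_request_line(fake_client) # prints: GET /index.html HTTP/1.1
```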
00:10:11.570 Now that we have our server listening on port 5000, it can accept connections and print the first line of requests.
00:10:18.440 However, a real server should read everything sent to the I/O stream.
00:10:24.890 We can achieve this using the readpartial method, which reads up to a given number of bytes.
00:10:33.539 For instance, we can set it to read 2048 bytes.
00:10:39.700 This way, we can read most requests sent to our server.
00:10:45.850 When we run that, we receive the full HTTP request that has hit the server.
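A minimal sketch of this read step, assuming a hypothetical helper named `read_request`. Note that anything beyond the byte limit is simply not read, which is one of the ways this server stays non-compliant.

```ruby
# Read whatever the client has sent so far, up to max_bytes.
# `read_request` is a hypothetical helper name for this sketch.
def read_request(io, max_bytes = 2048)
  io.readpartial(max_bytes) # returns as soon as some data is available
rescue EOFError
  ""                        # the client closed the connection without sending data
end
```

Unlike `read`, `readpartial` does not wait for the client to close the connection, which matters here because browsers and curl keep the socket open while waiting for the response.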
00:10:52.970 Now we need to parse this string, as the server needs to understand it.
00:10:59.720 We can create a function that extracts the method, path, and version from the request string.
00:11:06.970 The first line of the request string will be split into these three components.
00:11:13.780 This function will return a hash containing the request information.
00:11:20.500 We will also parse the headers from the subsequent lines.
00:11:26.870 Next, we can take the remaining header lines and split each one into a key-value pair.
00:11:34.750 This allows us to have a structured request that our server can work with.
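One way the parsing described above might look is shown below. It is a sketch rather than the speaker's implementation; the method name `parse_request` and the keys of the returned hash are assumptions.

```ruby
# Turn a raw HTTP request string into a hash.
# `parse_request` and the hash keys are hypothetical choices for this sketch.
def parse_request(raw)
  head, body = raw.split("\r\n\r\n", 2) # headers and body are separated by a blank line
  lines = head.to_s.split("\r\n")

  # The request line holds the method, the path, and the HTTP version.
  method, path, version = lines.shift.to_s.split(" ", 3)

  # Every remaining line is a "Name: value" header.
  headers = lines.each_with_object({}) do |line, acc|
    name, value = line.split(":", 2)
    next if value.nil? # skip malformed lines
    acc[name.strip.downcase] = value.strip
  end

  { method: method, path: path, version: version, headers: headers, body: body.to_s }
end
```

Header names are lowercased because HTTP header names are case-insensitive, so lookups such as `request[:headers]["host"]` work regardless of how the client spelled them.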
00:11:43.520 Once we have parsed the request, we need to build a response to send back to the client.
00:11:50.430 Let's implement a method that takes the request path and checks whether it is the root path ('/').
00:11:56.790 If it is, we will respond with index.html from the server root directory.
00:12:04.080 If not, the server should look for the requested path within the server root directory.
00:12:14.470 The response method will check if the file exists at that path.
00:12:22.310 If it exists, we return an 'OK' response and read the file.
00:12:30.490 If the file does not exist, we return a '404 Not Found' response.
00:12:38.030 This is a basic overview of how we handle responses.
00:12:45.570 The 'OK' response structure is based on the HTTP specification.
00:12:51.890 We're essentially building a response string based on the request we received.
00:12:56.790 Once we have all that, we can send the response back to the client.
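The response step could be sketched along these lines. The names `resolve_path` and `build_response`, the `root` parameter, and the chosen headers are assumptions for illustration. Like the server in the talk, this version does nothing to stop `../` in the path.

```ruby
# Map a request path to a file under the server root.
# NOTE: no protection against "../" here, as in the talk's simple server.
def resolve_path(root, request_path)
  relative = request_path == "/" ? "index.html" : request_path.delete_prefix("/")
  File.join(root, relative)
end

# Build the full response string: status line, headers, blank line, body.
def build_response(root, request_path)
  file = resolve_path(root, request_path)

  if File.file?(file)
    status = "200 OK"
    content_type = "text/html"
    body = File.binread(file)
  else
    status = "404 Not Found"
    content_type = "text/plain"
    body = "404 Not Found\n"
  end

  [
    "HTTP/1.1 #{status}",
    "Content-Type: #{content_type}",
    "Content-Length: #{body.bytesize}",
    "Connection: close",
    "", # the blank line separates headers from the body
    body
  ].join("\r\n")
end
```

Sending it back is then a single `client.write(build_response(root, path))` on the socket returned by `accept`.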
00:13:05.040 Now, let me show you the complete code.
00:13:10.730 We've required socket from the standard library and extracted the HTTP request parsing and response building into their own classes.
00:13:18.920 Next, we initialize our server on port 5000, listening for connections.
00:13:25.130 Once a connection is made, we read the request, create a usable request, and send it to the response class.
00:13:32.200 The response class processes the request and sends the appropriate response back to the client.
00:13:38.590 We can close the client connection afterward.
00:13:46.160 So this is the simple flow of how to implement an HTTP server.
00:13:52.500 Let me run this code to show you what the implementation looks like.
00:13:56.410 Now, when we start our server, it binds to port 5000 and waits for connections.
00:14:04.620 Let's make a request.
00:14:10.250 I will use curl to request localhost on port 5000.
00:14:15.230 You will see the HTTP request at the top and the response beneath it.
00:14:22.150 It shows the headers from our index page.
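The same request can be made from Ruby instead of curl, using `TCPSocket` from the standard library. This is a rough sketch of a client, not the one from the talk; `http_get` is a hypothetical helper name.

```ruby
require "socket"

# Send a bare-bones GET request and return the raw response text.
# `http_get` is a hypothetical helper name for this sketch.
def http_get(host, port, path)
  socket = TCPSocket.new(host, port)
  socket.write("GET #{path} HTTP/1.1\r\nHost: #{host}:#{port}\r\nConnection: close\r\n\r\n")
  socket.read # read until the server closes the connection
ensure
  socket&.close
end

# Possible usage, assuming a server is already listening on port 5000:
#   puts http_get("localhost", 5000, "/")
```

This relies on the server closing the connection after it responds, which is what the simple server in this talk does.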
00:14:30.810 Now, if I refresh the browser, you will see the hello world response.
00:14:37.000 I have another executable Ruby file that returns evaluated output.
00:14:43.390 For example, a Ruby file named 'nine_plus_two.rb' evaluates the result of nine plus two.
00:14:51.750 If we request a file that doesn’t exist, we will receive a '404 Not Found' error.
00:14:58.480 We can also have our server execute Ruby files if they are executable.
00:15:06.190 Now, let's modify our server to handle query strings.
00:15:13.850 As Ruby developers, we can modify our request parsing to support query strings.
00:15:21.860 A question mark '?' in the request path indicates that a query string follows.
00:15:29.920 Later, we can process the query and return it appropriately in our response.
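Query handling might be sketched like this, using `URI.decode_www_form` from the standard library. The helper name `split_query` is an assumption, and the talk's own version may well differ.

```ruby
require "uri"

# Split a request target such as "/search?q=ruby" into the path and a params hash.
# `split_query` is a hypothetical helper name for this sketch.
def split_query(target)
  path, query = target.split("?", 2) # everything after the first "?" is the query
  params = query ? URI.decode_www_form(query).to_h : {}
  [path, params]
end

path, params = split_query("/search?q=ruby&page=2")
puts path           # prints: /search
puts params["page"] # prints: 2
```

Converting to a hash keeps only the last value when a key repeats, which is a simplification a fuller parser would not make.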
00:15:37.580 I covered basic modifications, but you can explore this further.
00:15:44.520 This is our server implementation so far.
00:15:52.470 Now let’s discuss security considerations.
00:15:59.040 The current implementation is vulnerable to path traversal attacks.
00:16:06.940 This happens when a client uses '../' segments in the request path to reach files outside the server root.
00:16:12.490 It's important to ensure that sensitive directories are not accessible.
00:16:18.570 There are algorithms to prevent this, but I haven't covered them here.
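One common guard, shown here only as a sketch and not as the approach the speaker had in mind, is to expand the requested path and refuse anything that ends up outside the server root. The helper name `safe_path` is an assumption.

```ruby
# Return the absolute file path for a request, or nil if it escapes the root.
# `safe_path` is a hypothetical helper name for this sketch.
def safe_path(root, request_path)
  root = File.expand_path(root)
  # expand_path resolves "." and ".." segments into a plain absolute path.
  candidate = File.expand_path(File.join(root, request_path))

  inside = candidate == root || candidate.start_with?(root + File::SEPARATOR)
  inside ? candidate : nil
end

p safe_path("/srv/www", "/index.html")    # inside the root
p safe_path("/srv/www", "/../etc/passwd") # escapes the root, so nil
```

Comparing against the root plus a separator avoids accepting sibling directories that merely share a prefix, such as `/srv/www-private`. This check does not follow symbolic links, so it is a starting point rather than a complete defense.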
00:16:26.679 As Ruby developers, we should be aware of these vulnerabilities.
00:16:33.979 So, where does Rack fit into this discussion?
00:16:41.250 Rack provides a minimal interface between web servers and Ruby applications.
00:16:48.350 Most popular servers, like WEBrick and Puma, can run applications through the Rack interface.
00:16:54.790 Rack handles many features we are missing, such as query parsing and security mechanisms.
00:17:02.010 If you're considering building an accessible server, I recommend using Rack.
00:17:09.350 All the features Rack offers ensure a more robust and production-ready server.
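For comparison, here is roughly what an application looks like when written against the Rack interface: an object that responds to `call`, takes the request environment hash, and returns a status, a headers hash, and a body that responds to `each`. The method name `hello_app` and the response text are assumptions for the sketch, and no gem is needed just to define it.

```ruby
# A Rack-style application is any object that responds to `call(env)`.
# `hello_app` is a hypothetical name; it returns such an object (a lambda).
def hello_app
  lambda do |env|
    if env["PATH_INFO"] == "/"
      [200, { "content-type" => "text/plain" }, ["Hello from a Rack-style app\n"]]
    else
      [404, { "content-type" => "text/plain" }, ["Not Found\n"]]
    end
  end
end

# Calling it by hand with a tiny environment hash:
status, _headers, body = hello_app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/")
puts status # prints: 200
puts body.join
```

The server takes care of listening, parsing, and writing the response, so the application only has to map an environment hash to this three-element array.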
00:17:18.110 To read more, check out the HTTP spec, the Socket library, and the Rack documentation.
00:17:26.470 You can also explore how the underlying functions work together.
00:17:34.400 That's all for my presentation. Thank you!