00:00:13.889
Hi everyone, I'm so excited to be here!
00:00:18.940
This is my very first time attending a conference, and I’m thrilled to be a speaker at this one.
00:00:29.380
The last couple of days have been really amazing, even more than I imagined.
00:00:36.340
So, welcome to my talk titled 'Let's Build a Simple HTTP Server with Ruby'.
00:00:43.420
The Ruby community has a few popular Rack-compatible web servers, such as WEBrick, Puma, Thin, Unicorn, and Passenger.
00:00:50.950
These servers are battle-tested, so you usually don't need to roll your own.
00:00:58.230
However, I believe building your own server is a great learning experience if you want to understand how HTTP works.
00:01:06.340
At the end of this talk, we will have built a very simplistic, non-compliant HTTP server and client.
00:01:11.380
Before I proceed, a little bit about me: my name is Esther Olatunde.
00:01:17.319
I write code that works (sometimes) on computers, and I’m based in Lagos, Nigeria.
00:01:25.590
I work as a developer at Lexoo, which is a legal tech marketplace based in the UK.
00:01:32.590
Now, let's dig in!
00:01:39.639
How many of us here know what happens when you visit a URL in a browser?
00:01:46.029
This might seem a bit redundant for most of us, but I think we should start from here to connect the dots.
00:01:52.599
The Internet is a massive, distributed client-server information system.
00:02:00.989
Many applications are running over the web, such as browsers, email, file transfer, audio, video streaming, and e-commerce.
00:02:06.819
For proper communication to take place between clients (like your browsers) and servers, they must agree on specific application-level protocols.
00:02:13.910
Examples of these protocols are HTTP for the web, FTP for file transfer, and SMTP for email.
00:02:25.550
The protocol that makes the web work is the Hypertext Transfer Protocol, or HTTP.
00:02:34.100
It is the protocol that web browsers and web servers use to communicate with each other over the Internet.
00:02:40.940
HTTP is perhaps the most popular application protocol used over the Internet.
00:02:46.160
Let’s look briefly at how it works.
00:02:54.740
As I mentioned before, HTTP is a request-response protocol.
00:02:59.389
It describes how web servers exchange data with HTTP clients such as browsers.
00:03:02.570
So when you type in a URL in a browser, here's what happens.
00:03:08.260
The web browser connects to the web server and sends an HTTP request via the TCP protocol stack for the desired webpage.
00:03:13.570
The web server receives the request and checks if the webpage is available.
00:03:21.410
If the page exists, it sends it back to the client.
00:03:27.290
If the requested page does not exist, the server sends back an error message.
00:03:36.320
This request-response cycle is essentially how HTTP and the web operate.
00:03:39.639
The HTTP specification is maintained by the IETF; earlier versions were developed in coordination with the World Wide Web Consortium.
00:03:46.190
The original version of HTTP, HTTP 0.9, was released in 1991 by Tim Berners-Lee.
00:03:54.830
Since then, different versions have been developed.
00:04:00.070
There is HTTP 1.0, HTTP 1.1, and HTTP 2, which was standardized in 2015.
00:04:06.699
Currently, there is a proposal for HTTP 3 that is in draft.
00:04:13.669
If you are curious, there's a link where you can read all the different specs.
00:04:19.789
Also, you can join discussions about HTTP 3.
00:04:25.520
Next, let’s talk about URLs.
00:04:35.270
A URL (Uniform Resource Locator) is used to uniquely identify a resource on the web.
00:04:43.009
The syntax of a typical URL looks like this.
00:04:48.229
It contains the protocol, the hostname, and optionally the port number.
00:04:53.270
For web traffic, the port is usually 80, and you don’t always need to specify it.
00:05:00.580
Next, the URL contains the path to the resource.
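As a rough illustration of those URL parts, here is a small sketch using the `URI` module from Ruby's standard library. The helper name `url_parts` is made up for this example, and the URL is the one used in the talk.

```ruby
require 'uri'

# Split a URL into the parts described above: protocol (scheme),
# hostname, port, and path. `url_parts` is a hypothetical helper name.
def url_parts(url)
  uri = URI.parse(url)
  { scheme: uri.scheme, host: uri.host, port: uri.port, path: uri.path }
end

# For an http URL with no explicit port, URI falls back to the default port 80.
p url_parts('http://example.com/lagers')
```

Passing a URL with an explicit port, such as `http://localhost:5000/`, returns that port instead of the default.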
00:05:12.110
For instance, when you visit example.com/lagers, the browser issues an HTTP request.
00:05:20.330
It opens a connection to example.com on port 80.
00:05:28.580
The server accepts the connection, and upon connection,
00:05:33.740
the HTTP client (your browser) turns the URL into a request message.
00:05:41.360
The server will interpret this request and respond accordingly.
00:05:47.810
The first line of an HTTP request is the request line, which includes the HTTP method, the request URI, and the HTTP version.
00:05:54.500
Following the request line, there are request headers that are key-value pairs.
00:06:01.009
The request may also include a request body.
00:06:06.050
When you visit that particular URL in a web browser, the HTTP request will look like this.
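The slide with the request is not part of this transcript, so the following is only a sketch of what such a message might look like. It builds the text of a bare GET request; `build_get_request` is a hypothetical helper, and a real browser would send many more headers than the two assumed here.

```ruby
# Build the text of a minimal GET request.
# Lines are separated by CRLF, and a blank line ends the header section.
def build_get_request(host, path)
  [
    "GET #{path} HTTP/1.1", # request line: method, request URI, HTTP version
    "Host: #{host}",        # request headers are "Key: value" pairs
    'Accept: */*',
    '',                     # blank line: end of headers
    ''                      # empty body for a GET request
  ].join("\r\n")
end

puts build_get_request('example.com', '/lagers')
```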
00:06:12.110
The server receives the request, parses it, and interprets it.
00:06:20.330
In addition, the server needs to build a response.
00:06:28.580
When your request hits the server, it performs the appropriate actions and returns the response.
00:06:35.470
For example, if you request a page that does not exist, you may get a '404 Not Found' error.
00:06:43.680
In the case of a valid request, the server will return a content body which is rendered in the browser.
00:06:51.490
This flow is the basis of how HTTP servers operate.
00:06:59.400
The server should be able to accept a request and send back a response.
00:07:06.469
This is the most common interaction that is happening on web servers today.
00:07:12.490
Now, let's implement this behavior using Ruby, without using any external gems.
00:07:16.490
To achieve this, we need a tool that can listen for bidirectional communication between clients and servers.
00:07:23.670
This brings us to sockets and socket programming.
00:07:29.580
A socket is an endpoint for two-way communication between two programs running on a network.
00:07:36.320
A socket binds to a port so that the TCP transport layer can identify the application that the data is sent to.
00:07:44.340
The server listens on the socket while the client connects to it.
00:07:51.920
Fortunately, the Ruby standard library has already implemented sockets for us.
00:07:58.780
This brings us to the Ruby Socket class.
00:08:06.210
The Ruby Socket class provides access to the underlying operating system socket implementation.
00:08:12.220
It contains specific classes for handling common transport protocols, as well as a generic interface.
00:08:18.120
All functionality in the socket library is accessible through a single extension.
00:08:24.400
You can refer to the documentation to explore the classes and methods available.
00:08:30.300
Now that we have our socket set up, let's proceed to build our HTTP server.
00:08:35.490
Before we dive in, let's identify three main features our server will focus on.
00:08:42.670
First, it needs to listen for connections.
00:08:48.890
Second, it needs to parse the request.
00:08:55.920
Lastly, it must be able to build and send a response back to the client.
00:09:03.460
To handle the listening for connections, we require the socket library from the standard library.
00:09:09.860
Next, we need to define our server by initializing the TCP server class and having it listen for incoming connections.
00:09:16.970
I've chosen to bind the server to port 5000, but you can choose any unprivileged port from 1024 up to 65535.
00:09:24.030
Next, we want to loop infinitely so we can process our incoming connections one at a time.
00:09:30.400
The server will wait until a client connects.
00:09:34.030
When a client connects, it returns a TCP socket that can be used like other Ruby IO objects.
00:09:40.840
Once we've connected to the client, we need to read the request.
00:09:46.950
We can use the .gets method, which reads the first line of the request.
00:09:53.070
The .gets method reads the next line from the I/O stream.
00:09:58.960
Since we've initialized our server to accept connections, the first line we read will be the HTTP request line.
00:10:05.250
We can print that out to the console.
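The steps so far (require the socket library, create a `TCPServer`, accept a connection, read the first line with `gets`) could be sketched roughly as follows. The helper name `read_request_line` is an assumption for this example; the talk's own code is not shown in the transcript.

```ruby
require 'socket'

# Accept one connection on the given TCPServer and return the first line
# the client sent, which for an HTTP client is the request line.
def read_request_line(server)
  client = server.accept      # blocks until a client connects
  request_line = client.gets  # reads up to and including the first newline
  client.close
  request_line
end

# Usage (loops forever, so it is left commented out here):
#   server = TCPServer.new('localhost', 5000)
#   loop { puts read_request_line(server) }
```

Taking the server as an argument keeps the port choice outside the helper, so the same code works on port 5000 or any other free port.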
00:10:11.570
Now that we have our server listening on port 5000, it can accept connections and print the first line of requests.
00:10:18.440
However, a real server should read everything sent to the I/O stream.
00:10:24.890
We can achieve this using the .readpartial method, which reads up to a given number of bytes.
00:10:33.539
For instance, we can set it to read 2048 bytes.
00:10:39.700
This way, we can read most requests sent to our server in full.
00:10:45.850
When we run that, we receive the full HTTP request that has hit the server.
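A minimal sketch of that read, assuming the 2048-byte limit mentioned above. `IO#readpartial` returns whatever is currently available up to the limit, so a larger or slowly arriving request would be cut short; that is acceptable for a toy server but not for a real one. The names `MAX_REQUEST_BYTES` and `read_request` are made up for this example.

```ruby
require 'socket'

MAX_REQUEST_BYTES = 2048 # limit taken from the talk

# Read up to MAX_REQUEST_BYTES from a connected client socket.
# readpartial blocks until some data is available, then returns it.
def read_request(client)
  client.readpartial(MAX_REQUEST_BYTES)
rescue EOFError
  '' # the client closed the connection without sending anything
end
```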
00:10:52.970
Now we need to parse this string, as the server needs to understand it.
00:10:59.720
We can create a function that extracts the method, path, and version from the request string.
00:11:06.970
The first line of the request string will be split into these three components.
00:11:13.780
This function will return a hash containing the request information.
00:11:20.500
We will also parse the headers from the subsequent lines.
00:11:26.870
Next, we can take the remaining header lines and split each one into a key-value pair.
00:11:34.750
This allows us to have a structured request that our server can work with.
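One possible shape for that parsing function is sketched below. The function name and the hash keys are assumptions for this example, and it ignores any request body.

```ruby
# Parse a raw HTTP request string into a hash.
# The first line is split into method, path, and version; the lines after it,
# up to the first blank line, are treated as "Key: value" headers.
def parse_request(request_string)
  lines = request_string.split("\r\n")
  method, path, version = lines.first.to_s.split(' ', 3)

  headers = {}
  lines.drop(1).each do |line|
    break if line.empty? # blank line marks the end of the headers
    key, value = line.split(':', 2)
    headers[key.strip] = value.to_s.strip
  end

  { method: method, path: path, version: version, headers: headers }
end
```

Splitting each header on the first colon only (`split(':', 2)`) keeps values such as `localhost:5000` intact.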
00:11:43.520
Once we have parsed the request, we need to build a response to send back to the client.
00:11:50.430
Let's implement a method that assigns a path and checks if it's the home directory.
00:11:56.790
If the path is the home directory, we will respond with index.html from the server root.
00:12:04.080
If not, the server should look for the requested path within the home directory.
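A sketch of that path handling, assuming the server root is a `public` directory next to the script (the talk does not name the directory). Note that this version does nothing about '../' segments, which the talk returns to under security considerations.

```ruby
# Assumed location of the files to serve; not taken from the talk.
SERVER_ROOT = File.expand_path('public')

# Map a request path to a file path under the server root.
# "/" is treated as a request for index.html.
def resolve_path(request_path, root = SERVER_ROOT)
  if request_path == '/'
    File.join(root, 'index.html')
  else
    File.join(root, request_path)
  end
end
```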
00:12:14.470
The response method will check if the file exists at that path.
00:12:22.310
If it exists, we return an 'OK' response and read the file.
00:12:30.490
If the file does not exist, we return a '404 Not Found' response.
00:12:38.030
This is a basic overview of how we handle responses.
00:12:45.570
The 'OK' response structure is based on the HTTP specification.
00:12:51.890
We're essentially building a response string based on the request we received.
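The response string could be assembled along these lines. This is a sketch rather than the talk's code: `build_response` is a hypothetical name, and it assumes every file is served as `text/html`.

```ruby
# Build a full HTTP response string for the file at file_path.
# Existing files get "200 OK" with the file contents as the body;
# anything else gets "404 Not Found".
def build_response(file_path)
  if File.file?(file_path)
    status_line = 'HTTP/1.1 200 OK'
    body = File.binread(file_path)
  else
    status_line = 'HTTP/1.1 404 Not Found'
    body = "File not found\n"
  end

  [
    status_line,
    'Content-Type: text/html',          # assumption: everything is HTML
    "Content-Length: #{body.bytesize}", # length in bytes, not characters
    'Connection: close',
    '',                                 # blank line separates headers and body
    body
  ].join("\r\n")
end
```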
00:12:56.790
Once we have all that, we can send the response back to the client.
00:13:05.040
Now, let me show you the complete code.
00:13:10.730
We've required the socket from the standard library and extracted the HTTP request parsing and response into their own classes.
00:13:18.920
Next, we initialize our server on port 5000, listening for connections.
00:13:25.130
Once a connection is made, we read the request, create a usable request, and send it to the response class.
00:13:32.200
The response class processes the request and sends the appropriate response back to the client.
00:13:38.590
We can close the client connection afterward.
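The complete code is on a slide that the transcript does not capture, so here is a compact sketch of the same flow. The class names `Request` and `Response`, the `serve` helper, and the `public` root directory are assumptions for this example.

```ruby
require 'socket'

# Parses the raw request text into method, path, version, and headers.
class Request
  attr_reader :http_method, :path, :version, :headers

  def initialize(raw)
    lines = raw.split("\r\n")
    @http_method, @path, @version = lines.first.to_s.split(' ', 3)
    @headers = {}
    lines.drop(1).each do |line|
      break if line.empty?
      key, value = line.split(':', 2)
      @headers[key.strip] = value.to_s.strip
    end
  end
end

# Builds the response text for a request, serving files from root.
class Response
  def initialize(request, root)
    relative = request.path == '/' ? 'index.html' : request.path.to_s
    @file = File.join(root, relative)
  end

  def to_s
    if File.file?(@file)
      status = '200 OK'
      body = File.binread(@file)
    else
      status = '404 Not Found'
      body = "File not found\n"
    end
    "HTTP/1.1 #{status}\r\n" \
      "Content-Length: #{body.bytesize}\r\n" \
      "Connection: close\r\n\r\n" + body
  end
end

# Accept connections one at a time: read, parse, respond, close.
def serve(server, root)
  loop do
    client = server.accept
    begin
      request = Request.new(client.readpartial(2048))
      client.write(Response.new(request, root).to_s)
    rescue EOFError
      # the client disconnected without sending a request
    ensure
      client.close
    end
  end
end

# Usage (runs until interrupted):
#   serve(TCPServer.new('localhost', 5000), File.expand_path('public'))
```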
00:13:46.160
So this is the simple flow of how to implement an HTTP server.
00:13:52.500
Let me run this code to show you what the implementation looks like.
00:13:56.410
Now, when we start our server, it binds to port 5000 and waits for connections.
00:14:04.620
Let's make a request.
00:14:10.250
I will use Curl to request localhost on port 5000.
00:14:15.230
You will see the HTTP request at the top and the response beneath it.
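The demo uses curl, but the same request can be made from Ruby with a plain `TCPSocket`. The sketch below is an assumption about what a minimal client might look like, not the talk's own client; `http_get` is a made-up name.

```ruby
require 'socket'

# Send a bare GET request and return the raw response text.
# Relies on the server closing the connection when it is done.
def http_get(host, port, path = '/')
  socket = TCPSocket.new(host, port)
  socket.write("GET #{path} HTTP/1.1\r\nHost: #{host}\r\nConnection: close\r\n\r\n")
  response = socket.read # read until the server closes the connection
  socket.close
  response
end

# Usage, assuming a server is already listening on port 5000:
#   puts http_get('localhost', 5000)
```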
00:14:22.150
It shows the headers from our index page.
00:14:30.810
Now, if I refresh the browser, you will see the hello world response.
00:14:37.000
I have another executable Ruby file that returns evaluated output.
00:14:43.390
For example, a Ruby file named 'nine_plus_two.rb' evaluates the result of nine plus two.
00:14:51.750
If we request a file that doesn’t exist, we will receive a '404 Not Found' error.
00:14:58.480
We can also have our server execute Ruby files if they are executable.
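The transcript does not show how the talk's server runs Ruby files. One way to sketch it is with `Open3.capture2e` from the standard library, running the file with the current Ruby interpreter and using its output as the response body. The helper name is hypothetical, and running files chosen by a request is only reasonable in a local demo like this one.

```ruby
require 'open3'
require 'rbconfig'

# Run a Ruby file with the current interpreter and return what it printed
# (standard output and standard error combined).
def run_ruby_file(path)
  output, status = Open3.capture2e(RbConfig.ruby, path)
  status.success? ? output : "Script failed:\n#{output}"
end

# Usage, assuming nine_plus_two.rb contains `puts 9 + 2`:
#   run_ruby_file('nine_plus_two.rb')
```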
00:15:06.190
Now, let's modify our server to handle query strings.
00:15:13.850
As Ruby developers, we can modify the request parsing method to support query strings.
00:15:21.860
A question mark '?' in the path indicates that it carries a query string.
00:15:29.920
Later, we can process the query and return it appropriately in our response.
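A small sketch of that query handling, using `URI.decode_www_form` from the standard library to decode the key-value pairs. The helper name `split_path_and_query` is an assumption for this example.

```ruby
require 'uri'

# Split a request target such as "/search?q=ruby" into the path and a hash
# of query parameters. Targets without "?" return an empty hash.
def split_path_and_query(request_target)
  path, query = request_target.split('?', 2)
  params = query ? URI.decode_www_form(query).to_h : {}
  [path, params]
end

path, params = split_path_and_query('/search?q=ruby&page=2')
p path
p params
```

If a key appears more than once, `to_h` keeps only the last value, which is one of the details a fuller parser would need to handle.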
00:15:37.580
I covered basic modifications, but you can explore this further.
00:15:44.520
This is our server implementation so far.
00:15:52.470
Now let’s discuss security considerations.
00:15:59.040
The current implementation is vulnerable to path traversal attacks.
00:16:06.940
This happens when someone uses '../' in the request path to reach files and directories outside the server root.
00:16:12.490
It's important to ensure that sensitive directories are not accessible.
00:16:18.570
There are algorithms to prevent this, but I haven't covered them here.
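The talk does not show a fix, so the following is just one common approach, sketched under that assumption: normalize the requested path with `File.expand_path` and refuse anything that ends up outside the server root. `safe_path` is a hypothetical helper, and it does not resolve symbolic links.

```ruby
# Return the absolute file path for request_path if it stays inside root,
# or nil if the normalized path escapes the root (for example via "../").
def safe_path(root, request_path)
  root = File.expand_path(root)
  candidate = File.expand_path(File.join(root, request_path))
  inside = candidate == root || candidate.start_with?(root + File::SEPARATOR)
  inside ? candidate : nil
end
```

A server could respond with '404 Not Found' (or '403 Forbidden') whenever this returns nil. Comparing against the root plus a trailing separator avoids accepting sibling directories whose names merely start with the root's name.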
00:16:26.679
As Ruby developers, we should be aware of these vulnerabilities.
00:16:33.979
So, where does Rack fit into this discussion?
00:16:41.250
Rack provides a minimal interface between web servers and Ruby applications.
00:16:48.350
Most popular servers, like WEBrick and Puma, support the Rack interface.
00:16:54.790
Rack handles many features we are missing, such as query parsing and security mechanisms.
00:17:02.010
If you're considering building an accessible server, I recommend using Rack.
00:17:09.350
All the features Rack offers ensure a more robust and production-ready server.
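To give a feel for that interface: a Rack application is any object that responds to `call(env)` and returns an array of status, headers, and body. The sketch below needs no gems because it calls the app directly with a hand-built `env` hash; `HelloApp` is a made-up name, and a Rack-compatible server would normally build a much fuller `env` and do the calling.

```ruby
# A minimal Rack-style application: responds to `call(env)` and returns
# [status, headers, body], where body responds to `each`.
class HelloApp
  def call(env)
    if env['PATH_INFO'] == '/'
      [200, { 'content-type' => 'text/plain' }, ["Hello from Rack\n"]]
    else
      [404, { 'content-type' => 'text/plain' }, ["Not Found\n"]]
    end
  end
end

# Call the app by hand, the way a Rack-compatible server would.
status, _headers, body = HelloApp.new.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' })
puts status
body.each { |chunk| puts chunk }
```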
00:17:18.110
To read more, check out the HTTP spec, the Socket library, and the Rack documentation.
00:17:26.470
You can also explore how the underlying functions work together.
00:17:34.400
That's all for my presentation. Thank you!