HTTP Exploration

by Craig Buchek

In this video titled "HTTP Exploration" by Craig Buchek at RailsConf 2015, the key topic revolves around understanding the Hypertext Transfer Protocol (HTTP) and its practical applications for web developers, particularly in Rails. The lecture starts with a brief introduction and setup, guiding participants through exercises with provided files. Buchek elaborates on several critical aspects of HTTP, including:

HTTP Basics: Explanation of HTTP as a stateless, text-based protocol originally standardized in the 1990s and updated for HTTP/2.
Request-Response Cycle: Discussion on how browsers make requests and how servers respond, underscoring the URL structure, methods (GET, POST, PUT, DELETE), and the importance of headers like Content-Type and Authorization.
Safe and Idempotent Methods: Definitions of safe methods, and what idempotency means in a web context, ensuring repeated requests yield consistent results.
Response Codes: Overview of various HTTP status codes (200s, 400s, 500s), explaining their meanings and typical scenarios for use, such as 404 for not found and 500 for server errors.
Proxies and Caching: Insights into how proxies can stand in for clients or servers, manage requests, and enhance caching and security.
Troubleshooting Tools: Introduction to tools like Ping, Traceroute, and Wireshark for diagnosing network issues.
Future of HTTP with HTTP/2: Discussion about the advancements introduced with HTTP/2, such as multiplexing and header compression, which improve performance and change web design approaches.

Significant examples from Buchek's previous experience as a network admin help illustrate points on troubleshooting and performance enhancement. He emphasizes hands-on exercises to reinforce learning, urging attendees to ask questions and seek clarification on topics related to Vagrant and HTTP. Overall, Buchek aims to provide attendees with actionable knowledge that can be applied to improve their Rails applications and enhance their understanding of web protocols.

The session closes with gratitude for participation, encouraging attendees to engage with the exercises and implement the discussed principles.

00:00:12.360 All right folks, I'm going to get started here. Thanks for coming out. The USB keys have two files you'll need: the Vagrantfile and the HTTP exploration box file. If you don't have Vagrant and VirtualBox, there are executables on there for Windows, Linux, and Mac. You'll need to have those installed. I'm going to do a half-hour lecture, hopefully a little shorter than that, and then we'll have about an hour to run through the exercises. The exercises will walk you through starting up Vagrant. I have Charlie Sanders helping me out, so raise your hand if you need a USB key to get the files.

00:00:50.700 Basically, you've got half an hour to get all that working. If you don’t, don’t worry too much; you can pair up with a neighbor. I would appreciate feedback, negative or positive. You can tweet me at Craig Buchek. My email address is up there. My presentations are on GitHub. I haven't put up the latest version of this presentation yet, but it should be there tomorrow. If you want to follow along, 'tiny.cc/HV_Exploration' will take you to this presentation.

00:01:15.900 The reason I started doing this is that I had a previous life as a network admin, so I've done a lot of troubleshooting of HTTP, going through networks and firewalls, and now I'm a Rails developer, so I've seen both sides. We’re going to talk about HTTP basics, requests, responses, proxies, some troubleshooting, and HTTP/2. Then we'll get into the exercises that touch on all of those topics. HTTP has been around since 1991. The first version didn’t actually have a version number, but retroactively we call it 0.9. HTTP 1.0 was standardized in 1996. The version that we typically use now, HTTP/1.1, was standardized in 1999.

00:02:06.600 That RFC is very handy if you have any questions about how HTTP works. It was just updated a few months ago and broken up into multiple pieces, basically in preparation for HTTP/2. HTTP/2 has been ratified and standardized but hasn’t been published yet. If you're interested, that’s a good URL to check out. HTTP is stateless, which means that each time you connect to the server, it doesn't remember what you did the last time you connected. The only way around that is through cookies. It is a text-based protocol; that’s kind of the whole point of this presentation. We’re going to look at the text that’s going across the wire.

00:02:56.720 It's a request-response cycle: your client, your browser, will make a request to the server, and the server will respond. I want to talk real quick about URLs. Pretty much every piece of that URL on the screen will go from the client to the server, except for the fragment. The fragment just tells the browser to go to a specific spot on the page. The scheme will typically be HTTP or HTTPS, the host is obvious, you can specify a username and password in the URL, though that is not recommended, and then the path is actually the piece that we will be working with for the most part.
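As a rough sketch of the URL breakdown described above, Python's standard `urllib.parse` module can split a URL into the same pieces. The URL here is made up purely for illustration; it is not one from the talk.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only so that every component is present.
url = "https://user:secret@www.example.com:8443/articles/42?lang=en#comments"
parts = urlsplit(url)

print(parts.scheme)    # scheme: http or https
print(parts.username)  # credentials in the URL (not recommended)
print(parts.hostname)  # host
print(parts.port)      # port, if one was given
print(parts.path)      # path
print(parts.query)     # query string
print(parts.fragment)  # fragment

# What goes on the request line is the path plus the query string.
# The fragment stays in the browser and is not sent to the server.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)
```

Note that the fragment never appears in `request_target`, which matches the point that it only tells the browser where to scroll.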

00:03:28.379 The HTTP request looks like this: the method is at the beginning. We’ll talk about that in a bit. The next thing that’s present is the URL; usually, it starts with a slash and is relative to the top of the site. After that, you have the version of HTTP specified, followed by headers. The headers are metadata about what you want to get or what you want to do; there’s a host header and a content-length header. The body is not present in all requests; if you’re just getting something, there won’t be a body. But if you’re posting or putting something, it will have one. For instance, if I try to put something to Google’s front page, it probably won’t work; I tried that and got a 405 error.
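To make that layout concrete, here is a minimal sketch that assembles the raw text of an HTTP/1.1 request. The helper name `build_request`, the host, and the paths are assumptions for illustration only. The one detail it adds beyond the description above is that the header section ends with a blank line, which is how the server knows the headers are finished.

```python
def build_request(method, target, host, body=b"", extra_headers=None):
    """Return the raw bytes of a simple HTTP/1.1 request (illustrative helper)."""
    headers = {"Host": host}
    if extra_headers:
        headers.update(extra_headers)
    if body:
        # Content-Length is the size of the body in bytes.
        headers["Content-Length"] = str(len(body))
    lines = [f"{method} {target} HTTP/1.1"]  # request line: method, URL, version
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end in CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# A GET has no body; a PUT carries one.
get_request = build_request("GET", "/", "www.example.com")
put_request = build_request(
    "PUT",
    "/notes/1",
    "www.example.com",
    body=b'{"title": "hi"}',
    extra_headers={"Content-Type": "application/json"},
)
print(get_request.decode("ascii"))
print(put_request.decode("ascii"))
```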

00:04:49.139 On to the methods I mentioned. Normally when you go to a website, you’re going to do a GET request to ask for a page. A POST is when you want to update something or submit a form, though some forms can actually do a GET, depending on whether you’re changing something or just querying; when you go to Google’s homepage and type in a query, that’s a GET. A PUT is basically an update or upsert; you’re saying you’re giving the whole thing, replacing whatever is there. If it doesn't exist, it creates it. Rails does this a little oddly; it doesn’t handle the create part of that. DELETE does what it says: if you have permission, you can delete pages or resources through the web.

00:05:35.219 HEAD is essentially a GET request without the body. You're asking for the headers that you would get if you did a GET request for that URL. There’s also a concept of safe methods. Safe methods don't change any information. GET is one of them, so the takeaway here is: don’t have your app change things when you do a GET. When Google comes scraping your site, it will use GETs, and if your site changes something on the backend, you’re going to have a bad day. You should keep safety in mind.

00:06:48.020 There’s also something called idempotency, meaning that you can call the same thing multiple times and should receive the same result each time. You can do a GET to the Google front page multiple times, and you’ll still get the same thing. The same thing goes for HEAD requests. If you DELETE a resource, you know the resource is going to be gone, if it existed. A PUT is saying to replace something; if it's there, it replaces it; if it's not, it adds it. So multiple times doesn’t matter with PUT. POST is the exception, as multiple POSTs will result in multiple copies.
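The difference between idempotent and non-idempotent methods can be sketched with a toy in-memory store. `NoteStore` and its methods are hypothetical stand-ins for a server-side resource, not anything from Rails or from the exercises.

```python
# Methods as described above: safe ones change nothing,
# idempotent ones can be repeated without changing the outcome.
SAFE_METHODS = {"GET", "HEAD"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


class NoteStore:
    """Toy resource collection used to mimic PUT, POST, and DELETE."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def put(self, note_id, text):
        # PUT replaces the resource at a known URL, creating it if needed.
        self.notes[note_id] = text

    def post(self, text):
        # POST creates a new resource each time it is called.
        note_id = self.next_id
        self.next_id += 1
        self.notes[note_id] = text
        return note_id

    def delete(self, note_id):
        # DELETE leaves the resource gone, whether or not it existed.
        self.notes.pop(note_id, None)


store = NoteStore()
store.put(10, "hello")
store.put(10, "hello")
count_after_puts = len(store.notes)     # still one note

store.post("hello")
store.post("hello")
count_after_posts = len(store.notes)    # two more copies

store.delete(10)
store.delete(10)
count_after_deletes = len(store.notes)  # note 10 is gone, the POSTed ones remain
print(count_after_puts, count_after_posts, count_after_deletes)
```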

00:07:32.579 Request headers can provide information about the request itself. The host header is required, stating the name of the website being accessed. This allows hosting multiple websites on a single server. The Accept header indicates the file types the browser prefers to receive. The browser might say it wants HTML or JSON but not XML. The server uses that information to choose one of the types it's capable of delivering. Interestingly, you can specify relative quality values for the Accept header, indicating your preferences between types. The Content-Length header gives the size of the body of the request being sent.
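As an illustration of quality values, the sketch below parses an Accept header and picks the best type a server can offer. The function names and the sample header are assumptions, and the parsing is deliberately simplified: it ignores wildcards such as `*/*` and any media type parameters other than `q`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first (simplified parser)."""
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0]
        quality = 1.0  # a type with no q parameter defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        preferences.append((media_type, quality))
    # sorted() is stable, so equal q values keep the order the client sent.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_type(header, available):
    """Pick the client's most preferred type that the server can deliver."""
    for media_type, quality in parse_accept(header):
        if quality > 0 and media_type in available:
            return media_type
    return None  # nothing acceptable


# "HTML first, JSON is fine, never XML"
accept = "text/html, application/json;q=0.8, application/xml;q=0"
print(parse_accept(accept))
print(choose_type(accept, ["application/xml", "application/json"]))
print(choose_type(accept, ["application/xml"]))
```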

00:08:56.259 The Content-Type tells the server what you are uploading; for instance, whether it’s HTML, XML, or JSON. The Referer header names the page you were on when you clicked the link; the header name is a misspelling of 'referrer' that made it into the standards document. Finally, the User-Agent represents the browser or web client; it could also refer to a web crawler. This string can become quite lengthy, especially in Chrome and Firefox, as browsers try to support backward compatibility. In the past, there was a concept called browser sniffing, which tried to determine the browser type through that string in order to serve different pages. However, Google does not like that since it wants a canonical version for every page it indexes.

00:10:07.740 We’ll look at those headers in the exercises. The Content-Type is primarily used for POST or PUT requests, denoting the type of body being sent to the server. Authorization comes into play whenever you access a website with a pop-up box asking for authentication; this involves sending the username and password. It’s not encrypted; it is only Base64-encoded and sent, and we’ll see that in an exercise. The Accept-Encoding header tells the server whether it may gzip the content it sends back to you, to save bandwidth.
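For the pop-up (Basic) authentication case, the credentials in the Authorization header are Base64-encoded, which anyone can reverse. The sketch below shows both directions; the helper names are hypothetical, and the username and password are the sample pair used in the Basic authentication RFC.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the two; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header_value = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header_value)
print(decode_basic_auth(header_value))
```

Because decoding needs no secret, this scheme only protects credentials when the connection itself is encrypted.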

00:10:54.380 The Connection header is the client’s way of saying, "Hey, I’ve made this request, but when I’m done, I will make another request, so don’t close the TCP connection; I want to reuse it for the next request." When loading a webpage that will have JavaScript and CSS, it allows those files to be obtained without the connection being dropped. Regarding cookies, a cookie is sent by the server, and the client is expected to return it. This maintains sessions between the server and the client. However, this header is limited to about 4K of information, so we need to keep that limit in mind when dealing with Rails cookie-based sessions.

00:12:12.279 After making a request, the server responds with an HTTP response. The first line is called the Status Line and includes the HTTP version. An interesting aspect is that you can request one version and receive a different one back, which some might find odd. The status code, such as 201, indicates success, followed by a description of the status code like 'Created.' Like the request, the response will also have headers and possibly a body; however, not every response includes a body.
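The three parts of the status line can be pulled apart with a few lines of code. This is a minimal sketch with a made-up helper name, and it assumes a well-formed status line.

```python
def parse_status_line(line):
    """Split a status line into (version, status code, reason phrase)."""
    version, _, rest = line.rstrip("\r\n").partition(" ")
    code, _, reason = rest.partition(" ")
    # The reason phrase may contain spaces, or be empty.
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 201 Created\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```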

00:12:59.800 Several responses do not require a body, such as 'Created.' For PUT or POST requests, the server might just return a status like 'Created' without additional information, but typically you want to do a redirect so some extra information is often included. Status codes are standardized with the 100s being informational and rarely seen. The 200s reflect success; usually for a GET request, you will get a 200 status. For POST or PUT, a 201 is likely. The 300s are redirections; a 301, for example.

00:14:06.840 If you send headers related to caching, you might occasionally receive a 'Not Modified' status, indicating that you already have a copy in your cache, so the server provides only metadata without the body. Error response codes serve different purposes: 400s are client errors resulting from a mistake on the client's side. For example, a 401 indicates unauthorized access. 403 means you cannot authenticate or have authenticated but still lack permissions. A 404 status is common, indicating a page not found.

00:15:29.080 There are also 407 errors showing that proxy authentication is required; these are rarer. A 422 status is an unprocessable entity, a way of indicating that the server doesn't understand what you asked for. It is a recommended status code for API requests that do not have the correct information.

00:16:01.620 Server errors indicate that the server has encountered an issue; typically, a 500 error suggests a serious crash. 502 and 504 are gateway errors, which mostly come from reverse proxies, for example when the server behind them takes too long to respond. There are many status codes available, and one unofficial status is the 418 code, which means 'I'm a teapot.' Not sure when you’d ever need that, but someone has certainly implemented it.
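The status code families covered above follow from the first digit of the code. As a small sketch, Python's standard `http.HTTPStatus` table can supply the reason phrases; the `describe` helper and the list of codes are assumptions for illustration, and the exact phrase for some codes depends on the Python version.

```python
from http import HTTPStatus

# The first digit of the status code selects the class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a one-line description of a status code."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes missing from this Python version's table still have a class.
        phrase = "(unlisted)"
    return f"{code} {phrase}: {status_class}"


# Codes mentioned in the talk.
for code in (200, 201, 301, 304, 401, 403, 404, 407, 422, 500, 502, 504):
    print(describe(code))
```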

00:16:40.980 Just like requests, responses have headers too. The Content-Length header is typically present because there’s a response body. The Content-Type specifies the MIME type, helping the client know the type of file it's dealing with, whether that be text/html, application/json, etc. The Content-Encoding header notifies that the body has been gzipped, while the headers remain in plain text.

00:17:16.640 Content-Disposition is used when you want a file download prompt in the browser rather than displaying it within the browser window. You can set a default filename that the browser will use. The Location header is used for redirects; it provides the URL for the redirect and must come with a status code indicating the redirection.

00:18:12.600 Set-Cookie communicates with the client, asking it to remember a token, often a random string, allowing the server to associate it with a session in the database. The WWW-Authenticate header directs the browser to prompt for username and password information if it has not been supplied.
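The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and its value are invented for this example; a real session token would be a long random string.

```python
from http.cookies import SimpleCookie

# Server side: ask the client to remember a session token.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True
set_cookie_line = outgoing.output()  # a complete "Set-Cookie: ..." header line
print(set_cookie_line)

# Client side: parse what the server sent, then echo the name and value
# back in a Cookie header on later requests.
incoming = SimpleCookie()
incoming.load("session_id=abc123; Path=/; HttpOnly")
token = incoming["session_id"].value
cookie_line = f"Cookie: session_id={token}"
print(cookie_line)
```

Only the name and value travel back to the server; attributes such as Path and HttpOnly are instructions to the client.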

00:19:05.880 We'll run into proxies during our exercises. A proxy effectively stands in for another; in the case of an HTTP proxy, it intercepts requests, allowing for modification, possibly performs caching, and communicates with the server to get responses, which can then be modified again before being sent back to the client. It acts between the client and the server and can adjust anything within the request.

00:20:23.960 Proxies can enhance caching capabilities and security. In these exercises, we will add some SSL to our Rails app, simplifying it so the Rails app doesn’t have to handle SSL directly. This can save CPU time on the server. Proxies can also assist with load balancing and authentication. I’ve had instances where Apache sat in front of an application server as a reverse proxy that added pop-up authentication.

00:21:19.620 There are two types of proxies: transparent and non-transparent. Transparent proxies don’t require configurations in the browser, whereas non-transparent proxies typically need some setup. Reverse and forward proxies are terms in the context of where they sit. A forward proxy sits closer to clients, while a reverse proxy sits closer to servers.

00:22:21.560 A CDN (Content Delivery Network) serves as a paid proxy solution primarily aimed at caching static content. This can help protect against Distributed Denial of Service (DDoS) attacks, especially for larger sites.

00:23:11.130 When troubleshooting network problems, consider the OSI model layers. You have physical, network, transport (TCP), and application layers, and issues can stem from any of them. If something goes wrong, troubleshooting will involve identifying which layer and narrowing down the root cause.

00:24:05.220 One of the first tools to use is Ping, which checks connectivity to an IP address. However, firewalls may prevent this. Traceroute shows each hop between you and another server, helping locate where issues may lie. A tool called Telnet can check if a server port is listening.
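The Telnet check for a listening port amounts to attempting a TCP connection. A minimal sketch of the same check follows; the function name `port_is_open` is made up, and the demo uses a throwaway listener on the loopback interface so that it does not depend on any real server.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


# Demo: start a local listener, probe it, close it, and probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
listener.listen(1)
demo_port = listener.getsockname()[1]

result_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
result_after_close = port_is_open("127.0.0.1", demo_port)
print(result_while_listening, result_after_close)
```

A False result cannot distinguish between a closed port and a firewall dropping the traffic, which is where Ping and Traceroute help narrow things down.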

00:24:56.540 If you’re running the server, the 'netstat' command can help identify what services are listening. On Linux, use it with the options -plnt; on Mac, use -n -a and filter the output for 'LISTEN'. With tcpdump, you can capture detailed logs of traffic crossing the wire, though the output can be overwhelming. Wireshark can help visualize what's happening, as it organizes the same capture data into a more manageable format.

00:26:12.840 HTTP/2 was recently approved and ratified, emerging from Google’s SPDY project, aimed at making the web faster. One big change in HTTP/2 is header compression, which wasn’t possible in HTTP/1.1. An additional requirement for HTTP/2 is supporting TLS (Transport Layer Security) or SSL, as it facilitates protocol negotiation.

00:26:53.400 With HTTP/2, the format on the wire changes: it is no longer purely text, and the headers can be compressed. The semantics stay the same, which lets many existing tools keep working; however, telnet and tcpdump won’t work the way they do with HTTP/1.1. HTTP/2 also supports multiplexing, allowing multiple files to come through on a single TCP connection, which saves time compared to how browsers currently open multiple TCP connections.
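To give a feel for why plain-text tools stop being enough, here is a small sketch that decodes the fixed 9-byte header that begins every HTTP/2 frame, following the layout in the HTTP/2 specification. The function name and the sample bytes are assumptions, and only a few frame types are listed. The stream identifier is what makes multiplexing possible: frames from different streams can be interleaved on one connection.

```python
# A partial table of frame type codes; the specification defines more.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x5: "PUSH_PROMISE"}


def parse_frame_header(data):
    """Decode the 9-byte HTTP/2 frame header at the start of data."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    length = int.from_bytes(data[0:3], "big")  # 24-bit payload length
    frame_type = data[3]                       # 8-bit frame type
    flags = data[4]                            # 8-bit flags
    # The top bit of the last four bytes is reserved; the rest is the stream id.
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF
    return {
        "length": length,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream_id,
    }


# Hand-written sample headers: an empty SETTINGS frame on stream 0 and
# a 13-byte HEADERS frame on stream 1.
settings_header = parse_frame_header(b"\x00\x00\x00\x04\x00\x00\x00\x00\x00")
headers_header = parse_frame_header(b"\x00\x00\x0d\x01\x05\x00\x00\x00\x01")
print(settings_header)
print(headers_header)
```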

00:27:48.539 The server push feature allows a server to send resources the client may need preemptively, like CSS files alongside an HTML request. This can affect caching, but the specifics on existing caches may vary. Our approach to web design and development may change due to HTTP/2's efficiencies, allowing us to stop doing workarounds like asset pipelines and file combining. HTTP/2 introduces a different way of handling these resources.

00:29:18.900 HTTP/2 can start with a 1.1 connection and upgrade later via HTTP semantics, which can seem convoluted at times. It’s important to stay updated about browser support as this progresses; as of now, Chrome 41, Firefox 36, and certain Windows versions of Internet Explorer support it.

00:30:01.319 Nginx has plans for HTTP/2 compatibility, while tools like curl require specific flags to be enabled and need manual compilation for full functionality. Similarly, Wireshark will integrate support, currently available in beta.

00:30:36.360 So it's time for the exercises; I need to update the URL quickly. Charlie will be walking around helping people get started. The exercises are on page 27 of this document. The first step is to use the command to add the Vagrant box, followed by 'vagrant up' and then 'vagrant ssh' to access the box. Then you can proceed to slide 28.

00:31:14.520 Regarding Rails support for HTTP/2, it probably won’t happen for a while, as Rails currently doesn’t fully support HTTP. Your web server technologies like Unicorn and Puma will need to support it first. You might want to set up a reverse proxy, such as Nginx, to enable HTTP/2 functionality alongside SSL, since Rails doesn’t support HTTPS directly.

00:32:14.640 If you encounter any issues while using Vagrant, for example because of a misconfiguration, just raise your hand, and we’ll help you out.

00:32:21.020 Remember to enter a blank line when using Telnet to avoid it waiting indefinitely. If you don’t enter a blank line, it might get stuck.

00:32:43.960 As we progress, if you have any questions, don’t hesitate to ask. I want to ensure everyone understands before we move too far ahead. Again, thank you for your participation today, and I hope you find the exercises informative.

00:33:49.680 If anyone has more questions about Vagrant or the exercises, please feel free to bring them up now.

00:34:22.000 I want to express my gratitude for your attention and participation.

00:57:45.660 Thank you for hanging in until the end. I appreciate your engagement and I hope you’re looking forward to the exercises.

01:01:53.640 Thank you all very much for attending this session. I look forward to seeing what you create with the knowledge you gain here.