Latency

The Short and Happy Lives of TCP and HTTP Requests

by Starr Horne

The video "The Short and Happy Lives of TCP and HTTP Requests" by Starr Horne explores the journey of TCP and HTTP requests through the network stack. Horne emphasizes that understanding how the network functions is essential for building fast, efficient web applications.

Key Points:

  • Introduction to Network Layers: Horne discusses the OSI model, explaining the different layers from application to physical, highlighting how each layer contributes to the functioning of web applications.

  • Latency vs. Bandwidth: The speaker clarifies that latency is the time a piece of information takes to travel from one machine to another, which is crucial for user experience; bandwidth, by contrast, governs how much data can be transmitted per second but, beyond a few megabits, has little effect on how quickly a page loads.

  • User Experience and Latency: Horne explains how delays impact user interaction, citing findings that users become less engaged if load times exceed certain thresholds (e.g., 400 milliseconds).

  • Reducing Latency: Strategies discussed include:

    • Moving servers closer to users to minimize distance-related delays.
    • Reducing round trips by eliminating unnecessary requests, which is essential as many small requests can drastically slow down loading times.
    • Utilizing keep-alive connections to maintain a single TCP connection for multiple requests, reducing connection overhead.
  • Domain Sharding and CDNs: Horne introduces domain sharding as a technique for distributing requests across multiple connections, thus improving load times. Content Delivery Networks (CDNs) are also highlighted as effective solutions for delivering content closer to end-users.

  • Prefetching Resources: The use of browser attributes to prefetch resources is suggested as an optimization technique to enhance user experience.

  • Further Learning: Horne recommends the book "High Performance Browser Networking" by Ilya Grigorik for those who wish to delve deeper into performance optimization techniques.
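A minimal sketch of the resource hints the prefetching point refers to (the hostname and path here are placeholders, not from the talk):

```html
<!-- Resolve a third-party hostname before it is needed. -->
<link rel="dns-prefetch" href="//cdn.example.com">
<!-- Fetch a resource the user will likely need next, during idle time. -->
<link rel="prefetch" href="/images/manatee.jpg">
```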

Conclusion:

Horne concludes by underscoring that understanding network operations not only aids in bug fixing but also significantly improves application performance. He encourages viewers to engage with him on social media for further questions and access to additional resources. This talk elucidates the significance of efficient request handling and latency reduction in developing performant web applications.

00:00:13.599 Hey everyone! If you end up liking this talk, I'm going to be tweeting out links to the slides and resources later, so be sure to check out my Twitter. I'm very excited to be here. I come down to San Francisco a couple of times a year, and every time I do, I feel like I'm flying into Starfleet Academy. SFO has this futuristic vibe, with its aluminum structures and clear skies, which is a stark contrast when I head back home to Seattle.
00:00:24.560 My talk today is about the short but happy lives of HTTP requests, with a specific focus on how to make those requests shorter. We all know how it works—when you type a URL into your browser, it triggers your Rails controller, which renders a view that ultimately sends a calming picture of a manatee back to your browser. While this is great, if you've been at this a while, you might start to notice that the connection between the browser, the controllers, and the views is a little mysterious. It's somewhat like a black box.
00:00:52.000 In order to build fast, performant web applications, it's crucial to understand what's happening inside that black box. I like to think it's full of manatees, but that’s just me! Let's start by discussing the internet. As soon as you begin researching networks, you'll encounter something known as the OSI model. It's quite simple, although there can be disputes about its representation. Some argue it looks like a hamburger, while others humorously claim it’s made of cats. After all, how else could the internet be so efficient at transmitting cat pictures?
00:01:50.000 To illustrate the OSI model, I had to dig into a 1995 issue of Dr. Dobb's Journal for an un-memed image. At the top of our model is the most abstract layer; at the bottom, the least abstract. We're typically used to dealing with the application and presentation layers. When we drill down to the session layer, we encounter SSL, while the transport and network layers relate to TCP. Finally, the data link and physical layers correspond to your Ethernet connection and the underlying wires. Let's talk for a moment about those wires. They are surprisingly important in modern web development because the wire determines your latency. Latency refers to the time taken for one byte of information to travel from my computer to yours.
00:04:42.000 It's essential to understand that latency is not bandwidth; it's measured in milliseconds. The speed of light sets the lower bound on latency. For example, the minimum theoretical round-trip latency between New York City and London is 37 milliseconds, but in practice it's higher, because we're not sending light through a vacuum; our information travels through fiber, which is slower. In reality, latencies between New York and London run around 70-90 milliseconds, while latencies of about 40 milliseconds are common within the U.S. and about 16 milliseconds in Japan, owing to its smaller size. This discussion about latency is pivotal because it directly affects user experience.
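Those numbers are easy to check with a little arithmetic. A sketch in Ruby, where the New York to London distance and the fiber slowdown factor are rough assumptions, not figures from the talk:

```ruby
C_VACUUM_KM_S = 299_792.0  # speed of light in a vacuum, km/s
FIBER_FACTOR  = 0.67       # light in fiber travels at roughly 2/3 c

# Approximate one-way great-circle distance, New York City to London.
NYC_LONDON_KM = 5_570.0

def round_trip_ms(distance_km, speed_km_s)
  2 * distance_km / speed_km_s * 1000.0
end

vacuum_rtt = round_trip_ms(NYC_LONDON_KM, C_VACUUM_KM_S)
fiber_rtt  = round_trip_ms(NYC_LONDON_KM, C_VACUUM_KM_S * FIBER_FACTOR)

puts format("vacuum RTT: %.1f ms", vacuum_rtt)  # ~37 ms
puts format("fiber RTT:  %.1f ms", fiber_rtt)   # ~55 ms, before routing and queuing add more
```

The gap between the ~55 ms fiber floor and the 70-90 ms observed in practice is routing, switching, and queuing delay along the path.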
00:07:19.350 When I click a button, if it takes more than 100 milliseconds for something to happen, I feel that my action didn’t cause the result directly. If it takes 250 milliseconds, it feels sluggish, and if it exceeds 500 milliseconds, my thoughts may wander to the stock market, as I recently started investing and am currently down $50. This has tangible real-world implications. For instance, Google found that if searches take longer than 400 milliseconds, users are less likely to search as frequently. Similarly, big online retailers have established a correlation between conversion rates and latency.
00:09:57.760 How do we reduce latency? The easiest approach is to move your servers closer to your users. However, we're here to discuss the slightly more complicated task of eliminating round trips to reduce load time. It's essential to address bandwidth as well. Your connection determines your bandwidth, and while cable companies insist that faster speeds dramatically improve the experience, studies have shown diminishing returns: beyond around 3-4 megabits per second, adding more bandwidth barely decreases page load times. Load time, however, decreases roughly linearly as latency decreases.
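That diminishing-returns effect can be illustrated with a toy model: a page made of many small requests over a handful of parallel connections, where each request costs one round trip plus its transfer time. This is an illustration, not a benchmark; the request count and sizes roughly follow the slate.com example the talk turns to next, and the six-connection limit is an assumption about typical browsers:

```ruby
# Each request costs one round trip plus its transfer time at the given
# bandwidth; requests are spread over a fixed number of parallel connections.
def load_time_ms(rtt_ms:, requests:, bytes_each:, mbps:, connections: 6)
  transfer_ms   = bytes_each * 8.0 / (mbps * 1_000_000) * 1000.0
  serial_rounds = (requests.to_f / connections).ceil
  serial_rounds * (rtt_ms + transfer_ms)
end

# Roughly the slate.com shape: 286 requests, ~1.9 MB total (~6.6 KB each).
args = { requests: 286, bytes_each: 6_600 }

baseline     = load_time_ms(rtt_ms: 50, mbps: 5,  **args)
fast_pipe    = load_time_ms(rtt_ms: 50, mbps: 50, **args)
half_latency = load_time_ms(rtt_ms: 25, mbps: 5,  **args)

# Ten times the bandwidth shaves off relatively little, because each small
# request is dominated by its round trip; halving latency nearly halves
# the total.
```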
00:12:55.759 As web development evolved, so did the structure of web pages. For example, the home page of slate.com makes 286 requests for 1.9 megabytes of data, showcasing our current web environment where we increasingly make numerous small requests instead of one large request. It turns out that lots of small files are much slower to download than one big file over HTTP. This inefficiency is exacerbated by the way protocols interact; HTTP operates over TCP, which means that establishing a connection upfront is expensive in terms of time—every new TCP connection incurs connection overhead.
00:15:50.760 When a TCP connection is established, it undergoes a three-way handshake: the client sends a SYN, the server answers with a SYN-ACK, and the client replies with an ACK. Only then can data flow, so every new connection costs a full round trip before the first byte of your request is even sent. If there's already a latency of 100 milliseconds, you just added 100 milliseconds to your loading time. Unfortunately, it doesn't stop there, because TCP's congestion control (slow start) deliberately begins transmitting slowly to avoid overwhelming the network. Essentially, we can suffer many round trips just to transfer a relatively small amount of data. This is precisely why HTTP/1.0 introduced the keep-alive concept, which allows a single TCP connection to serve multiple requests.
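A sketch of how those handshake and slow-start round trips add up, assuming 1,460-byte segments, an initial congestion window of 10 segments, a window that doubles each round trip, and no losses (all simplifying assumptions, not figures from the talk):

```ruby
# Count the round trips needed to deliver `bytes` over a fresh TCP
# connection: one for the handshake, then one per congestion window.
def slow_start_round_trips(bytes, mss: 1_460, initial_cwnd: 10)
  segments_left = (bytes.to_f / mss).ceil
  cwnd  = initial_cwnd
  trips = 1  # the three-way handshake costs one full round trip
  while segments_left > 0
    segments_left -= cwnd  # one round trip delivers one window of segments
    cwnd *= 2              # slow start doubles the window each round trip
    trips += 1
  end
  trips
end

slow_start_round_trips(14_600)   # => 2, fits in the first window
slow_start_round_trips(100_000)  # => 4, needs several windows
```

At 100 ms of latency, that 100 KB transfer costs 400 ms before bandwidth even enters the picture, which is exactly why eliminating connections and round trips matters so much.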
00:19:02.080 However, the ideal keep-alive configuration varies with server load, so the keep-alive period may need adjusting based on user behavior: it makes sense to extend it during low-traffic periods and shorten it when traffic increases. Another approach is to parallelize requests with domain sharding: spreading assets across multiple hostnames lets the browser open more simultaneous connections, circumventing its per-host connection limit and reducing load times. CDNs also provide a practical method to get content closer to users. Finally, many pages incorporate attributes that allow browsers to prefetch resources in advance, further optimizing user experience.
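The keep-alive savings can be sketched with simple arithmetic. A toy model, charging one round trip per handshake and one per request and ignoring transfer time entirely:

```ruby
# Without keep-alive, every request pays the handshake again;
# with keep-alive, only the first one does.
def total_ms(requests:, rtt_ms:, keep_alive:)
  handshakes = keep_alive ? 1 : requests
  (handshakes + requests) * rtt_ms
end

total_ms(requests: 20, rtt_ms: 50, keep_alive: false)  # => 2000
total_ms(requests: 20, rtt_ms: 50, keep_alive: true)   # => 1050
```

Nearly half the load time in this sketch is pure connection overhead, which a single persistent connection eliminates.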
00:22:15.000 If you want to dive deeper into this topic, I highly recommend the book 'High Performance Browser Networking' by Ilya Grigorik, now available for free online. He goes into greater depth and clarity about the topics we've discussed today. Finally, if you have any questions, feel free to reach out to me. I'm more than happy to engage and help out! And remember to check my Twitter for the slides from today. Thank you!