00:00:09.769
Thank you for coming to my talk. That's very kind and generous of you to listen to me discuss some important topics.
00:00:15.870
My talk is titled "Don't Forget the Network: Your App is Slower Than You Think." I'm going to talk about some things you probably haven't considered about how people use your application.
00:00:28.580
I'm going to explore how users may be experiencing your application in ways that are worse than you might think. I apologize in advance if my talk makes you feel bad for your users, so brace yourselves.
00:00:41.340
Before I dive in, let me introduce myself. My name is André Arko, and I'm involved in nearly all things Ruby. While this slide shows an older avatar of me, I promise to get that fixed before posting the slides on Speaker Deck.
00:00:58.079
I authored the third edition of a book called "The Ruby Way," which I'm quite proud of. I learned Ruby from the very first edition of this book, which was my favorite, even though I couldn't recommend it at the time as it covered Ruby 1.8.
00:01:18.360
Now, the third edition covers Ruby versions 2.2 and 2.3. I recommend buying it because in a couple of years, you can use it to prop up your monitor just as I do with my copy of the second edition.
00:01:36.119
I work at Cloud City Development, where we specialize in building mobile and web applications from scratch. I also join teams that need a senior-level developer to assist with their Rails or frontend applications.
00:01:49.770
I've been involved in many projects, and if this talk makes you feel like you could use some assistance, please feel free to talk to me later. One of the other things I work on is something called Bundler.
00:02:09.539
I've been a part of the Bundler project for a long time, and it has been a great experience to work on open source and engage with every aspect of the Ruby community. People use Bundler in ways I would never have imagined, and I get to help solve their problems.
00:02:37.819
We've put in considerable effort to make it relatively easy to start contributing to open source through Bundler compared to many other projects. If you're interested in contributing to open source, please talk to me later or tweet at me, and I would love to help you get started.
00:02:56.630
Lastly, I spend some time on Ruby Together, which is a non-profit trade association for Ruby developers and companies. Ruby Together pays developers to work on Bundler and RubyGems, ensuring that when you run 'bundle install,' it actually works.
00:03:17.000
Without support from companies and individuals, services like RubyGems.org would struggle to stay operational. We need funding to maintain the servers and keep everything running smoothly.
00:03:36.000
Thanks to the generosity of companies like Stripe, Basecamp, New Relic, and Airbnb, we can afford to keep everything functional. We've managed to keep RubyGems.org online without interruption for the past year, but as usage continues to grow, we need more companies to contribute.
00:04:09.829
Now, let's discuss the network connection and how it makes your app slower than you think. Routing is an essential aspect of your application, even if you haven't considered it before.
00:04:16.389
At one point, there was a widely shared article on Rap Genius's blog about how the Heroku router was ineffective. That may be unfortunate, but whether you're on Heroku or not, routing is an integral part of your application, and it can hurt performance.
00:04:43.719
Let's talk about how routing functions in your application's infrastructure. It's responsible for taking requests from the outside world and forwarding them through your infrastructure until they reach your Rails app server.
00:05:01.900
Once the request is processed, the server responds, but that response has to travel back through a variety of proxy servers before it reaches the user.
00:05:20.919
So, how does this all work? You may not have thought about this before, especially in development, where routing seems like a non-issue.
00:05:38.979
But in production, multiple app servers handle requests from various locations, creating a more complex environment. Every additional layer in this process adds time to what users see, which you might never notice while working on your laptop.
00:06:27.699
Let's have a quick show of hands: how many of you know how long your routing layer takes? Based on my experience asking this question at various talks, I usually see only a few hands raised.
00:06:55.390
It's an important question, because the end user's experience is directly affected by the total time spent in your routing layer. When a person uses your web app, every request passes through your routing layer twice: once on the way in and once on the way out.
00:07:21.220
However, none of that time is included in metrics you might observe from tools like New Relic, which complicates understanding actual user waiting time.
00:07:49.030
You may look at your New Relic graphs and feel satisfied seeing quick response times like 250 milliseconds without recognizing how much additional time might be added to that before users experience a response.
00:08:08.310
You also need to understand how a traffic surge can overwhelm your routing layer, and how hard it is to queue requests effectively.
00:08:34.630
In many Rails applications, some requests are quick while others take significantly longer, so you risk high latency during busy periods.
00:09:01.090
You may end up in a confusing situation where normally fast requests hit a timeout even though they should be running smoothly, leaving you to wonder why.
00:09:30.360
New Relic does provide some insight into this through features like Queue Tracking, where you can set a header to show when the request began compared to when the server ultimately processed it.
00:10:05.960
You should observe how much time requests spend waiting for server availability. In many cases, you may find that the total user waiting time is not even being measured.
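The queue-time header idea can be sketched as a small Rack-style middleware. This is a minimal sketch, not how any particular monitoring agent implements it: it assumes the router writes an `X-Request-Start` header in the form `t=<milliseconds since the Unix epoch>`, and the class name `QueueTimeLogger` and the `queue_time_ms` env key are hypothetical. Real routers differ in header name and time unit, so check what yours actually sends.

```ruby
# Rack-style middleware that estimates how long a request waited between
# the front-end router and the app server.
# Assumption: the router sets X-Request-Start to "t=<ms since epoch>".
class QueueTimeLogger
  # clock is injectable so the calculation can be tested deterministically.
  def initialize(app, clock: -> { Time.now.to_f * 1000.0 })
    @app = app
    @clock = clock
  end

  def call(env)
    queue_ms = queue_time_ms(env["HTTP_X_REQUEST_START"])
    # "queue_time_ms" is a hypothetical env key used only for this sketch.
    env["queue_time_ms"] = queue_ms if queue_ms
    @app.call(env)
  end

  # Returns the wait in milliseconds, or nil if the header is missing
  # or not in the assumed format.
  def queue_time_ms(header)
    return nil unless header

    match = header.match(/\At=(\d+(?:\.\d+)?)\z/)
    return nil unless match

    # Clamp at zero in case of clock skew between router and app server.
    [@clock.call - match[1].to_f, 0.0].max
  end
end
```

Anything downstream of the middleware can then read the value from `env` and report it alongside the app's own response time.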
00:10:34.420
Ultimately, it's crucial to measure holistic request times. You want to truly understand the overall experience of users on the internet.
00:10:55.620
One effective strategy I recommend is to create a Rails controller that returns an empty string and use it in combination with monitoring services to explore how the infrastructure affects response delays.
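A minimal sketch of such a do-nothing endpoint, written as a bare Rack-compatible app so it runs without Rails. In a Rails app the equivalent would be a controller action that renders an empty string. The names `PingApp` and `elapsed_ms` are hypothetical.

```ruby
# An endpoint that does no work and returns an empty body. Any time an
# external monitor measures beyond the in-process time is spent in the
# network and the routing layers, not in application code.
PingApp = lambda do |_env|
  [200, { "content-type" => "text/plain", "content-length" => "0" }, [""]]
end

# Times a block in milliseconds using the monotonic clock.
def elapsed_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
end

# In-process baseline: how long the endpoint itself takes.
in_process_ms = elapsed_ms { PingApp.call({}) }
puts format("in-process time: %.3f ms", in_process_ms)
```

Comparing this in-process figure with what a monitoring service sees from the outside gives a rough estimate of how much the infrastructure adds.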
00:11:25.800
You might find that certain geographical locations impact performance significantly, prompting the consideration of a CDN for slower regions.
00:12:06.160
It is often difficult to ascertain how your application's performance varies across regions and even whether or not it meets the needs of your business.
00:12:50.000
So, keeping track of these metrics can facilitate informed decisions and help enhance user experiences.
00:13:11.520
Now let's talk about servers. I assume that if you have an application deployed, you also have servers in place. You've either purchased and racked them yourself, or you've rented virtual machines.
00:13:51.240
Regardless of your setup, it's essential to understand what might be happening on those servers, as they run a multitude of processes that you might not be aware of.
00:14:19.090
It's crucial to know how the environment in which your application runs impacts user experience. A common concern across various programming languages is how garbage collection pauses can add delays.
00:14:50.800
For instance, if your Ruby application pauses due to garbage collection, you need to understand its duration and its overall impact.
00:15:22.840
Tools like GC profiling can provide insights, but it's vital to understand that there are other reasons your code can pause, which complicates matters.
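As a rough illustration of GC profiling, Ruby's built-in `GC::Profiler` can report how much time a piece of work spent in garbage collection. This is a sketch only; the helper name `measure_gc` is hypothetical, and the numbers will vary with Ruby version and heap state.

```ruby
# Runs a block with GC profiling enabled and reports how much time was
# spent in garbage collection and how many GC runs happened meanwhile.
def measure_gc
  GC::Profiler.clear
  GC::Profiler.enable
  runs_before = GC.count
  yield
  {
    gc_seconds: GC::Profiler.total_time, # seconds spent in GC while enabled
    gc_runs: GC.count - runs_before      # GC runs triggered during the block
  }
ensure
  GC::Profiler.disable
end

result = measure_gc do
  200_000.times { "x" * 100 } # allocate many short-lived strings
  GC.start                    # force at least one collection
end
puts format("GC runs: %d, GC time: %.4f s", result[:gc_runs], result[:gc_seconds])
```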
00:15:41.670
Drawing from experiences shared by developers at places like Papertrail, I found a clever approach. By starting another thread that tracks elapsed time, you can see how long these pauses last and how much overhead they add.
00:16:03.530
Monitoring how long your application spends not executing can offer a clearer picture of resource usage and help identify potential issues.
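One way the watcher-thread idea could look is sketched below. The talk does not show the original implementation, so this is an assumption-laden illustration: a background thread sleeps for a fixed interval and records how much longer than requested the sleep actually took. A large overshoot suggests the process was paused, whether by GC, the hypervisor, or a noisy neighbor. The class name `PauseMonitor` is hypothetical.

```ruby
# Background thread that detects pauses by measuring how late it wakes up.
class PauseMonitor
  attr_reader :max_overshoot_ms, :samples

  def initialize(interval: 0.01)
    @interval = interval # seconds the watcher asks to sleep each cycle
    @max_overshoot_ms = 0.0
    @samples = 0
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        before = now
        sleep @interval
        # Time beyond the requested interval is time we were not running.
        overshoot_ms = [(now - before - @interval) * 1000.0, 0.0].max
        @max_overshoot_ms = overshoot_ms if overshoot_ms > @max_overshoot_ms
        @samples += 1
      end
    end
    self
  end

  def stop
    @running = false
    @thread&.join
    self
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

monitor = PauseMonitor.new(interval: 0.005).start
200_000.times { "x" * 100 } # stand-in for real application work
sleep 0.05
monitor.stop
puts format("samples: %d, worst pause: %.2f ms", monitor.samples, monitor.max_overshoot_ms)
```

In production the worst-case value would be reported to a metrics service rather than printed.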
00:16:23.880
Different virtual machine setups can also introduce additional complexities, especially when running processes within layers of VMs, which can hinder responsiveness.
00:16:49.590
If you're sharing resources with co-tenants who are resource-intensive, you may not be aware of the problems they create for your application.
00:17:20.660
You may also face I/O bandwidth constraints from competing applications. This highlights the importance of accurately tracking performance under different server conditions.
00:17:50.850
By taking a proactive approach to monitoring and recognizing the potential impact of resource contention, you position yourself to address performance issues effectively.
00:18:18.320
Just as Netflix benchmarks its EC2 instances to make sure they perform adequately, you can use an understanding of your server performance to optimize your costs and service delivery.
00:18:44.190
Understanding the specific needs of your application can help you determine whether CPU, memory, disk, or network bottlenecks are impacting performance.
00:19:10.490
As your application scales, being aware of your metrics can significantly influence your resource allocation strategies.
00:19:49.660
Clearly, you should be measuring things you haven't previously captured, which brings us to metrics themselves.
00:20:18.660
Fortunately, the Ruby community has a solid reputation for metrics collection and use, and services like New Relic simplify this process.
00:20:55.090
Tracking metrics is crucial; without them, your production environment operates like a black box, making it impossible to determine the quality of user experiences.
00:21:18.810
The importance of metrics really stood out for me during a talk by Coda Hale at GitHub in 2009 where he emphasized that the core of our work is to deliver business value.
00:21:49.290
To do so, we need to measure and evaluate performance effectively; otherwise, we risk failing to meet user or business expectations.
00:22:23.660
While metrics are important, they can create a false sense of understanding if they are not interpreted properly. An average, for example, can obscure what is actually happening.
00:22:57.660
Averages mislead because people tend to assume they represent the whole distribution, which isn't always the case. This matters significantly in applications.
00:23:20.190
Real-world metrics often deviate sharply from the expected normal distribution. In operational metrics across multiple servers, averages may obscure critical performance insights.
00:23:43.490
Better insight comes not from relying on averages alone but from looking at percentile metrics, which show how the outliers behave.
00:24:06.000
Visualizing your metrics offers a clearer perspective, as datasets can look very different visually despite sharing the same average.
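To make the point about averages concrete, here is a small sketch with made-up response times: two datasets share the same mean but have very different tails. The numbers are invented for illustration, and the percentile uses the simple nearest-rank method, which is only one of several definitions.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile: the smallest value such that at least
# pct percent of the data is less than or equal to it.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical response times in milliseconds.
steady = [100] * 10          # every request takes 100 ms
spiky  = [50] * 9 + [550]    # nine fast requests and one very slow one

[["steady", steady], ["spiky", spiky]].each do |name, data|
  puts format("%-6s mean=%.0f p50=%d p99=%d",
              name, mean(data), percentile(data, 50), percentile(data, 99))
end
```

Both datasets average 100 ms, yet in the second one the slowest request takes 550 ms, which only the high percentile reveals.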
00:24:34.790
I urge you not to depend solely on averages for alerts. Alerts should notify you of deviations from your known functioning baseline, not just tell you when averages decline.
00:25:01.920
In summary, the network plays a critical role in the overall performance of your application. Developers often overlook its significance, focusing instead on the code without considering network factors.
00:25:39.680
After deploying your application, remember that the user experience is essential. Regardless of processing efficiency, users ultimately care about the total time it takes for their requests to be fulfilled.
00:26:08.680
In any case, if you're not alerting on averages, you need to establish what normal operation looks like; that is key to avoiding alert fatigue.
00:26:49.540
The best advice I've received involves identifying your system's average baseline and adjusting alerts to signal deviations from that baseline.
00:27:22.230
Understanding the metrics that are distinctive to your application and business allows for more nuanced alerts and less unnecessary noise.
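A baseline-deviation alert could be sketched as follows. The talk does not prescribe a method, so the choices here are assumptions: the baseline is the median of recent samples, the allowed band is a multiple of the median absolute deviation, and the function name and `tolerance` parameter are hypothetical.

```ruby
# Median of a list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# True when `current` falls outside the normal band around the baseline.
# The band is tolerance times the median absolute deviation of the history.
def deviates_from_baseline?(history, current, tolerance: 3.0)
  baseline = median(history)
  spread = median(history.map { |v| (v - baseline).abs })
  spread = 1.0 if spread.zero? # arbitrary floor so the band is never zero-width
  (current - baseline).abs > tolerance * spread
end

# Hypothetical recent response times in milliseconds.
history = [100, 102, 98, 101, 99, 103, 97]

puts deviates_from_baseline?(history, 104) # within the normal band
puts deviates_from_baseline?(history, 140) # far above baseline
puts deviates_from_baseline?(history, 60)  # far below baseline
```

Because the check is symmetric, it also fires when a metric drops unexpectedly, which can signal missing traffic rather than improved performance.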
00:27:58.490
As we conclude, if you have any questions, I'm happy to discuss these ideas further. Thank you for your time!