RubyConf TH 2023

Rails Performance Monitoring 101: A Primer for Developers

A talk from RubyConfTH 2023, held in Bangkok, Thailand on October 6-7, 2023.
Find out more and register for updates for our next conference at https://rubyconfth.com/

00:00:07.560 Hello everyone! Well, yesterday, I spoke to a lot of people here. I realize that not everyone here has English as their first or even second language of preference. So, I imagine with the translators around, who must be doing a great job, it’s still hard for some people to follow along. Therefore, I’m going to speak slowly because generally, I tend to go fast, and on stage, I tend to go even faster. I want everyone to be able to follow along.
00:00:22.519 That means I might have to skip a few slides here and there to fit into the time I have, but that should be okay. Thank you for showing up for my talk. I’m really excited about this, and hopefully, you’ll find this interesting and learn something new.
00:00:36.280 A little bit about me: my name is Rishi Jain, and I’m here from Bangalore, which is a city in the south of India. I enjoy programming, traveling, writing blogs, playing sports, and speaking at conferences. I’m also a Manchester United fan, so I usually spend my weekends crying.
00:01:01.519 I’d like to tell you a little bit about Bangalore. The city is known for a few different things, and we’re infamous for traffic. But to be honest, it’s not that bad if you work from home or if you plan your life around going out really late at night or early in the morning, which is what I end up doing. The city has a great number of parks, and one of the best things about Bangalore is its weather.
00:01:24.879 If I happen to show you around Bangalore, you’ll find it is quite beautiful. One of the delicacies you must try is dosa. Coming to what I do in Bangalore, apart from eating dosa every day, I work at a company called OmbuLabs, where I am a senior software engineer. OmbuLabs is a remote-first company based out of Philadelphia, and I really enjoy working there because of its diverse group of people coming from all kinds of backgrounds and countries.
00:01:54.640 Everyone is really nice there. So, what do we do at OmbuLabs? We provide tune reports for your Ruby applications. Some of the information I’ll share today comes from my experience of doing tune reports for our clients.
00:02:04.400 We give back to the community by writing blogs, which you can read on different topics at the link mentioned above. We also maintain a few open-source gems, and you can check them out in the link as well. Now, let's get started with the reason why we are all here today.
00:02:18.599 Let’s do a quick poll. Raise your hand if you have encountered or fixed an N+1 query in the codebase or have heard from your customers that the app has been running really slow for the past few days. Have you received feedback from your senior engineers or your DevOps team that CPU utilization is very high or that the app consumes a lot of memory? If you’ve deployed something new, do you know why it’s happening? Congratulations—you all have experienced issues related to performance monitoring.
00:03:31.320 However, there are a lot of misconceptions about Rails performance, and let's address some of them. The first common misconception is that performance improvement and monitoring are only areas for senior developers in the team, and junior developers' job is just to build the features asked of them. Additionally, some think all performance-related issues should be caught during the code review. I believe it should not be this way.
00:04:45.000 As developers, we should understand the impact of the code we write from a performance perspective. Not everything can be caught during the code review by senior developers. Issues will slip through, so it is our duty as developers to be more vigilant about what we write.
00:05:07.600 During the course of this talk, we’ll see how developers can identify performance issues and help fix them while being aware of the impact of the code.
00:05:25.000 Another misconception is the backend/frontend fiasco. Sometimes, engineers writing the backend assume that their code works perfectly fine and is fast. They often attribute issues to rendering, stating that it is the front-end team’s responsibility. Both teams need to collaborate and check their logs to truly find out what is causing the delays.
00:06:25.000 Now, every time we hear that something is slow, someone in the room may say, 'Let’s cache it!' and assume that caching can alleviate the problem. This is a classic mistake; not everything slow can be made fast by simply caching data. In fact, at times, caching may further slow down performance if the data is changing too often, leading to cache invalidation and outdated data being displayed.
00:07:27.600 Bigger hardware or more hardware won’t necessarily make our apps faster. Think of it like this: wider highways won’t increase the speed of the cars on them. They can accommodate more cars, just as more servers can handle more requests, but that does not speed up any individual request. Now that we have identified some common misconceptions, let’s briefly look at why you should even care about Rails performance monitoring.
00:08:37.360 I’ll give you four reasons: scalability, user experience, cost savings, and professional growth. I will skip over the details here for the sake of time.
00:09:16.560 Now that we understand Rails performance monitoring is important and have acknowledged the misconceptions surrounding it, think of any Rails app. If you break it down into different components, you typically have a front end, a back end, a database, and a server. In this talk, we will focus only on the backend, as that's the area we have time to cover.
00:09:43.760 When discussing the backend in terms of performance monitoring and improvement, the most crucial thing we need to know about is APM tools. APM stands for application performance monitoring. It involves using software tools and telemetry data to monitor the performance of business-critical applications.
00:10:10.080 Some common APM tools include New Relic, Datadog, Scout APM, and many others. These tools provide a variety of metrics such as request queuing, response times, top transactions, and more. This information gives insights into how your app is performing resource-wise, capacity-wise, and user experience-wise.
00:10:52.000 Let’s take a closer look at some of these metrics individually. One metric that I personally find interesting is the request queue. It measures how long your request has to wait before being picked up for processing by the app server. Ideally, request queue time should be between 20 and 50 milliseconds. Anything longer than that indicates something is wrong and that customers are not having a good experience.
00:11:16.080 For example, a spike in request queue time up to 10,000 milliseconds—10 seconds—would leave users frustrated. Typically, this happens when all servers are busy processing other requests, and the incoming request has to wait until a server is available to handle it.
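To make the queueing effect concrete, here is a minimal plain-Ruby sketch. It is not how any APM tool measures queue time; the method name `queue_times` and all the numbers are illustrative assumptions. It assumes every request takes the same time to process and is picked up by whichever worker frees up first.

```ruby
# Simulates request queue time for a fixed pool of app server workers.
# arrivals_ms: arrival time of each request, sorted ascending.
# service_ms:  processing time per request (assumed constant).
def queue_times(arrivals_ms, service_ms, workers)
  free_at = Array.new(workers, 0) # time at which each worker becomes free
  arrivals_ms.map do |arrival|
    idx = free_at.each_with_index.min.last # worker that frees up first
    start = [arrival, free_at[idx]].max    # wait if that worker is still busy
    free_at[idx] = start + service_ms
    start - arrival                        # time spent waiting in the queue
  end
end

burst = [0, 0, 0, 0] # four requests arriving at the same moment
two_workers  = queue_times(burst, 100, 2)
four_workers = queue_times(burst, 100, 4)
puts "2 workers: #{two_workers.inspect}"
puts "4 workers: #{four_workers.inspect}"
```

With two workers, the third and fourth requests each wait 100 ms for a free worker; with four workers, nobody waits. Note that the processing time of each request stays at 100 ms either way: more capacity removes queueing, it does not make an individual request faster.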
00:12:01.440 Another important APM metric is the list of the top slowest transactions for your web apps, which can be sorted based on different parameters. This provides quick insight into which endpoints are critical, which receive the most traffic, and which are slow. That makes it easier to prioritize fixes for the slow endpoints.
00:12:59.560 APM tools also give information about throughput, which shows when your customers typically access your app. This is valuable when planning system upgrades or major deployments—it's wise to do them during low-traffic periods.
00:13:35.080 Another important metric is object allocations. It's easy in Rails to overlook the number of objects we create. Typically, this should not exceed 50,000 live objects at any time; higher counts can lead to system slowdowns and increased memory consumption. For instance, a common pattern in Rails applications involves fetching all products and looping through them to update an attribute.
00:14:25.680 While it may seem straightforward, the problem with using 'Product.all' is that it loads all products into memory at once, which causes increased memory usage and can affect overall application performance. Instead, it is advisable to use the 'find_each' method to load products in batches to avoid excessive object allocation.
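The difference can be sketched without Rails. The following is a plain-Ruby stand-in, not Active Record itself: `FakeProductTable`, `peak_loaded`, and the row data are hypothetical names invented for illustration. It tracks how many objects are instantiated at once when loading everything versus loading in batches.

```ruby
Product = Struct.new(:id, :price)

# Hypothetical stand-in for a products table; rows are raw data until loaded.
class FakeProductTable
  attr_reader :peak_loaded

  def initialize(count)
    @rows = (1..count).map { |id| [id, 100] }
    @peak_loaded = 0
  end

  # Mimics loading every product at once: all rows become objects together.
  def all
    instantiate(@rows)
  end

  # Mimics batched loading: at most batch_size objects are built at a time.
  def find_each(batch_size: 1000)
    @rows.each_slice(batch_size) do |slice|
      instantiate(slice).each { |product| yield product }
    end
  end

  private

  def instantiate(rows)
    @peak_loaded = [@peak_loaded, rows.size].max
    rows.map { |id, price| Product.new(id, price) }
  end
end

unbatched = FakeProductTable.new(5_000)
unbatched.all.each { |product| product.price += 1 }

batched = FakeProductTable.new(5_000)
processed = 0
batched.find_each(batch_size: 1_000) do |product|
  product.price += 1
  processed += 1
end

puts "all:       peak objects loaded = #{unbatched.peak_loaded}"
puts "find_each: peak objects loaded = #{batched.peak_loaded}"
```

Both versions touch all 5,000 products, but the batched one never holds more than one batch of objects at a time. In a real Rails app, the equivalent change is iterating with 'find_each' on the model instead of calling 'all' and looping.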
00:15:41.520 Another metric of interest is instance restarts, which provides information on when your instances restarted. You should check for spikes in instance restarts during peak usage times in the middle of the day. This could help in diagnosing what went wrong during that time when analyzing related metrics such as memory usage, CPU utilization, and request queue.
00:16:30.720 Now, let's discuss some common mistakes found in Rails applications, starting with the notorious N+1 query issue. Many of you raised your hands when I asked about this earlier. Consider a scenario where you're fetching all posts and then printing the title and the user’s username. If there are six posts, this could trigger seven queries: one to fetch the posts and six to fetch the individual user information—resulting in slower performance.
00:17:29.200 To fix this issue, you should eager load user information instead of making multiple queries. This allows you to execute fewer queries—only two, irrespective of the number of posts. N+1 queries are very common, and recognizing and addressing them can lead to significant performance improvements.
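The query counts above can be reproduced with a small plain-Ruby sketch. `FakeDb` and its methods are hypothetical stand-ins that simply count how many "queries" are issued; this is not Active Record, where eager loading is usually done by calling 'includes' on the association.

```ruby
# Hypothetical in-memory database that counts every query it serves.
class FakeDb
  attr_reader :query_count

  def initialize
    @users = { 1 => "alice", 2 => "bob", 3 => "carol" }
    @posts = (1..6).map { |i| { id: i, title: "Post #{i}", user_id: (i % 3) + 1 } }
    @query_count = 0
  end

  def posts
    @query_count += 1
    @posts
  end

  def user(id)
    @query_count += 1
    @users.fetch(id)
  end

  def users_by_id(ids)
    @query_count += 1
    @users.slice(*ids)
  end
end

# N+1: one query for the posts, then one more per post for its user.
def titles_n_plus_one(db)
  db.posts.map { |post| "#{post[:title]} by #{db.user(post[:user_id])}" }
end

# Eager loading: one query for the posts, one for all the users they need.
def titles_eager(db)
  posts = db.posts
  users = db.users_by_id(posts.map { |post| post[:user_id] }.uniq)
  posts.map { |post| "#{post[:title]} by #{users.fetch(post[:user_id])}" }
end

n_plus_one_db = FakeDb.new
n_plus_one_lines = titles_n_plus_one(n_plus_one_db)

eager_db = FakeDb.new
eager_lines = titles_eager(eager_db)

puts "N+1 queries:   #{n_plus_one_db.query_count}"
puts "eager queries: #{eager_db.query_count}"
```

Both versions produce the same output lines; only the number of round trips to the database differs, and the gap grows with the number of posts.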
00:18:18.680 Now let's talk about another common mistake, which is the lack of background jobs in processing. For example, if you have a method that generates invoices and sends notifications (like SMS, email, and WhatsApp) to customers while making them wait for a response, it’s not optimal. Instead, utilize background jobs to send notifications without making the customer wait.
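As a rough illustration of the idea, here is a plain-Ruby sketch using a thread and a queue from Ruby's core library. It is only a stand-in for a real job system: `deliver`, `generate_invoice`, and the 50 ms delay are hypothetical, and a Rails app would more likely enqueue an Active Job than manage threads by hand.

```ruby
# Thread-safe queues from Ruby's core library.
JOBS = Queue.new      # notifications waiting to be sent
DELIVERED = Queue.new # record of what the worker has sent

# Hypothetical slow vendor call (SMS, email, WhatsApp).
def deliver(channel, customer)
  sleep 0.05 # pretend the vendor takes a while
  DELIVERED << "#{channel} to #{customer}"
end

# The request path: build the invoice, enqueue notifications, return at once.
def generate_invoice(customer)
  invoice = { customer: customer, total: 42 }
  %i[sms email whatsapp].each { |channel| JOBS << [channel, customer] }
  invoice
end

# The worker runs outside the request path and drains the queue.
worker = Thread.new do
  while (job = JOBS.pop) # pop returns nil once the queue is closed and empty
    deliver(*job)
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
invoice = generate_invoice("customer-1")
request_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

JOBS.close  # no more jobs; the worker exits after draining the queue
worker.join
delivered = []
delivered << DELIVERED.pop until DELIVERED.empty?

puts "request returned in #{(request_seconds * 1000).round(2)} ms"
puts "delivered: #{delivered.inspect}"
```

The customer-facing call returns as soon as the jobs are enqueued, while the three slow deliveries happen afterwards in the worker.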
00:19:22.240 Next, we have the issue of timeouts for third-party integrations. If your SMS vendor is timing out after 60 seconds for requests that typically take 200 milliseconds, this is a problem. You should enforce stricter timeouts when relying on third-party services, ideally waiting no more than 2 to 5 seconds.
00:20:01.600 Handling long wait times is crucial: rather than letting a request hang, fail fast and push the job to a retry queue, or store the exception for later processing. Controlling the duration of requests to external services is essential to maintaining application performance.
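One way to enforce such limits is with the timeout settings on Ruby's standard Net::HTTP client. The sketch below is an assumption about how a vendor call might look: `send_sms`, the URL, and the default values are hypothetical, and the defaults of 2 and 5 seconds simply mirror the range suggested above.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: posts a message to an SMS vendor with strict timeouts.
def send_sms(url, body, open_timeout: 2, read_timeout: 5)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to establish the connection
  http.read_timeout = read_timeout # seconds allowed to wait on a read of the response
  http.max_retries = 0             # no automatic retry that would extend the wait
  response = http.post(uri.path, body)
  { ok: response.is_a?(Net::HTTPSuccess), status: response.code }
rescue Net::OpenTimeout, Net::ReadTimeout => e
  # Fail fast; the caller can push the work to a retry job instead of blocking.
  { ok: false, error: e.class.name }
end
```

A caller would check the returned hash and, on a timeout, schedule a retry in the background rather than keeping the customer waiting.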
00:20:54.160 Another common mistake is missing database indexes, which can significantly slow down query performance. When you run a query on a table without proper indexing, it performs a sequential scan over the whole table, leading to high latency, particularly with large datasets. When adding indexes, be mindful of the order in which fields are defined, as this can affect performance.
00:21:54.480 Understanding the differences between slow and fast indexes can also help. The performance differences become pronounced as the volume of data grows, with slow indexes becoming especially burdensome. Knowing how to formulate your queries and how to optimize indexes can be critical for maintaining performance. Adding indexes to essential query fields can reduce query execution times substantially.
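The effect of an index can be approximated in plain Ruby. This is only an analogy: real database indexes are typically tree structures maintained by the database, whereas the sketch below uses a hash built with `group_by`. The `Order` struct, the data, and the method names are hypothetical.

```ruby
Order = Struct.new(:id, :customer_id)

# 10,000 hypothetical orders spread evenly across 500 customers.
ORDERS = (1..10_000).map { |id| Order.new(id, id % 500) }

# Without an index: every row is examined to find the matches.
def sequential_scan(orders, customer_id)
  examined = 0
  matches = orders.select do |order|
    examined += 1
    order.customer_id == customer_id
  end
  [matches, examined]
end

# With an index: a lookup structure built once, keyed by the queried column.
CUSTOMER_INDEX = ORDERS.group_by(&:customer_id)

def index_lookup(index, customer_id)
  matches = index.fetch(customer_id, [])
  [matches, matches.size] # only the matching rows are touched
end

scan_matches, scan_examined = sequential_scan(ORDERS, 42)
index_matches, index_examined = index_lookup(CUSTOMER_INDEX, 42)

puts "sequential scan examined #{scan_examined} rows"
puts "index lookup examined #{index_examined} rows"
```

Both approaches return the same orders, but the scan touches every row while the lookup touches only the matches, and the scan's cost keeps growing with the size of the table.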
00:23:52.960 As we wrap up this talk on Rails performance monitoring, I’ve shared resources such as Nate Berkopec’s performance workshop and materials, which are fantastic for anyone interested in this area. Additionally, the book 'SQL Performance Explained' is a great resource for learning more about databases and performance improvement. I hope you found this information interesting and valuable. Thank you for attending!
00:25:48.640 Thank you!