Managing Ruby on Rails for High Performance

by Bill Lapcevic

In his presentation at LA RubyConf 2009, Bill Lapcevic discusses the crucial aspects of managing Ruby on Rails applications to achieve high performance using New Relic's RPM (Rails Performance Management) tool. He opens with a humorous anecdote from a Raiders game, emphasizing the importance of a strong community around specific interests, which parallels the Ruby on Rails community's focus on optimal performance.

Key points from the presentation include:

- Importance of Performance Monitoring: Lapcevic stresses the need for web applications to perform well and scale efficiently. He highlights that monitoring is vital as performance issues directly impact customer satisfaction. The use of New Relic RPM is integral to ensuring high server performance.
- Usage of RPM Tool: The RPM tool is described as a production performance monitoring tool designed specifically for Rails applications. It captures real-time data from applications and helps in troubleshooting issues that arise after deployments.
- Deployment Practices: The speaker contrasts typical Java deployment practices with Rails, noting that while Java teams may deploy every six to nine months, many Rails developers deploy multiple times a day, which requires robust monitoring to catch bugs and performance bottlenecks quickly.
- Critical Metrics and Features: Lapcevic discusses several key features of RPM, including Apdex scores (a standard for measuring user satisfaction), error tracking, transaction monitoring, and deployment performance analytics. He emphasizes that knowing whether a performance issue stems from the application or the database is fundamental in addressing problems quickly.
- Case Study and Examples: He shares insights from their dashboard data, detailing how they monitor errors and the steps they take when performance degrades post-deployment. A specific case involving unexpected behavior from a Mac widget installation serves to illustrate how monitoring enables rapid identification and resolution of performance issues.
- Business Model and Tools: Lapcevic concludes with an overview of New Relic’s pricing models and the promotional offer for attendees. He refers to partnerships and resources available to users, fostering a supportive environment for Rails developers.
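
The Apdex score mentioned in the key points can be made concrete with a short sketch. It uses the standard Apdex formula: requests at or under a target time T count as satisfied, requests between T and 4T count as tolerating (worth half), and anything slower counts as frustrated. This is an illustrative Python example, not code from RPM; the function name `apdex`, the 0.5-second threshold, and the sample response times are all assumptions made for the illustration.

```python
from typing import Iterable


def apdex(response_times: Iterable[float], threshold: float) -> float:
    """Return the Apdex score (0.0 to 1.0) for response times in seconds.

    threshold is the target time T: samples <= T are "satisfied",
    samples in (T, 4T] are "tolerating", and the rest are "frustrated".
    """
    times = list(response_times)
    if not times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in times if t <= threshold)
    tolerating = sum(1 for t in times if threshold < t <= 4 * threshold)
    # Tolerating samples count for half; frustrated samples count for nothing.
    return (satisfied + tolerating / 2) / len(times)


# Hypothetical response times (seconds) with an assumed target of 0.5 s.
samples = [0.2, 0.4, 0.5, 1.1, 1.9, 2.5]
score = apdex(samples, threshold=0.5)
print(f"Apdex(0.5s) = {score:.2f}")
```

Here three samples are satisfied, two are tolerating, and one is frustrated, so the score is (3 + 2/2) / 6, or about 0.67. A score near 1.0 means almost every request met the agreed target.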

The main takeaway from the video is the critical role of performance monitoring in maintaining the efficiency and reliability of Rails applications. Lapcevic highlights how tools like New Relic RPM provide insights that help developers make informed decisions, optimize application performance, and address issues swiftly as they arise.

00:00:13.120 I wanted to quickly start by telling you a story. A couple of years ago, there was a time when anyone could hear about the Oakland Raiders. I apologize if you're a fan. I'm not a Raiders fan either; in fact, I'm the most pathetic of football fans, as I am a New York Jets fan. This means I have about another 14 years before the contract that Joe made with the devil to win the Super Bowl expires and we have a winning season. A few years ago, a friend of mine came to me with Raiders playoff tickets, which was quite a rarity; this was back when the Raiders were actually in the playoffs. He said, 'Why don't you come with me to the Coliseum and experience the game?' You may be familiar with the 'Black Hole', from those notorious pictures of the stadium where everyone is wearing face paint and armed with machetes and other accessories. That's the Black Hole. So, here I am, a Jets fan in the Black Hole for a playoff game.
00:02:00.040 What my friend did was give me a Fred Biletnikoff jersey. If you're not familiar with Biletnikoff, he was a famous wide receiver for the Raiders. When I arrived there, everyone thought I was the greatest fan. They were high-fiving me when the Raiders did something great. Internally, I was cheering, but externally, I wore my sad face, and everyone was consoled by my apparent misery. It turned out they were all really great guys as long as you were wearing a Raiders jersey.
00:02:47.400 The reason I'm sharing this story is that Lew, who was supposed to be here today, is unfortunately deathly ill, as evidenced by the email that he sent me. Lew is the founder of New Relic and is quite technically savvy. He sent me an email thanking me for stepping in for him, saying, 'Thanks for stepping in for me, Bill! I hope you enjoy rock and roll lullabies at the last minute.' I have no idea what that meant. Unlike Lew, I am not particularly technically skilled or smart. My name is Bill, and I am the Vice President of Business Development at New Relic.
00:03:28.680 With that apology in advance, I'm going to take you through some of what we can do with RPM. I am going to talk to you a bit about how we use New Relic at New Relic to ensure that our giant Rails app performs at a high level. It has a lot of data sent to it all the time, and our goal is to keep it coming. We use this tool extensively to manage our production servers. By a show of hands, how many of you are familiar with RPM? A few people. For those who don’t know, New Relic is based on the premise that having a web application that actually performs, and a Rails application that scales efficiently, is crucial, and that achieving this requires the right tools.
00:04:05.360 To that end, we created RPM, which stands for Rails Performance Management. It is a production performance monitoring tool for Rails applications, and we use this extensively. We have a staging environment where we deploy all our new code and monitor it as it crashes, fixing issues that arise. However, the more critical part is that once things pass the staging environment, we deploy them into production. In the Java world, deployments would happen every six to nine months, and businesses would wait for a designated one-week window per year to deploy new code. Let me take a quick show of hands: how many people deploy to production at least once a month?
00:04:45.360 Okay. How many deploy to production once a week? Great! Now, how about once a day? Really? How about once an hour? We actually have customers deploying five to six times a day. It turns out that monitoring a production application is crucial, as is getting information back to developers so they can quickly address any bugs that arise after deployment.
00:05:20.680 Since performance issues affect your customers the most, understanding where the problems are coming from is vital. Bonus points to anyone who can tell me where that quote came from. It's a song; I won't give you any hints. So, monitoring production applications is important. It turns out many Rails developers use monitoring tools to oversee their production applications. In fact, a recent Rails survey showed that about 50% of all respondents say they use monitoring tools on their production applications.
00:06:00.320 This is impressive, as it’s not always obvious that your rigorous testing will catch every problem. Many of our customers use RPM particularly for one key purpose: when a problem arises, how do you determine who needs to fix it? It raises pressing questions: Is it a database problem? Is it affecting all your customers, some of them, or none? Is this an important issue? How is the performance of your application matching your expectations? We focus on analyzing the app tier of your application, which consists of the Rails stack, your code, Passenger, and so on. This understanding is beneficial, especially for those deploying applications in the cloud.
00:06:46.640 It’s crucial to know how well the routing is working and how efficiently the servers are running. The way we analyze things is by examining how the app performs using all the other resources around it, such as the database. We don’t focus on monitoring the database itself, as there are great monitoring tools for MySQL and other databases out there. Instead, we assess how effectively our app interacts with the database: What queries is it making? How are those queries affected, especially when I roll out new features?
00:07:29.560 The bottom line is that there are better tools available for this purpose. What we offer is one of them, and these tools provide a more efficient way than combing through log files afterward, something that everyone loves to do, right? The oldest task in the book! So before we get into error detection, let me show you RPM. What you are currently seeing is New Relic RPM monitoring our production service.
00:08:11.440 This is our staging site, and I will be using this for the demo. To give you an idea of the scope, New Relic RPM has approximately 1,500 customers who actively send us data using RPM to monitor their own Rails applications. All this information is collected by what we refer to as our collector tier, which is located at Engine Yard. It runs on a cluster of eight slices at Engine Yard, with 24 Mongrel instances in total, three Mongrel instances per slice.
00:09:01.720 You can observe some basic statistics on our dashboard, but the most interesting statistic is that, on a Saturday, we receive 19.34 thousand requests per minute. This means we have roughly 19,000 agents sending us packets of information every minute, which we then capture in our collector tier and write to our database, making it accessible through our user interface. On average, we manage this with a 38-millisecond response time.
00:09:47.560 For those of you who are new to Rails, I assure you—Rails can scale. Others here can confirm that as well. In any case, with such a high volume of data, performance is critical. We must collect and manage all that information effectively. Let's dive into the collector for a short overview of the product, as many people may not be familiar with it.
00:10:24.640 The first page you encounter is an overview page where you can find various statistics, such as response time, throughput, and ActiveRecord metrics. There is also something referred to as Apdex. Does anyone here know what Apdex is? A couple of people. Well, Apdex is a systems management standard that has been in circulation for a few years.
00:11:00.000 It boils user satisfaction with response time down to one number. You set a target response time that users find tolerable, and the score tells you how your application's actual performance compares against that expectation. This way, if a business person questions the app's speed, you can demonstrate that despite perceived slowness, the app is meeting its agreed-upon standards. Additionally, on this page, we show error information, CPU utilization, physical memory usage, as well as deployment lines.
00:11:56.000 For instance, I can see that we deployed around that time frame. We can also modify the viewing window. One of the most critical activities for our developers after a deployment is to monitor the error rate and identify the types of errors that occur immediately afterwards. This is a crucial indicator of problems in code that may have been introduced into the production environment.
00:12:38.560 Looking at the past 24 hours of errors, we are averaging about four errors per minute on our collector tier. We’ve also aggregated the error data over this 24-hour period. We have a customer, who shall remain nameless, who previously handled error detection by sending an email for every single error, which filled up an inbox that they monitored. They would basically pay attention to the speed at which errors piled up.
00:13:13.679 What RPM offers is more streamlined error aggregation, allowing you to track and investigate specific types of errors. We can see the Rails stack trace and the error specifics, share error details via email with team members, and integrate with services like Lighthouse for ticket creation, adding all critical information to newly generated tickets.
00:13:56.160 Another significant feature that developers spend considerable time on is transactions—particularly individual transactions. The dashboard displays aggregated information, which usually provides a good overview. But many times, applications might perform well overall while individual customer transactions take too long or encounter errors.
00:14:30.479 Let me check out some transactions now; it is taking a second to load. While this is processing, let me explain that the transaction trace feature looks at your application in production and captures the slowest transactions every minute. Transactions are traced in particular when an HTTP request takes an unusually long time.
00:15:11.640 The traces display the SQL for those transactions and identify which parts of the SQL are slow. If a query lags, for example anything over 300 milliseconds, the SQL query plan is pulled as well. We have found this function invaluable in diagnosing specific application issues.
00:15:49.960 Now, let's switch back and look at the transactions from a different application that I know is working. This is a Shopify application that I was granted access to. We can use this as a live example to shed light on performance monitoring. As we analyze the transaction traces, we find that a considerable percentage of its transactions exceeded two seconds, with some significantly overshooting that time.
00:16:30.320 For instance, I see a transaction that took 656 seconds. Let's explore what might be causing this issue by examining the components. The transaction analysis shows everything from total time to exclusive time, where exclusive time means excluding the time Rails takes to execute. Next, we can explore the SQL tab for this particular transaction.
00:17:09.640 It details every SQL command that the application called during that transaction. Right away, we notice there's not much that could be optimized in the individual queries displayed here. However, we often encounter cases where queries were poorly indexed or joins were mishandled, resulting in slow execution.
00:17:48.480 The monitoring will allow you to click through on any of these transactions that exceed 300 milliseconds and look for additional information on how those queries are running. As we look deeper, we will find occurrences where additional indexing or adjustments can improve performance.
00:18:32.760 Next up are deployments, which are critical for keeping your application functioning correctly. From feedback I've been given by our development team, deployments present the riskiest phase for introducing new code. For each deployment, we collect data from an hour before the deployment through three hours afterwards, which provides insight into the results.
00:19:16.080 In this case, we can see how a couple of recent deployments affected performance metrics. While tracking this, we notice distinctions in CPU and memory usage. We can further analyze each of these deployments to retain a comprehensive overview.
00:20:01.400 As we reflect on performance improvements over time, we note a clear increase in the throughput of requests per minute from December to now—a growth of at least 25%. Our memory utilization remained approximately the same, but enhanced throughput indicates successful optimization.
00:20:44.000 I will speed things up a little, as we are running out of time. I won’t go into the specifics of our scalability analysis, but it analyzes throughput against various metrics over time. By assessing this visual data, we get to identify areas in our database performance that require optimization.
00:21:27.360 Our developers utilize RPM constantly, sharing insights and notes with each other so they can address issues quickly and efficiently. Recently, we faced a problem where a well-designed widget for Mac OS X created unexpected behavior on our staging site. When it was installed by many New Relic employees, performance degradation was observed.
00:22:11.040 Using the note feature, we documented the timeline of our slowdown on the 31st of March. The record showed a drastic increase in the number of transactions that were regarded as less than satisfactory on our Apdex scale.
00:22:54.520 This spike was indicative of an underlying issue within our staging environment, prompting our developers to look deeper into the recent deployments. With our monitoring in place, our developers could correlate the increase in response times to a faulty deployment that had triggered more frequent database queries.
00:23:36.799 Ultimately, they were able to isolate the problematic function in our code—the accounts controller index—identifying excessive queries driven by the widgets being deployed across user dashboards. By promptly shutting down those problematic queries, we instantaneously returned to stable performance metrics.
00:24:20.640 By observing these processes at work in New Relic, our team effectively resolved the issue, establishing actionable paths to maintain performance and reliability. This instance highlights the significance of having appropriate performance monitoring tools in place.
00:25:05.440 We have about 1,500 customers who use our tools and find them beneficial in various ways. For those unfamiliar with our services, we offer a free version called RPM Lite. Additionally, for today, we are offering a promotion code for those who sign up for RPM Lite, giving you 30 days of Gold service to try all premium features.
00:25:50.960 If you would like to take advantage of the promotion, you can apply the code "LaRuby" when you sign up. Finally, before concluding, I'll take any questions you might have.
00:26:35.680 Yes, one person asked where our users usually hang out. We maintain a support forum that users may refer to, hoping that it serves as a valuable resource during deployments. Additionally, we sponsor a website called RailsLab, where users can find podcasts and videos hosted by various knowledgeable individuals, enhancing their understanding of Rails scalability.
00:27:22.640 Among other useful resources, our site hosts a great deal of information, all available for free and without any registration. Our goal is to ensure developers successfully create high-performing Rails applications, benefiting everyone in the community. As for a question about our promotion for existing users, yes, if you have an account with RPM, you can upgrade to Gold through your account tab by simply entering the promotional code.
00:28:09.760 Regarding our pricing, we offer four levels: Lite, Bronze, Silver, and Gold. Lite is free across unlimited hosts, while Bronze starts at $4 per host monthly. Silver costs $85 per host monthly and Gold is $200 per host monthly. There is volume pricing available, especially for those in a cloud environment.
00:28:52.560 Features differ among packages, with details on the window for data storage increasing with each tier. Some tools—like transaction tracing and deployment deep dives—are exclusive to higher-end packages. If you're intrigued about extending RPM service for a variety of applications beyond Rails, we are exploring that. Currently, our product focuses solely on Rails applications.
00:29:30.200 Finally, let me share that all the fantastic graphs and charts you see were created with an excellent charting package called FusionCharts. We've done a lot of builds in a short time, primarily with a small team of four developers, but it's been rewarding.
00:30:19.199 I'd like to extend my thanks for letting me substitute for Lew. I hope this presentation was valuable and enjoyable for everyone. It has been a lot of fun.
00:30:55.200 Thank you!