RailsConf 2015
A New Kind of Analytics: Actionable Performance Analysis

by Paola Moretto

In her presentation at RailsConf 2015, Paola Moretto discusses the critical importance of application performance in the modern web environment. She emphasizes that speed and responsiveness are not just beneficial but essential for user satisfaction and business success. She quotes Larry Page, ‘Speed is product feature number one,’ to underline the significance of performance in web applications. Moretto argues that poor performance not only harms user experience but can also hurt visibility, SEO rankings, conversion rates, and overall brand perception.

Key points discussed include:
- Two Types of Data: Moretto differentiates between live traffic data (monitoring) and synthetic traffic data (performance testing) as vital sources for understanding application performance.
- Monitoring Strategies: She outlines various monitoring methods such as stack, infrastructure, and user behavior monitoring, highlighting the need to instrument systems thoroughly before drawing conclusions.
- Limitations of Monitoring: Noting that monitoring acts ‘after the fact’, she compares it to calling for help after an accident, stressing the importance of performance testing to anticipate potential issues.
- Performance Testing: This process simulates user scenarios in controlled environments, allowing developers to troubleshoot effectively and measure true user experience metrics beyond server stats.
- Continuous Testing: Moretto advocates for continuous performance measurement to adapt to changes in both internal code and external environments, ensuring that applications remain responsive.
- Actionable Analytics: She stresses the need for predictive analytics using machine learning techniques and data mining to proactively identify bottlenecks and improve performance health. Such methods enable developers to spot issues before they affect users.

Moretto illustrates her points with anecdotes, including a case where an unnoticed change by a cloud provider led to significant application performance degradation, showcasing the necessity for continuous monitoring and testing. In conclusion, she reiterates that while monitoring is foundational, combining it with performance testing fosters a more resilient application capable of predicting and addressing performance challenges. Ultimately, the focus on speed as a core feature ensures efficient deployments and better user experiences, making performance analytics essential in the development process.

00:00:11.719 Hello everybody, I'm Paola Moretto, co-founder of a company called Nouvola. You can find me on Twitter at polymerase. A little bit about me: I'm a developer turned entrepreneur and have been in the high-tech industry for a long time. I love solving hard technical problems. I originally come from Italy, but I've been in the US for 20 years. If you don't find me writing code, I'm usually outdoors hiking.
00:00:46.079 This presentation is about performance. We heard it loud and clear here at RailsConf: faster is better. We all know what performance is, but it's important to understand the real impact of low performance. When I talk about performance, I mean speed and responsiveness—the speed and response times that your application delivers to users.
00:01:03.780 There's a famous quote from Larry Page that states, 'Speed is product feature number one.' Therefore, you need to focus not only on your functional requirements but also on the non-functional ones. Speed is paramount for any web application today. There's plenty of research and data that backs this up, showing the impact of low performance. It affects visibility, SEO ranking, conversion rates, brand perception, brand loyalty, brand advocacy, and can drive up costs and resource usage. Low performance typically leads to over-provisioning, which is not usually the right answer.
00:01:43.800 In today's web application environment, speed is critical. If you have a DevOps model, where development, QA, and ops are integrated, the need for speed becomes even more crucial. In a cloud environment with programmable and elastic infrastructure, where you're adopting continuous delivery and agile methodologies, it is vital to ensure that every build is not only functional but also performs at the right speed.
00:03:04.900 So, how do we tackle the issue of performance? The first thing you need is data. This is a quote I borrowed with pride from a talk yesterday: 'In God we trust, all others bring data.' Without data, you deploy and hope for the best, relying on users to act as your QA department. I've heard of companies, such as certain e-commerce applications, saying that they know when they experience a slowdown because users complain on Facebook. Normally, that's not the best approach. You need substantial amounts of data.
00:03:30.370 There are two types of data to consider. On the right-hand side, you have your live traffic data when you deploy your applications. This usually falls under the umbrella of monitoring. Many times, this encompasses various monitoring data and techniques. On the left-hand side, you have your testing environment, which typically includes pre-production or staging environments. During this phase, you create synthetic traffic by simulating user activity, and with that traffic you conduct performance testing.
00:04:13.510 Let's start with monitoring. You have several types of monitoring: stack monitoring, infrastructure monitoring, and user behavior monitoring, including what the most common user scenarios are. You may also encounter streaming analytics or high-frequency metrics, where solutions extract data from the platform at high speed. There are many existing solutions, and while we are not affiliated with any of them, we want to communicate the spectrum of monitoring and data instrumentation solutions available. They complement each other; there's no one-size-fits-all, because the right mix depends on your application's needs.
00:05:12.750 The primary issue you face today is that, despite having dashboards filled with data, correlating it to understand what's truly happening can be challenging. Still, monitoring is your first line of defense: instrument your system before asking questions. However, monitoring alone isn't enough. Live monitoring can be noisy, and troubleshooting a specific scenario becomes harder when other users are performing different actions and the system might behave unexpectedly.
00:06:18.919 Another problem with monitoring is that it happens after the fact; it doesn’t help you predict or prevent potential issues that your application may encounter. An analogy would be that monitoring is like calling AAA after an accident: it’s beneficial, but you’d prefer to avoid the accident in the first place. That is why performance testing complements monitoring very well.
00:07:27.150 With performance testing, you can create synthetic traffic to simulate user scenarios in a controlled environment, typically in pre-production. When troubleshooting, you can reproduce specific scenarios conveniently. This controlled setup allows you to peel back layers of issues and makes troubleshooting more straightforward, because the variables, user traffic and user behavior, are under your control.
00:08:57.920 Performance testing provides end-to-end user metrics, measuring the actual experience of your users. It’s crucial to focus not solely on server metrics or application and database metrics, but to understand the true end-to-end performance. Companies have often seen a significant disparity between user-facing metrics and server metrics, highlighting the need to measure the end-user experience.
00:10:24.990 Another important aspect involves measuring and optimizing before issues arise. Before deploying, you should have tested realistic scenarios and put metrics in place to measure the KPIs for end-user experience. Different types of metrics such as response time, transaction completion time, throughput, and error rates come into play here. This ensures that you can identify and resolve any issues before they affect the user.
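As a rough illustration of the metrics named above (not something shown in the talk), the following minimal Python sketch summarizes a set of recorded requests into response-time percentiles, throughput, and error rate. The `Sample` record, the `summarize` function, and the sample data are all hypothetical; a real test tool would collect these values from actual traffic.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One simulated request (hypothetical record shape)."""
    start: float     # seconds since the test began
    duration: float  # end-to-end response time in seconds
    ok: bool         # True if the request succeeded


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce raw samples to a few end-user KPIs."""
    durations = [s.duration for s in samples]
    first_start = min(s.start for s in samples)
    last_end = max(s.start + s.duration for s in samples)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "p50_seconds": percentile(durations, 50),
        "p95_seconds": percentile(durations, 95),
        "throughput_rps": len(samples) / (last_end - first_start),
        "error_rate": errors / len(samples),
    }


# Made-up data: 20 requests, one every 0.5 s, slowly getting slower,
# with every tenth request failing.
samples = [
    Sample(start=i * 0.5, duration=0.1 + 0.01 * i, ok=(i % 10 != 9))
    for i in range(20)
]
report = summarize(samples)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Percentiles are used here rather than averages because a small number of very slow requests can hide behind a healthy-looking mean.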
00:11:00.600 Software is continually changing, and determining whether specific changes affect user interactions is critical. The performance of an application can degrade not only because of your own code changes but also due to external factors: the vast cloud infrastructure and ongoing modifications from external parties can introduce performance bottlenecks. Therefore, continuous testing is necessary to ensure everything is running optimally.
00:12:54.520 For example, a change in the routing system at a cloud provider wasn’t publicized. It had a significant impact on applications but went unnoticed until users started reporting issues. Catching issues like this relies heavily on continuous measurement of application performance against ever-changing external factors.
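One way to picture continuous measurement is a check that runs after every build or on a schedule and compares the latest test results with a stored baseline. The sketch below is an assumption-laden illustration, not the speaker's tooling: `check_regression`, the metric names, and the 10% tolerance are all hypothetical, and every metric is assumed to be "lower is better".

```python
def check_regression(baseline, current, tolerance=0.10):
    """Return (name, baseline, current) for each metric that got worse.

    A metric counts as regressed when the current value exceeds the
    baseline by more than `tolerance` (0.10 means 10%). All metrics are
    assumed to be lower-is-better, such as latency or error rate.
    """
    regressions = []
    for name, base_value in baseline.items():
        value = current.get(name)
        if value is None:
            continue  # metric was not measured in this run
        if value > base_value * (1 + tolerance):
            regressions.append((name, base_value, value))
    return regressions


# Made-up numbers: latency degraded although no code changed,
# for instance after an unannounced infrastructure change.
baseline = {"p95_seconds": 0.40, "error_rate": 0.010}
current = {"p95_seconds": 0.55, "error_rate": 0.010}

regressions = check_regression(baseline, current)
for name, before, after in regressions:
    print(f"{name} regressed: {before} -> {after}")
```

Because the comparison runs regardless of whether the code changed, it can also surface degradations that originate outside the application.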
00:14:06.970 A huge challenge in performance troubleshooting is that engineers often spend a lot of time reproducing the initial problem and isolating the issue. This process can be incredibly time-consuming and labor-intensive, but it is vital to identify what is affecting performance. It turns out that with the right data and testing setups, identifying the source of the problem can be much easier.
00:15:17.370 At this stage, you want to minimize uncertainty and provide actionable data. You want predictive analytics capabilities integrated with monitoring and performance testing, enriched with data instrumentation to help localize where the performance issues lie. This predictive analysis accelerates the troubleshooting process, making it easier to resolve issues efficiently.
00:16:22.120 Our goal is to have clear insights into performance issues before they impact actual users, leveraging metrics collected during performance testing to anticipate problems even before they happen. By analyzing data, you can uncover where the bottlenecks reside, allowing developers to act swiftly and effectively.
00:17:47.760 We tap into various strategies, including data mining and machine learning techniques, to help localize performance issues by recognizing patterns within application metrics. The aim is to build a systematic approach to understanding which elements of an application contribute most significantly to performance problems.
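The talk does not say which algorithms are used, so the following is only a simple stand-in for the idea of recognizing patterns in application metrics: it ranks internal component timings by how strongly they correlate with end-to-end response time. The component names and numbers are hypothetical, and Pearson correlation is chosen purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_components(end_to_end, components):
    """Sort components by correlation with end-to-end latency, highest first."""
    scores = {
        name: pearson(series, end_to_end)
        for name, series in components.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up measurements taken at five increasing load levels (seconds).
end_to_end = [0.20, 0.22, 0.30, 0.45, 0.80]
components = {
    "db_query_seconds": [0.05, 0.06, 0.12, 0.25, 0.58],
    "template_render_seconds": [0.04, 0.05, 0.04, 0.05, 0.04],
    "cache_lookup_seconds": [0.02, 0.01, 0.02, 0.01, 0.02],
}

ranking = rank_components(end_to_end, components)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

A high correlation only points to where to look first; it does not prove that the component causes the slowdown.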
00:19:06.690 For instance, when performance tests show significant delays when applying a linear ramp of traffic on a live application, the next step is data analysis via instrumentation in real time. Here, the focus shifts to identifying which aspects of the application are behaving irregularly under specific conditions or user loads. By analyzing historical data alongside live testing results, we can highlight which parts of the application see a divergence, indicating performance bottlenecks.
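The linear ramp and the comparison against historical data can be sketched as follows. This is a minimal illustration under stated assumptions: the endpoint names, the latency figures, and the 1.5x divergence threshold are hypothetical, and the function names are invented for this example.

```python
def linear_ramp(start_users, end_users, steps):
    """Number of simulated users at each step of a linear ramp."""
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    increment = (end_users - start_users) / (steps - 1)
    return [round(start_users + increment * i) for i in range(steps)]


def first_divergence(loads, historical, observed, threshold=1.5):
    """Return the first load at which observed latency exceeds
    `threshold` times the historical latency, or None if it never does."""
    for load, past, now in zip(loads, historical, observed):
        if now > past * threshold:
            return load
    return None


loads = linear_ramp(10, 50, 5)

# Made-up latencies in seconds per endpoint, one value per load step.
historical = {
    "checkout": [0.20, 0.21, 0.22, 0.24, 0.26],
    "search": [0.10, 0.10, 0.11, 0.11, 0.12],
}
observed = {
    "checkout": [0.21, 0.22, 0.40, 0.75, 1.30],
    "search": [0.10, 0.11, 0.11, 0.12, 0.12],
}

diverging = {
    name: first_divergence(loads, historical[name], observed[name])
    for name in historical
}
print(loads)
print(diverging)
```

In this made-up data, the hypothetical checkout endpoint departs from its history partway through the ramp while search tracks its baseline, which narrows down where to investigate.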
00:20:52.240 This approach is both systematic and integrative. We demonstrate how applying these techniques can give immediate visibility into application performance. Understanding these points allows for proactive corrections and remediation to improve overall performance health, resulting in an efficient user experience.
00:22:50.569 In summary, speed is product feature number one. Ensuring that your application operates efficiently is crucial to a successful deployment. Monitoring serves as the first line of defense, but when coupled with performance testing, it gives us a more robust system capable of predicting and correcting potential issues before they significantly affect users.
00:28:02.320 Thank you for your time today. If you have any questions or feedback, feel free to reach out to me on Twitter at polymer 803.