Summarized using AI

Kill! Kill! Die! Die! Load Testing With A Vengeance

Dan Yoder and Carlo Flores • February 02, 2012 • Burbank, CA

In the talk titled 'Kill! Kill! Die! Die! Load Testing With A Vengeance,' presented at LA RubyConf 2012 by Dan Yoder and Carlo Flores, the speakers delve into the complexities and importance of load testing for web applications. They begin by reflecting on the evolution of testing methodologies within the Ruby community, recognizing significant advancements in unit and functional testing. The essence of the talk is to highlight that load testing is as crucial as traditional testing methods, especially in understanding user experiences and application performance under stress.

Key Points Discussed:
- Importance of Load Testing: Load testing is critical for ensuring applications can handle usage limits and user expectations. The speakers emphasize that a slow application compromises user experience, rendering features ineffective.
- Integration with Continuous Integration (CI): Integrating load testing into the CI pipeline ensures that performance evaluations occur with each code commit, increasing reliability across environments.
- User Behavior Simulation: Emphasizing the need to model real user interactions, tools like httperf and autobench are discussed for their capabilities in generating meaningful performance metrics.
- Interpreting Data: The complexity of interpreting data outputs from load testing tools is acknowledged, with a focus on insights that correlate to actual user journeys rather than raw server statistics.
- Tools and Technologies: Various tools such as ab, httperf, Bees with Machine Guns, Funkload, and JMeter, along with hosted solutions like Blitz.io, are introduced, noting their respective strengths and weaknesses. Node.js is mentioned as a favorable choice for building load testing scenarios due to its robust HTTP core library.
- Collaborative Approach: The importance of a collaborative testing strategy is highlighted, where success criteria for load testing are defined by the entire team.
- Real-Time Testing: The architecture of their testing setup is described, utilizing a worker-based system with Redis for messaging, allowing for distributed load testing from global locations.
- Final Observations: The speakers conclude that adopting structured, methodical load testing practices is essential in modern software development, going beyond ad-hoc methods for improved performance results.

Overall, attendees are encouraged to make load testing a priority in their development processes to better understand and enhance application responsiveness and scalability.

Kill! Kill! Die! Die! Load Testing With A Vengeance
Dan Yoder and Carlo Flores • February 02, 2012 • Burbank, CA

We've all seen load tests of a single "hello world" HTTP server using tools like ab or httperf. But what about load testing for real world Web applications and testing architectures that go beyond a few processes on a single machine? What about testing elastic "on-demand" architectures that add capacity as load grows? How does testing in the cloud affect your results? At what point does bandwidth become a bottleneck instead of CPU or memory? And what are we really measuring? What is the difference between connections and requests per second? And how do those ultimately relate to infrastructure cost, which is the real bottom line?

Help us caption & translate this video!

http://amara.org/v/FGko/

LA RubyConf 2012

00:00:00.080 Hello, everyone. Thank you for joining us.
00:00:22.560 A lot of you folks who have been in the Ruby community for a while have witnessed the evolution of testing methodologies and frameworks. I mean, five to seven years ago, testing kind of sucked compared to where we are today. We were still in the early stages of discovering dynamic languages and hadn't quite figured out how to compensate for the absence of static type checking. We realized that testing was something we needed to focus on.
00:00:34.000 Now, we have moved well past those early challenges and have entered the realm of testing things we previously didn't test. We are trying to model user behavior using frameworks like RSpec and Cucumber, which is fantastic. However, we must understand that the responsiveness of our applications is arguably just another feature. If you think about it in terms of user perception, a slow application might as well not have any features at all.
00:00:46.799 This brings us into the realm of load testing. What we hope to achieve here today is to build upon the contributions of many who have worked towards solutions for functional unit testing. We hope to make a small step forward in load testing, providing insight and perhaps offering an interesting demo if everything goes well.
00:01:06.320 Unit and functional tests in Ruby have greatly improved. We can all agree that load testing is just as critical as these tests. We must ensure our applications can handle their limits and truly care about user experience. Furthermore, this should be tied to continuous integration (CI). The only way we can gain confidence in our technology stack is to incorporate load testing as part of every commit that we make to our repository.
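
As a rough illustration of what that could look like (not the speakers' setup; the threshold and field names here are invented), a load result can fail a build the same way a red spec does:

```javascript
// Hypothetical CI gate: fail the build when a load run misses an assumed
// performance budget, just as a failing unit test would.
var MAX_P95_MS = 500;  // assumed budget for 95th-percentile response time

function assertBudget(stats) {
  if (stats.p95 > MAX_P95_MS) {
    console.error('Load budget exceeded: p95 ' + stats.p95 + 'ms > ' + MAX_P95_MS + 'ms');
    process.exit(1);   // nonzero exit marks the commit as failing
  }
  console.log('Load budget met: p95 ' + stats.p95 + 'ms');
}

// e.g. assertBudget(JSON.parse(require('fs').readFileSync('load-results.json')));
```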
00:01:28.959 From an operations perspective, nothing is more frustrating than a developer explaining that everything worked correctly in their local environment. When troubleshooting, we often end up in conflicts over these matters. Fortunately, there are tools available for load testing, although not as many as in the unit or functional testing community. One such tool is Bees with Machine Guns, developed by the Chicago Tribune team. It spins up many small EC2 micro instances to test a server.
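
For reference, a typical session follows the project's README (the key pair, security group, and target URL below are placeholders):

```sh
# Spin up 4 EC2 micro instances, attack the target, then tear them down.
bees up -s 4 -g public -k my-keypair
bees attack -n 10000 -c 250 -u http://target.example.com/
bees down
```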
00:01:54.560 You've likely encountered outputs from tools like Apache Benchmark or Siege, which present a lot of data and can be difficult to interpret. It's vital to note that while these tools give us success and error statistics, they don't provide meaningful measurements of user experience. It's essential to connect these insights to real user journeys rather than focusing solely on how Apache performs. Our goal should be to understand the user experience, which may involve database interactions or login procedures.
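
For example, a basic Apache Bench run looks like this; the requests-per-second summary it prints is exactly the kind of server-side number that needs tying back to user journeys:

```sh
# 10,000 requests, 100 concurrent, against a placeholder target.
ab -n 10000 -c 100 http://target.example.com/
```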
00:02:09.399 Another notable tool is httperf, which is excellent for ramping up connections and observing where performance issues arise. While it provides decent outputs, it is essential to consider variability such as standard deviation. I don't personally care as much about the average user experience; I'm more concerned with how extreme scenarios affect user interactions.
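
A representative httperf invocation (the host and numbers are placeholders):

```sh
# Open 100 connections/sec, 10 calls per connection, 5,000 connections total.
httperf --server target.example.com --port 80 --uri /index.html \
        --rate 100 --num-conns 5000 --num-calls 10 --timeout 5
```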
00:02:24.880 Connection handling is one clear area where things can go wrong if you don't control the entire stack during testing, particularly if you're trying to simulate realistic user behavior. HTTP libraries typically pool connections and reuse sockets. For a single client you'd want to keep a connection alive as long as possible, but independent users will not share a socket except in highly unusual circumstances. So it's crucial to define connection behavior in your tests.
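
A minimal Node.js sketch of that point (not the speakers' code; the target host is a placeholder): passing `agent: false` disables connection pooling, so each simulated user opens its own socket rather than reusing a kept-alive one:

```javascript
var http = require('http');

function simulateUser(done) {
  var req = http.request({
    host: 'app.example.com',  // placeholder target
    path: '/',
    method: 'GET',
    agent: false              // no keep-alive, no shared socket
  }, function (res) {
    res.on('data', function () {});  // drain the response body
    res.on('end', done);
  });
  req.on('error', done);
  req.end();
}

// A hundred "users", each on its own connection.
for (var i = 0; i < 100; i++) {
  simulateUser(function () {});
}
```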
00:02:50.880 One of the more significant challenges comes from limitations on the testing side. It's entirely feasible that my testing environment might fail before the server does. autobench is a fantastic tool that drives httperf in a distributed fashion across many machines, which allows us to overcome such limitations: you can spread test clients across multiple machines so that the clients themselves don't become the bottleneck before the server does.
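
A single-host autobench run stepping httperf from 20 to 200 connections per second might look like this (flag names as documented for autobench; verify against your installed version):

```sh
autobench --single_host --host1 target.example.com --uri1 /index.html \
          --low_rate 20 --high_rate 200 --rate_step 20 \
          --num_call 10 --num_conn 5000 --timeout 5 --file results.tsv
```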
00:03:18.320 Another useful setup involves running HTTP tests across multiple instances to gather real-time data. I recommend using SSH loops to streamline your processes efficiently. When attempting to make sense of large amounts of data generated from these tests, it can be helpful to redirect and filter this information for easier analysis.
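
One plausible shape for such a loop (the hosts file, flags, and log names are placeholders):

```sh
# Fan an httperf run out to every host in hosts.txt, one log per host.
for h in $(cat hosts.txt); do
  ssh "$h" "httperf --server target.example.com --rate 100 --num-conns 5000" \
    > "results-$h.log" &
done
wait  # block until every remote run finishes
```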
00:03:46.560 Some additional tools in this realm include Funkload and JMeter. While JMeter is solid, we chose not to pursue it due to the complexity of the configuration files and because it leans heavily on Java. Our philosophy leans towards using performance and responsiveness as core usability components of our applications. These tools should reflect the practices established in TDD with RSpec and Cucumber.
00:04:09.040 As we aim to distribute load effectively, we want to ensure that our load testing mirrors real user distribution. It’s essential to build the framework and methodology around user behavior testing so that we remain aligned with functional testing practices.
00:04:28.320 Next, we think of our load tests as not just performance benchmarks but as a fundamental aspect of the overall collaborative testing approach. By jointly defining our success criteria, we can ensure that the team builds a well-rounded strategy for quality testing throughout the development lifecycle.
00:04:49.760 While we were exploring these tools, we also noted the presence of hosted solutions for load testing, such as Blitz.io, which provide nice charts and response time metrics. The most critical data point we focus on is the maximum number of users a server can handle, which directly impacts scalability.
00:05:02.720 It’s astonishing that many people overlook this critical practice in the cloud. Monitoring user experiences without a performance goal in place can cause serious issues down the line. It's essential to ensure these metrics are part of every deployment strategy to derive actionable insights.
00:05:36.760 If we’re looking to improve performance and responsiveness across the board, we should be assessing the results of our load tests just as seriously as our unit tests. It’s time we change the narrative and make load testing a fundamental practice in software development, not an afterthought.
00:06:00.000 In our organization, we utilize Ruby within our stack but opted for Node.js for this load testing effort as it seemed more developed in this area. Node.js offers an excellent HTTP core library that we found essential for building out our testing scenarios efficiently.
00:06:31.680 Our implementation involves wrapping Node.js functionality within a utility we've created, enabling a REST-friendly experience when scripting tests. We focus on accurately representing user behavior in our load tests by using existing data and ensuring that the script mirrors realistic scenarios.
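
The speakers don't show their utility's code, but a thin, REST-friendly wrapper over Node's http core library might look roughly like this (the function and field names are invented):

```javascript
var http = require('http');

function get(host, path, callback) {
  var start = Date.now();
  http.get({ host: host, path: path, agent: false }, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      // Status plus elapsed time: the raw material for user-centric metrics.
      callback(null, { status: res.statusCode, ms: Date.now() - start, body: body });
    });
  }).on('error', callback);
}

// Usage: time one GET as a single step in a scripted user journey.
get('app.example.com', '/products', function (err, res) {
  if (!err) console.log(res.status, res.ms + 'ms');
});
```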
00:06:50.880 Also, as we develop the load testing scripts, we encapsulate user behavior into functions. This modular approach allows us to clearly define what each part of the test does, improving our flexibility and reliability in testing diverse user scenarios.
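
Continuing the hypothetical sketch above (endpoints are invented), each user action becomes one function, so journeys stay modular and easy to recombine:

```javascript
function browseCatalog(next) { get('app.example.com', '/products', next); }
function viewProduct(next)   { get('app.example.com', '/products/42', next); }

// A "user" is just a sequence of actions; per-journey timings fall out of
// the per-step results.
function shopper(done) {
  browseCatalog(function () {
    viewProduct(done);
  });
}
```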
00:07:09.920 Carlo contributed to this process, and we took inspiration from our prior work to ensure a comprehensive approach to testing. Our goal is to run tests distributedly, reflecting the actual conditions and interactions that users would experience.
00:07:37.760 Today, we're running tests from several global locations. These pods operate independently without opening any ports due to their client-based architecture, allowing us to gather relevant data without issues. The controlling system, dubbed the Overlord, manages all the testing pods we deploy.
00:08:01.920 With this setup, I can run our tests from any location, including my Mac, without being limited by its hardware capabilities. We build a unified testing strategy by connecting our external pods to the Overlord for centralized management. Once we deploy the tests, the results are collected and aggregated in real-time.
00:08:27.040 Next, let’s discuss the architecture underlying our testing setup. While I won’t go into great detail, it’s crucial to understand that we employ a worker-based architecture supported by Redis as our message transport. This allows us to ensure fast handling of requests and efficient distribution of tasks to our worker nodes.
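
The talk doesn't show code for this, but a minimal sketch of the worker/Redis pattern could look like the following (assuming the `redis` npm package; the queue, channel, and host names are invented):

```javascript
var redis = require('redis');

var jobs = redis.createClient(6379, 'overlord.example.com');
var results = redis.createClient(6379, 'overlord.example.com');

// Stand-in for the real runner: would replay a scripted user journey.
function runScenario(job, cb) {
  cb({ job: job.name, ok: true });
}

// Worker pod loop: block until the Overlord pushes a job, run it, publish
// the stats back, repeat.
function work() {
  jobs.brpop('load:jobs', 0, function (err, reply) {
    if (err) return console.error(err);
    var job = JSON.parse(reply[1]);  // reply is [listName, value]
    runScenario(job, function (stats) {
      results.publish('load:results', JSON.stringify(stats));
      work();
    });
  });
}
work();
```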
00:09:05.680 As we finalize tests, we want to assess data performance while simultaneously keeping everything manageable. When conducting tests in real environments, it's essential to monitor the performance metrics to determine potential bottlenecks that may appear under stress.
00:09:29.360 When we gather our results, we can evaluate how well the service performs in various scenarios while ensuring that the actual tests are reflective of the conditions our users will face. In-cloud environments provide unique challenges, so we need to be particularly proactive when testing our applications.
00:10:05.520 In conclusion, ad hoc testing isn't enough—structured and methodical approaches will yield better understanding and improved performance outcomes.
00:10:07.039 Let's make every effort to ensure that load testing is a priority in our software development processes.