00:00:00.000
I studied Economics, Interpreting, and Translation in university. However, after graduation, I worked as a dental nurse for two and a half years before realizing that it wasn't the career I wanted.
00:00:05.730
I decided to change my career path, so I taught myself the necessary coding skills to become a QA engineer. Later, with the mentorship of other software engineers at Airtasker, I gained the skills to join their ranks.
00:00:19.859
In addition to Ruby, I have a deep love for the Lord of the Rings; it’s my favorite movie! I also enjoy making fondant cakes, which my coworkers say are amazing.
00:00:45.440
Hello, everyone! I’m Nancy, a software engineer from Airtasker. Today, I'm going to walk you through our journey of performance testing using Ruby.
00:00:51.290
First, I want to start with a question. Imagine that one day your CEO is on a call promoting your application, doing some public relations for your company. How many of you would be confident saying you know whether your platform can handle a huge spike in traffic? And if it can't, do you know where the breaking point would be? Raise your hand if you know.
00:01:13.460
No one? Let's move to a more realistic question. When the business decides we've done well enough in Australia, how many of you actually know whether your current infrastructure can support expansion into three more countries over the next 12 months? Still no one? That's exactly the point.
00:01:35.780
For those of you who don't know the answers and are looking for some direction, I'm here today to illuminate the dark! I'll walk you through our performance testing journey, from design to implementation, and finally share our learnings. Let's start with the 'whys': why do we want to conduct performance testing? We aim to find the invisible boundary lines. We know that crossing a certain point leads to system downtime, but we often don't know exactly where that point is.
00:02:15.319
Engineers may understand the limitations of their specific areas, but they might not have the complete picture. We want to grasp the limitations of our current infrastructure to facilitate expansion, and we also need to understand how our platform behaves under extreme circumstances. It’s crucial to note that during a high sales period, performance degradation may not be gradual—it often leads to unforeseen behaviors. We want to be proactive rather than reactive.
00:03:08.549
Let me share an example. The company I work for, Airtasker, is a marketplace where people can complete various everyday tasks. Two years ago, we launched a marketing campaign, offering $500 for people to play with puppies. Naturally, this led to an influx of users, and we had 1,700 people flood onto our platform for that one task. When we started celebrating the campaign, things began to break down.
00:03:46.799
Alerts started to scream, and we were at an off-site company gathering. People were saying, 'Put down your beers; let's put out this fire!' This kind of situation happens more often than you might expect, and it can damage our brand reputation. That is why we aim to prevent such incidents by being proactive.
00:04:18.989
So, what's the process for performance testing? The first step is to determine performance criteria. Here, we gather enough information to ask the right questions and identify what types of performance testing we need to carry out.
00:04:40.050
The second step is to configure a test environment. We must create a test environment that simulates our production environment, ensuring we have all the necessary tools in place. The third step is to plan and design, where we analyze how real users behave and design our test scenarios and cases accordingly.
00:05:26.320
In the implementation phase, we build the testing framework, set up monitoring tools, create test cases, and prepare test data. The fifth step involves executing and monitoring the tests, after which we analyze the results, fine-tune our application, and rerun tests for comparison.
00:06:04.069
Let’s discuss the first phase: determining criteria. We need to understand the types of performance testing available so we can select the appropriate ones based on our criteria.
00:06:17.360
Load testing, for instance, tests the system’s behavior under anticipated load. Stress testing helps to find the breaking point by gradually increasing the number of users until the system fails. Spike testing analyzes system behavior during sudden increases in traffic.
00:06:44.810
Volume testing increases the size of your dataset to observe its impact on response times. Endurance testing is similar, but examines behavior such as memory consumption over a longer period. For a clearer picture, we can compare the test types by duration and number of virtual users.
00:07:10.469
As you can see, load testing considers a specified number of users, while endurance testing involves a longer duration to surface issues like memory leaks. Stress testing is when we keep increasing the user count until we reach our limits.
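The three shapes just described can be sketched as simple virtual-user profiles over time; the numbers below are purely illustrative:

```ruby
# Sketch: the three traffic shapes as virtual-user counts per minute.
# Load testing holds a constant anticipated load; stress testing keeps
# adding users until the system breaks; spike testing jumps suddenly.
def load_profile(users, minutes)
  Array.new(minutes, users)
end

def stress_profile(step, minutes)
  (1..minutes).map { |m| m * step }
end

def spike_profile(base, peak, minutes, spike_at)
  (1..minutes).map { |m| m == spike_at ? peak : base }
end

load_profile(200, 3)          # => [200, 200, 200]
stress_profile(100, 4)        # => [100, 200, 300, 400]
spike_profile(50, 1000, 5, 3) # => [50, 50, 1000, 50, 50]
```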
00:07:47.660
Next, we’ll discuss how to formulate the right questions for establishing performance test criteria. Often, people ask generic questions like: 'What’s our average response time?' or 'How much traffic can we handle?' Unfortunately, these questions typically lack business value.
00:08:26.350
Instead, we should ask detailed questions tailored to our context. For instance, given regular traffic patterns, at what point does our performance become unacceptable? How much headroom does our infrastructure offer? If we double or triple our traffic tomorrow, can our application manage it?
00:09:02.190
To answer such questions, we would conduct stress tests while monitoring key performance indicators, such as ensuring the error rate does not exceed 0.5% at a given level of traffic.
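As an illustration, that kind of KPI check boils down to computing the error rate over sampled results and comparing it against the agreed budget; all numbers below are examples, not our real data:

```ruby
# Illustrative KPI check: compute the error rate from sampled results and
# compare it against a 0.5% budget (the threshold and sample counts are examples).
ERROR_RATE_LIMIT = 0.005

def error_rate(samples)
  return 0.0 if samples.empty?
  samples.count { |s| !s[:success] }.fdiv(samples.size)
end

samples = Array.new(995) { { success: true } } +
          Array.new(5)   { { success: false } }

error_rate(samples)                      # => 0.005
error_rate(samples) <= ERROR_RATE_LIMIT  # => true, within budget
```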
00:09:46.640
On a personal note, I’ve been stress-testing my public speaking skills—from presenting five minutes in front of 15 people, to 10-minute talks for 40 engineers, to speaking today before an audience of 400 to 500 people for 35 minutes. Will I pass out from nervousness? Let’s find out in 30 minutes!
00:10:11.860
Returning to performance testing: if a major marketing campaign is on the horizon, how quickly can we scale to meet the anticipated demand? This is a perfect example of a situation requiring spike testing.
00:10:38.580
Drawing from historical data allows us to forecast potential traffic increases. For example, if we expect 300% more users during a marketing event, we can set criteria like ensuring that node and deployment replicas can scale to meet this surge within a few minutes.
00:11:02.360
Now, let's talk about the test environment. We want it to be as close as possible to production so that test results are reflective of reality. It should be isolated as we intentionally put it under pressure.
00:11:35.290
To achieve this, we run our tests using a duplicate image of our production instance on a different Kubernetes cluster. We replicate configurations and utilize tools like Terraform for resource management.
00:12:11.940
In terms of data, using a smaller test dataset can sometimes hide slow queries, so we create a snapshot of our production database and anonymize it for testing.
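A minimal sketch of that anonymization step, assuming hypothetical column names; the real pipeline would cover every user-identifying field:

```ruby
require 'digest'

# Sketch: scrub user-identifying fields from a snapshot row before loading it
# into the test database. Hashing the email keeps it unique (so lookups and
# uniqueness constraints still behave realistically) without exposing the user.
def anonymize(row)
  row.merge(
    'email' => "user-#{Digest::SHA256.hexdigest(row['email'].to_s)[0, 8]}@example.com",
    'name'  => 'Test User',
    'phone' => nil
  )
end

anonymize('id' => 42, 'email' => 'jane@real.example',
          'name' => 'Jane', 'phone' => '0400000000')
# The id is kept; the email becomes a stable pseudonym.
```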
00:12:44.400
Keeping our testing environment cost-effective is also crucial. We have scripts that enable us to spin up and shut down our testing environment efficiently.
00:13:05.340
Regarding our selection criteria for performance testing tools, we want them to properly simulate production-like traffic. We prefer industry-standard tools that are well-maintained and possess strong community support.
00:13:36.950
We aim for tools that make tests easy to create and maintain, with a minimal learning curve. It's vital that all team members can operate them, and keeping a history of test runs for comparison is essential.
00:14:08.260
Visualizing results greatly aids in analysis. We also intend to run our performance tests regularly, so integration with CI/CD tools is critical.
00:14:38.560
Various performance testing tools are available in the market today, with some being open-source and others commercial. For smaller tests, tools like JMeter may suffice, but they come with limitations, such as the maximum number of virtual users.
00:15:11.690
To run larger-scale tests, we often select commercial tools built on top of open-source ones, which can support a greater number of virtual users while offering detailed reporting dashboards.
00:15:51.870
In our case, we chose JMeter due to its large user base and excellent dashboard capabilities. On top of it, we use ruby-jmeter, an open-source DSL maintained by Flood IO for building JMeter test plans, which lets us structure our tests in Ruby rather than XML.
00:16:21.370
Moving on to the planning and design stage, we have several best practices to follow. Firstly, it’s important to ramp up users gradually rather than launching everyone at once, as this can create artificial bottlenecks in the system.
00:16:52.460
Similarly, we avoid starting from a zero-load situation, as this doesn’t reflect real-world conditions. We want to consider different clients and browsers, as traffic will come from various sources.
00:17:24.280
Additionally, we should distribute user load to simulate a more realistic traffic profile. Not every user will interact with our application the same way at any given time.
00:17:57.480
We can analyze traffic profiles by measuring HTTP requests and focusing on endpoints that are vital to our application.
00:18:23.660
Next, I'll illustrate how this implementation allows our various teams to leverage performance testing. I’ll show you a script that spins up the test environment.
00:18:51.020
First, if the command is 'up', we navigate to the Terraform repository and provision the resources according to our Terraform plans.
00:19:18.890
We create our deployment by applying all the Kubernetes configurations. If the command is 'down', we destroy the deployment to clean up all resources.
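The script described above might look roughly like this; the directories, flags, and the DRY_RUN switch (which prints commands instead of executing them) are all hypothetical:

```shell
#!/usr/bin/env sh
# Hypothetical wrapper around Terraform and kubectl for the perf-test environment.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

perf_env() {
  case "$1" in
    up)
      run terraform -chdir=infra/perf apply -auto-approve   # provision cloud resources
      run kubectl apply -f k8s/perf/                        # deploy the production clone
      ;;
    down)
      run kubectl delete -f k8s/perf/ --ignore-not-found    # remove the deployment
      run terraform -chdir=infra/perf destroy -auto-approve # release cloud resources
      ;;
    *)
      echo "usage: perf_env up|down" >&2
      return 1
      ;;
  esac
}

DRY_RUN=1 perf_env up
```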
00:19:45.080
Now, I want to provide an example of what test data looks like using Swagger. For test cases, we define attributes such as name, category, status, and selfies.
00:20:13.600
To keep our reporting organized, we create helper methods used across our test cases. For instance, we set default hosts and request headers, including content types and authentication tokens.
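A sketch of what such shared helpers can look like; the host, token, and header values here are made up:

```ruby
# Hypothetical shared helpers: a default host plus common request headers,
# mixed into every test case so each case stays small and consistent.
module PerfTestHelpers
  DEFAULT_HOST = 'https://perf.example.com'

  def default_headers(token: 'test-token')
    {
      'Content-Type'  => 'application/json',
      'Accept'        => 'application/json',
      'Authorization' => "Bearer #{token}"
    }
  end
end
```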
00:20:42.660
Next, I’ll showcase how we create test cases using Ruby JMeter. To begin, we specify the number of virtual users, ramp-up time, and duration of the test.
00:21:12.460
We also define the HTTP method and endpoint for our requests. If necessary, we may extract data from responses for further use.
00:21:42.130
In this setup, we can control how traffic is divided. For example, we can allocate 50% of traffic to one endpoint and later use the ID extracted from responses in subsequent requests.
00:22:11.700
In our custom-defined controllers, we can standardize test data import using the gem. It also allows for diverse assertions, such as checking response codes and response durations.
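Putting the last few steps together, a test case in the ruby-jmeter DSL might look roughly like the plan-building sketch below. The host, endpoint names, and split percentage are invented for illustration, and the exact DSL options should be checked against the ruby-jmeter README:

```ruby
require 'ruby-jmeter'

test do
  # 100 virtual users, ramped up over 60 seconds, running for 5 minutes.
  threads count: 100, rampup: 60, duration: 300 do
    # Send roughly half of the traffic through this branch.
    throughput_controller percent: 50 do
      visit name: 'list_tasks', url: 'https://perf.example.com/api/tasks' do
        # Pull an ID out of the response body for the follow-up request.
        extract name: 'task_id', regex: '"id":(\d+)'
        assert contains: 'id', scope: 'main'
      end
      # Reuse the extracted ID in a subsequent request.
      visit name: 'view_task', url: 'https://perf.example.com/api/tasks/${task_id}'
    end
  end
end.jmx(file: 'task_browse.jmx')
```

Calling `.jmx` writes the plan out as JMeter XML; swapping it for `.run` would execute the plan locally.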
00:22:50.900
Next, we’ll explore how to run commands to execute tests using command-line options. We leverage the OptionParser class to set parameters like test duration.
00:23:22.730
The command processes inputs and executes tests accordingly, either creating test plans or running tests based on conditions.
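A minimal sketch of such a command-line wrapper using Ruby's standard OptionParser; the flag names and defaults are hypothetical:

```ruby
require 'optparse'

# Hypothetical CLI wrapper: parse the options that control how a performance
# test is generated or executed.
def parse_perf_options(argv)
  options = { duration: 300, users: 50, mode: :plan }
  OptionParser.new do |o|
    o.banner = 'Usage: perf_test [options]'
    o.on('-d', '--duration SECONDS', Integer, 'Test duration in seconds') { |v| options[:duration] = v }
    o.on('-u', '--users COUNT', Integer, 'Number of virtual users')       { |v| options[:users] = v }
    o.on('-r', '--run', 'Execute the test instead of only generating the plan') { options[:mode] = :run }
  end.parse!(argv)
  options
end

parse_perf_options(%w[--duration 600 --run])
# => {duration: 600, users: 50, mode: :run}
```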
00:24:10.630
Now, I'll present a demo of how the command functions. Although I won’t conduct a live demo due to concerns over potential issues, I can show you the expected outcomes.
00:24:48.270
Upon running the command, you can see generated test plans that include user ramp-up times and configurations. This gives you an idea of how the tests will execute.
00:25:29.420
We can visualize results as the test runs in non-GUI mode, with logs indicating success and failure rates.
00:25:53.590
To see real-time data, we can use an open-source tool called Taurus, created by BlazeMeter, which provides live reports and metrics while the tests run.
00:26:40.940
After running tests, we can view results through dashboards showing success rates and response times.
00:27:12.600
Next, I’ll show you how to run the test plan using the JMeter dashboard. We’ll import the test plan and see how all the configurations take shape.
00:27:47.670
As the tests run, we retrieve request and response data for various endpoints, alongside a statistical overview of testing metrics.
00:28:15.720
After all tests, we analyze our Application Performance Monitoring results to validate whether our performance criteria match the set benchmarks.
00:28:47.590
We examine results, particularly spikes in errors and response times, to identify problematic endpoints with longer queries.
00:29:13.860
After identifying issues, we address them promptly. Now, I’ll introduce our three-dimensional scaling model.
00:29:43.740
The y-axis of the model represents splitting application components so that each can be scaled and managed independently. The x-axis represents horizontal duplication, such as adding more replicas. The z-axis represents partitioning traffic and data, for example by customer or region.
00:30:28.850
We strive to distribute traffic strategically and fine-tune resource allocations—considering AWS instance types and memory. Additionally, we enhance our codebase to better handle load.
00:31:07.330
To summarize our learnings, baseline tests should be conducted first to determine benchmarks, and we must recognize test differences between production and testing environments.
00:31:45.420
We also need to understand how to ask pertinent questions to ensure performance tests meet stakeholder needs and to remain aware of external dependencies that might affect outcomes.
00:32:10.030
Expect to face bottlenecks in unexpected areas, and for those interested in further information, I invite you to visit the Ruby JMeter GitHub repository and my performance test generator repository.
00:32:46.640
Reflecting on our tests with NASA's public API, we observed that while our error rate and memory usage remained normal, CPU usage spiked occasionally.
00:32:56.460
Fortunately, all key functionalities operated as expected, leading to a positive overall assessment. Thank you!