Practical Guide To Benchmarking Your Optimizations

by Anna Gluszak

In her talk at RubyConf 2018, Anna Gluszak addresses the essential topic of performance optimization in Ruby applications. Common misconceptions suggest that Ruby itself is slow; however, Anna clarifies that often it is a lack of proper code optimization that leads to performance issues. This presentation is particularly beneficial for developers without a computer science background, as it aims to demystify the process of performance enhancement.

Key points covered include:

- Understanding Performance Optimization: Many developers may not grasp the importance of code performance, leading to slow applications. Anna stresses that while minor performance differences may not always be noticeable to users, significant slowdowns can have detrimental effects.

- Benchmarking Code: Anna introduces the concept of benchmarking, which involves evaluating the performance of code by comparing old and new implementations. She emphasizes the use of the Benchmark module included in Ruby, explaining methods such as benchmark, bm, bmbm, and measure.

- Sampling Techniques: To illustrate performance benchmarking, Anna discusses different sampling methodologies:

- Simple random sampling

- Stratified sampling

- Cluster sampling

- Systematic random sampling

- Multi-stage sampling

These methods help in collecting representative data for assessing performance, especially in the context of an application estimating cat adoption times.

- Practical Examples: Anna uses a cat shelter application to demonstrate performance differences, with a focus on factors like age and color of the cats that affect adoption times. She shows that performance varies significantly based on how data is sampled and processed, particularly when nested loops and recursion are involved.

- Code Refactoring and Testing: The talk emphasizes the importance of solid testing prior to refactoring code. Anna advises engaging stakeholders for insights and ensuring any tweaks made improve performance without introducing bugs.

The main takeaway from Anna Gluszak's talk is that performance optimization is not just a complex theoretical concept, but a practical necessity that can substantially affect application fluidity and user experience. Developers are encouraged to benchmark frequently, especially before and after significant code changes, to ensure ongoing performance efficiency.

00:00:15.500 All right, well, welcome everyone. My name is Anna, and I am currently a software engineer at Raleigh Health.

00:00:18.510 As I was introduced, I'm here to talk about performance optimization in Ruby applications.

00:00:23.430 Many people believe that Ruby applications are inherently slow, yet oftentimes it is the lack of optimization and not the language that is at fault.

00:00:27.260 But how do you even get started with this daunting task of performance optimization? For those who do not have a computer science background, understanding all the different ways of algorithm optimization can sound scary and overwhelming.

00:00:31.679 Some may have a good handle on the theory behind things like big O notation but struggle to put it into practice.

00:00:40.290 This talk will focus on a tangible and data-driven way to measure and optimize code performance.

00:00:44.370 When I was first becoming a developer, I would ask a lot of questions of people more senior than me.

00:00:49.050 I wanted to know more, so I would often ask things like, "Why did you do this instead of that?" They would say, "Because it's faster," and I would fire a bunch of questions at them.

00:00:55.170 "What do you mean by faster? Why is it faster? How is it faster? Can you show me how it's faster?" And they would just tell me that junior developers do not need to worry about the performance of their code; they should just go away.

00:01:03.420 So, to be fair, how many of you are junior developers? Okay, some.

00:01:10.080 It's not completely a myth because you could spend probably another year of your life optimizing the current application that you're working on, and you would still have some room for improvement.

00:01:17.220 Also, computers nowadays are pretty fast, so if your website takes, let's say, half a second to load as opposed to a quarter of a second, most of your users will not notice and will not care.

00:01:25.799 So why should you even worry about the performance of your code? Well, you don’t want to make it so slow that you break production. This actually happened to me when I was first learning to develop.

00:01:39.740 I put in one of my first PRs to fix a small bug in a calculation that ran every night as part of a large cron job. It was not optimized at all and took six hours to run overnight. I fixed the bug, the calculations were correct, but the next day it took 12 hours to run, and it was not the only cron job we had, which kind of broke production.

00:01:55.130 So I’m here to make sure that this does not happen to you. The moral of the story is not that you need to understand every single detail about code performance today.

00:02:02.570 Ruby is friendly for beginners; it's supposed to be easy to learn. You just need to be aware that there is such a thing as code performance, and sometimes it's important to make sure that you don't slow it down too much.

00:02:11.220 So how can you be aware of code performance? If you Google around, there are a lot of books you can read through, but many of them are very dry and difficult to get through.

00:02:19.480 A lot of them are great, and I don't want to discourage you from learning more theory if you want to, but there's just so much theory, and it's not that easy to learn all of it.

00:02:31.440 I'm not going to recommend any books today or ask you to read through all that dry theory. Instead, we're going to talk about a simple code example so that we can all think of the same concepts as we discuss different optimization methods.

00:02:40.380 We're going to talk about sampling because I have a statistics background, and I want to show its relevance. We're also going to discuss the benchmarking module, how to approach optimizations, and when you should benchmark your code.

00:02:52.880 Imagine you work for a cat shelter that has a bunch of different cats. Some are small, cute kittens, while others are older cats with health issues.

00:03:01.920 The shelter wants you to create an application to estimate how long it will take each of those cats to get adopted. You have some basic code.

00:03:07.740 Don’t look at it too closely; it’s not good code and not based on any actual cat statistics for adoption. For the purposes of this talk, each cat is a class, an object with different attributes that will affect the expected adoption time.

00:03:20.240 Each attribute will affect the expected adoption time either positively or negatively, and the effects will vary in scale.

00:03:26.290 For example, a loud cat might be less desirable than a cute cat. However, people would still prefer a cute cat that is loud over a non-cute, quiet cat.

00:03:35.500 In this application, there’s a method that determines how desirable a cat is. The more desirable attributes a cat has, the quicker it will be adopted.

00:03:46.180 Then, the business unit informs you that the time of year affects cat adoption rates. Specifically, around Christmas, people are more likely to adopt cats.

00:03:58.480 However, if you want to buy someone a cat as a gift, please don't do it unless you're 100% sure that they want one. This is how pets get abandoned.

00:04:04.660 And similarly, around Halloween, there seems to be a magic in the air, and a lot of black cats are adopted.

00:04:09.700 So, how fast is this code? To answer that, you can use Ruby's Benchmark module. Even if you have an older Ruby version, going back to 1.9.3, the module is included and very easy to use.

00:04:19.650 Now, what does it mean to benchmark code? The smart people always put in a Google definition, so I did that for you. It means to evaluate or check something by comparison with a standard.

00:04:34.920 In software terms, this essentially means taking the old implementation of the code and comparing it to a new implementation. You measure the time taken by each and check which one is quicker.
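
As a minimal sketch of that old-versus-new comparison: the two methods below, old_total and new_total, are hypothetical stand-ins (they are not from the talk), and Benchmark.realtime returns the elapsed wall-clock seconds for a block.

```ruby
require "benchmark"

# Hypothetical "old" implementation: builds an intermediate array, then sums it.
def old_total(numbers)
  numbers.map { |n| n * 2 }.reduce(0) { |sum, n| sum + n }
end

# Hypothetical "new" implementation: sums in a single pass.
def new_total(numbers)
  numbers.sum { |n| n * 2 }
end

numbers = (1..200_000).to_a

# Time each implementation on the same input.
old_seconds = Benchmark.realtime { old_total(numbers) }
new_seconds = Benchmark.realtime { new_total(numbers) }

puts format("old: %.4fs  new: %.4fs", old_seconds, new_seconds)
```

Both versions must return the same result for the comparison to mean anything, so it is worth checking that before looking at the timings.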

00:04:47.640 Hopefully, your new code is faster than the old code. When you have any function or method, there’s usually an input, a black box of your function, and an output.

00:04:55.490 Not all methods will have inputs directly; some may pull data from a database. If you're benchmarking, you want to run the code through various datasets, requiring you to create a sample that is somewhat representative of the population, which is where sampling comes in.

00:05:06.920 There are different sampling methodologies, and we're going to go through each one of them individually.

00:05:10.880 If all of this sounds weird to you, or maybe some of it seems familiar, let's break it down step by step. A simple random sample is like pulling random elements out of a hat.

00:05:32.010 If I collect all of your names, shake the hat, and pick five, that would be a simple random sample of people at this conference. Each person here would have an equal chance of being picked, eliminating a lot of biases.
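
In Ruby, pulling names out of a hat could look like the sketch below. The Cat struct and the generated population are hypothetical, since the talk's actual Cat class is not shown in the transcript.

```ruby
# Hypothetical cat record; the talk's real Cat class is not shown.
Cat = Struct.new(:name, :color)

colors = %w[black white orange grey]
population = (1..1_000).map { |i| Cat.new("cat#{i}", colors[i % colors.size]) }

# Array#sample(n) picks n distinct elements, each with equal probability.
# Passing a seeded Random makes the draw repeatable between benchmark runs.
sample = population.sample(5, random: Random.new(42))

puts sample.map(&:name).inspect
```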

00:05:42.580 Next, we have stratified sampling. In this method, you divide the population based on certain attributes. In my cat application, for example, I could divide my cats by color.

00:06:02.460 If I have groups for black cats, white cats, orange cats, and grey cats, then selecting from each group would be a stratified sample. This method ensures that elements from each group are included, which is useful if you know those attributes affect performance.

00:06:13.170 Within stratified sampling, you can do it proportionally or disproportionately. For example, if 30% of your cat population is black cats, you can select a sample of 100 cats with 30% being black.

00:06:23.970 Alternatively, you can have 10 colors and select 10 from each color, leading to equal representation from each group.
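
A sketch of both variants of stratified sampling, assuming a hypothetical Cat struct and a made-up population that is 30% black, 50% orange, and 20% white:

```ruby
# Hypothetical cat record and population; the numbers are made up.
Cat = Struct.new(:name, :color)

population =
  Array.new(300) { |i| Cat.new("black#{i}", "black") } +
  Array.new(500) { |i| Cat.new("orange#{i}", "orange") } +
  Array.new(200) { |i| Cat.new("white#{i}", "white") }

rng = Random.new(7)
strata = population.group_by(&:color)

# Proportional: each color's share of the sample matches its share of the
# population. With awkward proportions, rounding can make the total differ
# slightly from sample_size.
sample_size = 100
proportional = strata.flat_map do |_color, cats|
  share = (sample_size * cats.size.to_f / population.size).round
  cats.sample(share, random: rng)
end

# Disproportionate (equal allocation): the same number from every color.
equal = strata.flat_map { |_color, cats| cats.sample(10, random: rng) }

count_by_color = ->(cats) { cats.group_by(&:color).transform_values(&:size) }
puts count_by_color.call(proportional).inspect
puts count_by_color.call(equal).inspect
```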

00:06:36.730 Next is cluster sampling, where you again divide your population into groups, referred to as clusters, and choose one or more specific clusters. For cats, if I choose only black and white cats, I ignore the rest of the population.

00:06:54.600 This is often a practical approach in the real world, especially when it's cheaper to sample this way. It can also be valuable in software, particularly if specific populations are of interest.
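
Cluster sampling is the simplest of these to sketch: keep the groups of interest whole and drop everything else. The Cat struct and population here are hypothetical.

```ruby
# Hypothetical cat record and population.
Cat = Struct.new(:name, :color)

colors = %w[black white orange grey]
population = (0...400).map { |i| Cat.new("cat#{i}", colors[i % colors.size]) }

# Cluster sample: keep whole clusters and ignore the rest of the population.
chosen_clusters = %w[black white]
cluster_sample = population.select { |cat| chosen_clusters.include?(cat.color) }

puts cluster_sample.size
```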

00:07:06.920 Then, we have systematic random sampling, which you might have already experienced in various settings. For example, if I ask for ten volunteers and count off the participants in the audience—1, 2, 1, 2—this is systematic sampling.

00:07:21.750 I arrange the population in a random order and select every nth participant. Lastly, multi-stage sampling mixes and matches different techniques. You could do a stratified sample and then pick members randomly from each stratum.
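
A minimal sketch of the systematic approach (shuffle, then take every nth element). The population is a stand-in list of IDs, and the interval of 10 is an arbitrary choice for illustration.

```ruby
# Stand-in population: 100 record IDs.
population = (1..100).to_a
rng = Random.new(3)

# Arrange the population in random order.
shuffled = population.shuffle(random: rng)

# Pick a random starting offset, then take every 10th element from there.
interval = 10
start = rng.rand(interval)
systematic = (start...shuffled.size).step(interval).map { |i| shuffled[i] }

puts systematic.inspect
```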

00:07:40.270 When benchmarking and picking your own data from a population, which method should you use? I typically use stratified sampling if I see specific variables impacting code performance.

00:07:54.790 Within each stratum, I would do a simple random sample to account for unknown variables. If there's a particular attribute I'm concerned with, I may select only that cluster.

00:08:08.120 In our cat application, we can see two branches of code split by if statements based on color and date. Before picking a sample, we need to establish how large it should be.

00:08:20.700 In general, a larger sample size increases confidence, but if your population is small, you should consider picking the entire population.

00:08:28.250 In statistics, there’s a commonly cited sample size of 384, which corresponds to roughly a 95 percent confidence level (with about a 5 percent margin of error) assuming a large population. However, in benchmarking, you don’t always need to follow this precisely.
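
The 384 figure is consistent with the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². The talk does not show this derivation; the sketch below assumes a 95 percent confidence level (z = 1.96), a 5 percent margin of error, and the most conservative proportion of 0.5.

```ruby
# Assumed inputs (not stated in the talk): 95% confidence, 5% margin of error.
z_score = 1.96          # z value for a 95% confidence level
proportion = 0.5        # most conservative assumption about the population
margin_of_error = 0.05

# n = z^2 * p * (1 - p) / e^2
sample_size = (z_score**2 * proportion * (1 - proportion)) / margin_of_error**2

puts sample_size.round(2)
```

The result is just above 384, which is where the commonly quoted number comes from.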

00:08:48.710 I usually select between 100 and 500 samples, often rounding to a clean number. For example, if I plan to have five different groups, I'd select 500 because that easily divides into five groups of 100.

00:09:05.680 The Benchmark module in Ruby offers four methods. The first is the benchmark method, which allows plenty of customization, including changing labels, captions, and formatting.

00:09:15.640 The bm method is similar but with fewer options for output formatting. The bmbm method is particularly interesting; it runs your code twice, the first time as a rehearsal, to account for the magic Ruby does behind the scenes.

00:09:33.250 The first execution may be slower, which is misleading if you’re trying to demonstrate which implementation is faster. Conducting a rehearsal run before measuring gives clearer results. Lastly, there is measure: you just pass it a block of code to measure its execution time.
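
The four methods can be sketched side by side as follows. The two lambdas being timed, two_pass and one_pass, are hypothetical examples and not the code from the talk.

```ruby
require "benchmark"

numbers = Array.new(50_000) { |i| i }

# Two hypothetical implementations of the same calculation.
two_pass = -> { numbers.map { |n| n * 2 }.sum }
one_pass = -> { numbers.sum { |n| n * 2 } }

# benchmark: the most configurable form (caption, label width, format string).
Benchmark.benchmark(Benchmark::CAPTION, 10, Benchmark::FORMAT) do |x|
  x.report("two pass:") { two_pass.call }
  x.report("one pass:") { one_pass.call }
end

# bm: a simpler report; the argument is the width of the label column.
Benchmark.bm(10) do |x|
  x.report("two pass:") { two_pass.call }
  x.report("one pass:") { one_pass.call }
end

# bmbm: same interface as bm, but runs a rehearsal pass before the real one.
Benchmark.bmbm(10) do |x|
  x.report("two pass:") { two_pass.call }
  x.report("one pass:") { one_pass.call }
end

# measure: times a single block and returns a Benchmark::Tms object.
single = Benchmark.measure { one_pass.call }
puts single
```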

00:09:54.200 Now, let's look at an example using the bmbm method. I divided my cat sample into two separate samples: one that includes all cats except black ones and one that exclusively focuses on black cats.

00:10:10.600 For each category, I run the expected base adoption time calculation. Thanks to the rehearsal pass in bmbm, we can fairly compare how the two samples perform.

00:10:29.720 In the results, we find that the simple random sample excluding black cats took significantly longer than the one focused on just black cats, demonstrating how even such distinctions can greatly affect performance.

00:10:47.030 The benchmark output has four columns: user time, which is CPU time spent running your own code; system time, which is CPU time spent in kernel operations on your code's behalf; total time, which sums both; and real time, which measures total elapsed time.

00:11:08.780 Real time may differ from total time if your code involves waiting for user input. If you're using multithreading, observe how these times vary as well.
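
A small sketch of how real time can diverge from total time. Here sleep stands in for waiting on user input, and the workload is made up for illustration.

```ruby
require "benchmark"

tms = Benchmark.measure do
  100_000.times { |i| Math.sqrt(i) } # CPU work: counts toward user time
  sleep 0.2                          # waiting: counts only toward real time
end

# utime = user CPU, stime = system CPU, total = CPU time summed
# (including any child processes), real = elapsed wall-clock time.
puts format("user: %.3f  system: %.3f  total: %.3f  real: %.3f",
            tms.utime, tms.stime, tms.total, tms.real)
```

Because the sleep consumes almost no CPU, the real column ends up noticeably larger than the total column.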

00:11:25.720 If you want to optimize your code, be mindful of loops. Each time you use methods like each or map, every element in the list is traversed individually.

00:11:39.960 Nested loops significantly increase processing time because, for each element of the outer list, the inner loop traverses the entire list again, creating O(n²) complexity.
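
One common way to remove a hidden nested loop, sketched with made-up ID lists (not the talk's code): Array#include? scans the whole array on every call, while a Set lookup is hash-based.

```ruby
require "set"

# Hypothetical data: all cat IDs, and the IDs of cats already adopted.
all_ids = (1..2_000).to_a
adopted_ids = (1..2_000).step(2).to_a

# Hidden nested loop: include? walks adopted_ids for every element of all_ids,
# so the work grows with all_ids.size * adopted_ids.size.
slow = all_ids.select { |id| adopted_ids.include?(id) }

# Single pass: build the Set once, then each lookup is a hash lookup.
adopted_set = adopted_ids.to_set
fast = all_ids.select { |id| adopted_set.include?(id) }

puts slow == fast
```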

00:11:50.760 Recursion, which can be efficient when cached, may also slow down your application if not implemented correctly. Be mindful of any required loading at the start of classes.
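
To illustrate the difference caching makes to recursion, here is the classic Fibonacci example. It is a generic illustration, not the recursive calculation from the cat application.

```ruby
# Naive recursion: recomputes the same subproblems over and over.
def slow_fib(n)
  n < 2 ? n : slow_fib(n - 1) + slow_fib(n - 2)
end

# Cached (memoized) recursion: each value is computed once and reused.
def fast_fib(n, cache = {})
  return n if n < 2

  cache[n] ||= fast_fib(n - 1, cache) + fast_fib(n - 2, cache)
end

puts slow_fib(20)
puts fast_fib(20)
```

The naive version makes an exponentially growing number of calls as n increases, while the cached version makes a number of calls proportional to n.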

00:12:05.680 Callbacks and observers may also add extra execution time, so it's essential to identify these hidden culprits.

00:12:19.760 Ultimately, the method that was slower in my cat application involved recursive calculations, leading to increased runtime especially as the date approached Halloween.

00:12:36.890 If you're going to refactor code, the first thing to ensure is that you have solid tests; they keep you from introducing bugs and validate that functionality remains intact.

00:12:49.800 If the code you want to refactor has no tests, take the time to write them first; make sure they pass before and after your changes, which demonstrates that the code still works correctly.

00:13:04.150 After this, take a step back to think critically about how you could implement existing features better. Developers often become engrossed in implementing features without fully understanding their overall design.

00:13:16.230 This can lead to complex spaghetti code that performs poorly. Engage with stakeholders, as they might better understand the intricacies of the problems.

00:13:28.870 For instance, in a recent interview, I encountered a challenge related to odd-numbered triangles. My initial solution involved constructing a triangle with an array of arrays.

00:13:51.400 However, the top solution was much simpler: just raise the number to the third power. This shows how overcomplicating a problem can lead to inefficient solutions.
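
The exact exercise is not spelled out in the talk. Assuming it was the well-known one of summing row n of the triangle of consecutive odd numbers (1 / 3 5 / 7 9 11 / ...), the two approaches could be sketched like this, with hypothetical method names:

```ruby
# Builds the whole triangle as an array of arrays, then sums the last row.
def row_sum_by_building(n)
  rows = []
  next_odd = 1
  (1..n).each do |row_length|
    rows << Array.new(row_length) do
      value = next_odd
      next_odd += 2
      value
    end
  end
  rows.last.sum
end

# Closed form: the sum of row n is n cubed.
def row_sum_closed_form(n)
  n**3
end

puts row_sum_by_building(3)
puts row_sum_closed_form(3)
```

The first version does work proportional to n² just to build the triangle, while the second is a single arithmetic operation.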

00:14:09.920 The performance difference between my complex solution and the more straightforward approach was monumental.

00:14:27.180 If you’re tasked with refactoring a large class, focus on the methods that consume the most time. Refactoring just a few methods may yield substantial performance improvements.

00:14:48.810 The act of measuring time can be as simple as recording start time and the end time of a method without needing elaborate benchmarks.
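
Recording a start and end time by hand could look like the sketch below. The workload is a placeholder, and the monotonic clock is used here because it is not affected by system clock adjustments.

```ruby
# Record the start time on the monotonic clock.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Placeholder workload standing in for the method being timed.
result = (1..100_000).sum { |n| n * n }

# Elapsed seconds = end time minus start time.
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "took #{elapsed.round(4)}s"
```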

00:15:06.420 However, as you test and optimize code, make sure to run benchmarks to validate those improvements. Confirm that your tweaks actually make things faster.

00:15:38.330 For any optimization, especially significant changes, validate it through consistent benchmarking and comprehensive data.

00:15:50.210 When should you benchmark your changes? Definitely for large cron jobs or anytime you’re making code changes that will run frequently or process large data batches.

00:16:05.430 Even small increases in execution time can compound massively when dealing with millions of records.

00:16:14.330 If your code is already slow, avoid causing further degradation. Engage with your colleagues to review and share suggested improvements.

00:16:28.360 Make comments on your work, explaining that a specific change may slow things down but is acceptable if the overall performance is still efficient.

00:16:41.900 So, with that, go forth and benchmark. If there are any questions, I'll stick around and am glad to talk to anybody.