00:00:17.840
All right, so I'm presenting today on 'Enough Statistics So That Zed Won't Yell At You.' My name is Devlin Daly.
00:00:24.660
I actually contacted Zed about this talk, and I asked him if I would be able to present enough information to prevent him from getting upset. He said probably not, but we could try, so that's what we're going to do today.
00:00:42.480
Some caveats: I am not a statistician. I'm actually into digital identity. If you want to talk about OpenID, OAuth, SRP, or permissions later on, I’d be happy to discuss that. I have taken a few classes in statistics, and Pat sort of roped me into this presentation; it's all Pat's fault.
00:01:02.399
Today, my goal is to familiarize you with some vocabulary and basic concepts related to statistics.
00:01:09.479
The basic concept behind statistics is that it emerged due to the need for models. Models should be familiar to us because software itself is a model. There are different mathematical models describing how planets orbit the sun in elliptical patterns.
00:01:21.780
While these equations can describe planetary motion accurately, the data does not always fit the model perfectly; there is often an error function involved. This error is what statistics studies.
00:01:41.280
For example, consider the temperature of the human body. This isn't just one fixed number; it varies. We typically say it is about 98.6 degrees Fahrenheit, but in reality, it is distributed in a manner described by the normal distribution, commonly represented by the bell curve.
00:02:03.299
In this distribution, 98.6 degrees would represent the average. When measured, temperatures will always be slightly above or below this average due to random fluctuations. In a normal distribution, we can determine how much data falls close to the mean.
00:02:31.020
For instance, when considering standard deviations around the mean, we expect about 68 percent of the data to fall within one standard deviation, and about 95 percent to fall within two standard deviations. Not every data point will fall within these ranges, but across many measurements these proportions hold.
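The 68 and 95 percent figures can be checked with a quick simulation. This is a minimal sketch in Python, not something shown in the talk; the standard deviation of 0.7 degrees is an assumed value chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

MEAN_F = 98.6  # mean body temperature mentioned in the talk
SD_F = 0.7     # assumed standard deviation, for illustration only


def fraction_within(samples, mean, sd, k):
    """Return the fraction of samples within k standard deviations of the mean."""
    inside = sum(1 for x in samples if abs(x - mean) <= k * sd)
    return inside / len(samples)


# Draw simulated temperatures from a normal distribution.
temps = [random.gauss(MEAN_F, SD_F) for _ in range(100_000)]

within_one = fraction_within(temps, MEAN_F, SD_F, 1)
within_two = fraction_within(temps, MEAN_F, SD_F, 2)

print(f"within 1 sd: {within_one:.3f}")  # close to 0.68
print(f"within 2 sd: {within_two:.3f}")  # close to 0.95
```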
00:02:57.360
When measuring something, we often encounter distributions that aren't perfect normal distributions, but they are still distributions. Distributions are characterized by their center—usually the mean—and by their spread, which is indicated by the standard deviation.
00:03:12.360
To further illustrate the relationship between the mean and the spread, consider two samples represented by blue and orange distributions. Both have the same average, but the orange distribution has greater spread in its data values. If we are looking at web server performance, we might misleadingly conclude that performance is the same because the averages are identical, yet the larger spread indicates potential problems.
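A small Python sketch makes the same point with numbers. The two lists are made-up response times in milliseconds, standing in for the blue and orange distributions; they are not data from the talk.

```python
import statistics

# Hypothetical response times in milliseconds for two servers.
blue = [98, 99, 100, 100, 101, 102]
orange = [60, 80, 100, 100, 120, 140]

blue_mean = statistics.mean(blue)
orange_mean = statistics.mean(orange)
blue_sd = statistics.stdev(blue)      # sample standard deviation
orange_sd = statistics.stdev(orange)

print(f"blue:   mean={blue_mean}, sd={blue_sd:.1f}")
print(f"orange: mean={orange_mean}, sd={orange_sd:.1f}")
```

Both means are identical, so comparing averages alone would hide the fact that the orange server is far less predictable.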
00:03:39.060
Returning to the example concerning human body temperature, finding the true average requires measuring everyone in the world, which is practically impossible. Therefore, we must sample the population. A sample is a subset used to estimate the characteristics of the entire population. It's crucial that the sample is representative, or the data we gather can lead to false conclusions.
00:04:49.979
For example, if I were to sample only my family to estimate the average height of people, I'd wrongly conclude that everyone is around six feet tall, because my family is not a representative sample. To infer correctly about a population, we need a good representation of it, and that typically requires us to collect multiple samples.
00:05:21.900
One critical aspect of statistics is sampling—it’s essential. Single instances of tests or benchmarks yield singular data points; we need multiple samples to detect variance and establish a distribution. I work for Phil Windley, who is known for his love of Diet Coke.
00:06:02.039
He runs a conference where he ensures there’s plenty of Diet Coke available, much more than any other beverage. For example, if we wanted to find out if Phil could distinguish between Diet Coke and Coca-Cola Zero, we could design a simple experiment.
00:06:51.840
He could taste two unlabeled cups and say which is which. But if he is only guessing, he still has a 50/50 chance of being right, so a single trial cannot separate skill from luck. To get a valid result, we need many trials and a randomized order of cups. If Phil guesses correctly significantly more often than 50% of the time, we would conclude he can tell the difference.
00:07:50.400
Statistical testing, specifically the z-test, compares the proportion of correct guesses to what would happen if he were just guessing. This is how we apply logic and structure in statistical tests. We start with a null hypothesis, for instance, that Phil cannot tell the difference. The null hypothesis assumes a 50% probability of guessing correctly if he cannot tell the difference.
00:09:01.680
With statistical tests, we either reject or fail to reject the null hypothesis; we never prove it. If the test shows enough evidence against pure guessing, we reject the null hypothesis and conclude that he can probably distinguish between the two drinks.
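The taste test can be sketched as a one-sided z-test for a proportion. This is an illustrative Python version, not code from the talk; the function name `proportion_z_test`, the trial counts, and the 0.05 cutoff are all assumptions. The normal approximation used here is rough for small numbers of trials.

```python
import math


def proportion_z_test(successes, trials, p_null=0.5):
    """One-sided z-test: is the true success rate greater than p_null?"""
    p_hat = successes / trials
    # Standard error of the proportion under the null hypothesis.
    std_err = math.sqrt(p_null * (1 - p_null) / trials)
    z = (p_hat - p_null) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical outcomes: 16 of 20 correct versus 11 of 20 correct.
z_good, p_good = proportion_z_test(successes=16, trials=20)
z_weak, p_weak = proportion_z_test(successes=11, trials=20)

ALPHA = 0.05  # assumed significance level
for label, p in (("16/20", p_good), ("11/20", p_weak)):
    verdict = "reject" if p < ALPHA else "fail to reject"
    print(f"{label}: p={p:.4f}, {verdict} the null hypothesis")
```

With 16 of 20 correct the null hypothesis of pure guessing is rejected; with 11 of 20 it is not, even though 11 is more than half.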
00:09:59.880
An example of an application of statistical testing is A/B testing in conversion rates. If I run a website and change its design, I need to test if the changes actually improve customer conversion. We would need a representative sample of visitors to compare their conversion rates.
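For an A/B test, one common approach is a two-proportion z-test on the conversion counts. The sketch below is an assumption about how such a comparison might be done, not the speaker's method; the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: old design A versus new design B, 5000 visitors each.
z_ab, p_ab = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z={z_ab:.2f}, p={p_ab:.4f}")
```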
00:10:40.680
Issues might arise if the new site is underperforming due to a technical error, such as being incompatible with a specific browser. Statistical analysis would allow us to isolate these factors, ensuring that performance issues are accurately attributed to the web design and not user errors.
00:12:11.220
Let's switch to performance measurement for web applications. For instance, using a tool like httperf, we can measure how many requests per second our application can handle. A single run gives one data point, so we need to sample over time, taking multiple measurements to estimate average performance accurately.
00:13:12.540
When using a web server like Mongrel, we can start by measuring a baseline in requests per second. After making optimizations or changes to our application, we can establish whether those changes improve performance using statistical methods like the t-test, which assesses whether the observed change is statistically significant.
00:14:35.640
The t-test compares the means of two samples and asks whether they could plausibly have come from the same population, that is, whether the difference between them is statistically significant. When running performance benchmarks on different hardware configurations, it helps us decide whether there is a true performance difference or just random variation.
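To make the comparison concrete, here is a minimal Python sketch of the Welch form of the t statistic, which does not assume equal variances. The talk does not say which variant was used, so that choice is an assumption, and the requests-per-second figures are invented. Turning the statistic into a p-value requires the t distribution, which tools such as R's `t.test` handle.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_b - mean_a) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical requests-per-second measurements before and after a change.
before = [412, 398, 405, 420, 401, 409, 415, 403]
after = [431, 425, 440, 428, 436, 422, 433, 429]

t_stat, deg_free = welch_t(before, after)
print(f"t={t_stat:.2f}, df={deg_free:.1f}")
```

A t statistic this far from zero, given roughly 13 degrees of freedom, indicates that the gap between the two means is large relative to the run-to-run noise.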
00:15:38.640
This significance is expressed through a p-value: the probability of seeing a difference at least as large as the one observed if there were really no difference and only random variation were at work. Essentially, the t-test is an approximation of a permutation test, which compares the observed result to the full set of results obtained by randomly reshuffling the data between groups.
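The permutation idea can be sketched directly. This Python version approximates the full set of reshufflings with random shuffles rather than enumerating every one; the function name, the number of rounds, and the benchmark numbers are illustrative assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, rounds=5_000, seed=0):
    """Estimate a two-sided p-value for the difference in means."""
    rng = random.Random(seed)  # seeded for a repeatable result
    observed = abs(mean(sample_b) - mean(sample_a))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(rounds):
        # Reassign the measurements to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    return extreme / rounds


# Hypothetical requests-per-second measurements.
baseline = [412, 398, 405, 420, 401, 409, 415, 403]
optimized = [431, 425, 440, 428, 436, 422, 433, 429]
rerun = [410, 400, 407, 418, 399, 411, 413, 406]

p_changed = permutation_test(baseline, optimized)
p_unchanged = permutation_test(baseline, rerun)
print(f"baseline vs optimized: p={p_changed:.4f}")
print(f"baseline vs rerun:     p={p_unchanged:.4f}")
```

Almost no random reshuffling reproduces a gap as large as baseline versus optimized, while nearly every reshuffling matches or exceeds the tiny gap between baseline and rerun.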
00:16:23.040
In practical scenarios, we can use statistical software, such as R, to perform these tests easily. For example, given samples from two groups, we can determine if they come from the same distribution using t-tests to inform our decisions.
00:17:31.500
When two distributions overlap heavily, the test will give a large p-value, meaning we cannot rule out that the observed difference is simply due to chance. A small p-value suggests the difference is genuine.
00:18:32.050
As we've discussed, confounding variables can obscure the results of statistical testing. To manage these, one should control experimental conditions by only changing one variable at a time.
00:19:49.200
It's critical that measurements are not affected by external factors. For performance tests, using separate machines for testing and production can minimize this risk.
00:20:34.320
Testing on a virtualized environment can also introduce variance due to shared resources. When using services like Amazon EC2, you must ensure adequate sampling to account for potential inconsistencies.
00:21:59.960
In summary, we have several statistical tools available to us, such as the z-test and t-test. To avoid mistakes in measurement interpretations, peer reviews and cross-checks are essential.
00:23:36.240
Resources like R can assist in statistical analysis, while good laboratory practices enhance the reliability and reproducibility of our tests. Engaging with the community and learning from others can also promote better understanding and application of statistical methods.
00:25:00.480
Lastly, discussions around the nature of measurements, like what defines a user in performance assessments, remind us of the importance of maintaining clarity about what we measure.
00:26:23.940
Sampling is essential; random sampling increases the likelihood of representative samples over repeated experiments. Additionally, the central limit theorem says that the distribution of sample means tends toward a normal distribution as the sample size grows, even when the underlying data is not normally distributed.
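A short simulation illustrates the central limit theorem. In this Python sketch the underlying data comes from an exponential distribution, which is strongly skewed; that distribution, the sample size of 50, and the number of repetitions are all choices made for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # seeded for a repeatable result

SAMPLE_SIZE = 50
REPEATS = 5_000


def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, sd 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPEATS)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
expected_spread = 1.0 / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Fraction of sample means within one standard deviation of their center.
near = sum(1 for m in means if abs(m - center) <= spread) / len(means)

print(f"center of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f} (theory: {expected_spread:.3f})")
print(f"fraction within 1 sd:   {near:.3f}")
```

Even though single draws are far from bell-shaped, the sample means cluster around the population mean with a spread close to sigma divided by the square root of the sample size.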
00:28:25.260
Determining the appropriate size for a sample relies on understanding the population dynamics and the required precision of results. Ensuring your statistical efforts are appropriately structured enhances their credibility.
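One textbook rule of thumb for sample size when estimating a mean is n = (z * sd / margin)^2. The Python sketch below applies it; the talk does not give a formula, so this is an assumed approach, and the standard deviation of 20 requests per second is a hypothetical value that would normally come from a pilot run.

```python
import math


def required_sample_size(sd, margin, z=1.96):
    """Samples needed so the mean is within +/- margin at ~95% confidence.

    sd is an estimate of the population standard deviation; z=1.96 is the
    normal quantile for a 95% confidence level.
    """
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical: throughput varies with sd of about 20 requests per second.
n_coarse = required_sample_size(sd=20, margin=5)
n_fine = required_sample_size(sd=20, margin=2.5)

print(f"within 5 req/s:   {n_coarse} runs")
print(f"within 2.5 req/s: {n_fine} runs")
```

Halving the margin roughly quadruples the number of runs, which is why the required precision matters so much when planning a benchmark.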
00:29:24.540
In conclusion, understanding and correctly applying statistical principles is crucial in analyses, especially when working with complex systems. Causation cannot be assumed from correlation alone, and careful control of parameters is needed for a sound analysis.
00:30:38.340
The nuances of statistical methods, like Bayesian approaches, add depth to our understanding of data; they help define the origins of our observations within established probabilistic frameworks.