Statistical significance may be the most important concept you’ve never heard of. Knowing Mark Twain’s joke about statistics isn’t enough to actually understand them. It’s easy to lie with statistics, after all. But it’s also not difficult to spot when someone is lying with statistics.
Statistics work when you study a large enough segment of the population to be able to extrapolate with reasonable confidence of being accurate. Statistical significance is the key to measuring the quality of those statistics and the conclusions someone is making from them.
Key questions to ask about statistical significance
Any time someone cites a survey or a study, there are several key questions you have to ask in order to understand whether you’re reading serious work or propaganda. The reason Mark Twain ranked statistics lower than lies and damned lies is that so few people in Twain’s day knew how to judge the quality of statistics. I don’t think the situation is much better now.
- How big is the population you’re dealing with?
- How many people did you study?
- What’s the margin of error?
- Are you 95 or 99% confident in the results?
Knowing the answer to some or all of these questions (ideally all), you can then know whether you have enough data to actually come to any conclusions. Without enough data you don’t have a study. You have a story that at best deserves the disclaimer “your mileage may vary.” At best.
An example from the news
Last week, a friend posted a story he read online from a reputable, mainstream, middle-of-the-road news outlet. The story was about a medical study that took place in France. The story stated that the study was inconclusive, because it studied only 84 people before it was halted, and the difference in outcomes between the people who responded well to the treatment and the people who did not was 1.8 percent.
One of my friend’s friends went through the roof. He saw the 1.8 percent difference and went on a rant about the news outlet, and about journalists in general.
I do wish this news outlet had given a little more information. The math works, but most people aren’t going to go do the math. They’ll decide to agree or disagree based on what they want the outcome to be. The argument is stronger when you present a little more of the math that went into it.
In this case, I had to look up the size of the population. It was a little over 100,000. And with a sample size of 84, the margin of error was 11 percent. The margin of error is more than six times the difference he latched onto.
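That 11 percent figure comes straight out of the standard formula for the margin of error of a proportion, with a finite population correction. This is a sketch, assuming a 95% confidence level (z = 1.96) and maximum variability (p = 0.5), which is what the typical online calculator assumes:

```python
import math

def margin_of_error(sample_size, population, z=1.96, p=0.5):
    """Margin of error for a proportion, with finite population correction.

    z=1.96 corresponds to a 95% confidence level; p=0.5 assumes maximum
    variability, which is the most conservative choice.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # The finite population correction shrinks the error slightly when the
    # sample is a meaningful fraction of the whole population.
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc

moe = margin_of_error(84, 100_000)
print(f"{moe:.1%}")  # prints 10.7%, which rounds to the 11% in the story
```

Notice that with only 84 people sampled out of 100,000, the finite population correction barely matters; the sample size is what drives the error.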
We should also note the news outlet said it was inconclusive. It didn’t say the treatment didn’t work. And it didn’t say it did. It said we don’t have enough data to know.
Getting the margin of error down to 1% would have required studying a little over 8,700 patients. The problem was that the side effects were too severe and happened too frequently for it to be safe to test 8,700 patients. If it isn’t safe to enroll 8,700 volunteers, you can’t exactly continue the study.
How I would have written the story
Had I written this particular story, I would have stated that the margin of error in this case was 11 percent and that the study would have needed about 8,700 patients to be statistically significant, or I would have gotten a statement from a statistics professor at a university with a respected medical school saying how many would be necessary. Quoting a professor from Stanford sounds a little more legit than citing SurveyMonkey’s online calculator, even though the number will be the same.
And frankly, I would have put all of this information in an infographic on the side so it’s easy to spot:
- 84 patients studied
- 11% margin of error
- 1.8% difference in outcomes
- 8,700 patients would have to be studied to draw a conclusion
What the margin of error means in statistical significance
The margin of error tells you what the range could be in the data you have. If I’m conducting a survey about an election and one candidate is winning by a 25% margin, then an 11% margin of error might be acceptable. The candidate could actually be winning by as little as 14%, or as much as 36%.
The difference has to be higher than your margin of error to come to any conclusion.
In the case of this medical study with a 1.8% observed difference, the real number is 1.8 percent plus or minus 11%. The real difference could be anywhere from negative 9.2 to 12.8 percent, and we literally have no idea whether the right answer is the treatment being 9.2% less effective than going without, 12.8% better, or anywhere in between, including zero.
If they’d been able to study 8,700 patients, then we could have said the treatment was between 0.8 percent and 2.8 percent more effective than doing nothing. And well, that’s not zero. It’s also not great. If a batter steps up to the plate in a baseball game with a batting average of .018, how confident are you that he’s going to get a hit? Because that’s the same bet.
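The interval arithmetic above is simple enough to write down. The observed difference plus or minus the margin of error gives the range where the true difference could plausibly fall, using the numbers from the story:

```python
observed_difference = 1.8   # percent, from the study
margin_of_error = 11.0      # percent, with 84 of ~100,000 patients sampled

# The true difference could fall anywhere in this range.
low = observed_difference - margin_of_error
high = observed_difference + margin_of_error

print(f"true difference: anywhere from {low:.1f}% to {high:.1f}%")
# prints: true difference: anywhere from -9.2% to 12.8%
```

Because that range straddles zero, the study genuinely can’t tell you whether the treatment helps, hurts, or does nothing, which is exactly what “inconclusive” means.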
There are tons of margin of error calculators available online now, which is great. The math behind calculating margin of error can be error-prone to do on paper. That’s why I got a C in college statistics. I understood the concepts, but wasn’t especially good at memorizing the equations and carrying them out on paper.
Uses outside of journalism
It’s important to note that journalists didn’t invent statistics. Statistics aren’t some kind of a politically motivated conspiracy. The same methods are widely used in business. When a company wants to introduce a new product, it uses statistical analysis to decide whether it is likely to succeed. This is one reason car companies build concept cars. If the reaction to the car is positive enough that they can sell enough of them to be profitable, the company builds the car. If it’s not, the car remains a one-off and the company tries something else.
This kind of analysis happens behind the scenes, too. At one point in my career I was working for a Fortune 20 company, and the question arose whether they should buy computers from a different company. They were buying computers from Lenovo at the time, but had bought from Dell and HP in the past, and had significant numbers of each brand. When I looked at the failure rates and compared them to the overall population of systems, the difference was a rounding error. The percentage of Lenovos with problems was within .5% of the percentage of Lenovos in the entire population. And the same was true of Dell and HP as well. There might have been other reasons to change brands, but the failures we were observing were no more likely to happen in one brand than another.
Your favorite company does statistical analysis as part of its decision-making process. Or if it doesn’t, it needs to start, because I guarantee its competitors do.
Determining a statistically significant sample size
The next question is how many people you have to test or survey to have enough data. Use a statistically significant sample size calculator. To figure out how many people to test or survey, you need to know the confidence level (95% or 99%), the confidence interval (also known as margin of error), and your total population.
You probably know the total population. Look it up if not. You should try for a 2 percent margin of error unless you need a smaller one.
In the case of the story I referenced above, with a population of 100,000 and a 1% margin of error, I get 8,763 if I accept a confidence level of 95%. To be 99% confident, which wouldn’t be a bad idea when lives are at stake, I need 14,267. Because if I’m making the decision and then someone dies, I might get the question in court why I was willing to only be 95% confident instead of 99.
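Both of those figures fall out of the standard sample size formula with a finite population correction. A sketch, assuming p = 0.5 and the commonly used rounded z-scores of 1.96 for 95% confidence and 2.58 for 99%:

```python
import math

# Commonly used (rounded) z-scores for each confidence level.
Z_SCORES = {0.95: 1.96, 0.99: 2.58}

def sample_size(population, margin_of_error, confidence=0.95, p=0.5):
    """Required sample size for a proportion, with finite population correction."""
    z = Z_SCORES[confidence]
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: a finite population needs fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(100_000, 0.01, confidence=0.95))  # 8763
print(sample_size(100_000, 0.01, confidence=0.99))  # 14267
```

Note how moving from 95% to 99% confidence nearly doubles the required sample, which is exactly the 5,500-person gap discussed below.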
If the question is whether a consumer product should come in a pink or a purple package, I’m fine with 95% confidence, because it takes longer and costs more money to test with 5,500 more people.
Generally speaking, unless we’re talking matters of life and death, with a 2% margin of error and a 95% confidence level and knowing your target population, you can get a very good idea how many people you need to study in order to make a decision. And the count is surprisingly low.
And I suppose this should go without saying, but who you ask can and does skew the results. The more random the sample, the more effective the math is at predicting the outcome. If I’m running for office and I use a donor list from my political party to survey my chances in an election, the result is probably going to be different than if I call a random sampling of the local population. The donor list is probably OK to use before the primary. But after the primary, you have to survey the general population.