A bluffer's guide to evaluating scientific results, Part 2: Rules of Thumb
In Part 1 of this series, we examined techniques for gauging systematic uncertainties. Now, I'll offer some rules of thumb for distinguishing strong results from weak. Yes, there's math, but it's simple arithmetic that you can do on a napkin (or your phone). Other tidbits are listed in tables.
Keep in mind that random processes can fluctuate in such a way as to mimic a signal when no signal exists. A significant result is one that a random fluctuation could mimic only with extraordinarily low probability. Our goal is to come up with a simple technique for estimating those probabilities as we read science articles from our Facebook feeds.
Rules of thumb
Let's start with a hypothetical example: Is the Queen of England more popular than I am? I agree, common sense indicates that she is, but is her popularity statistically significant?
Figure 1. Popularity contest, Ransom vs the Queen. Photo source: Ransom's daughter and The Daily Mail.
If the Crown announced one morning that anyone could pop in for a cup of tea with the Queen, how many people would queue up at Windsor Castle? Let's say that 1,000,000 people show up (it's short notice after all). Similarly, I announce one morning that anyone is welcome to come over to Casa Ransom and have a nice hoppy ale. It's reasonable to expect that, perhaps, four people will come over. Five if my mother is in town (she loves that west coast hop water).
Rule of thumb 1: The statistical uncertainty, let's call it sigma (σ), of a sample of size N is given by the square root of N: σ = √N.
Statistical uncertainty indicates how random processes would cause an experiment to vary if it were carried out many times.
Table 1 and Figure 2 show that 32% of the time (call it 30%), random processes will cause the same measurement to shift by more than one sigma away from the measured value. In other words (describing statistical principles in text is notoriously annoying) if you make the same measurement 100 times, expect 30 of them to differ from your result by more than one unit of uncertainty. Similarly, 5% of measurements will differ by more than two standard errors, about 0.3% by more than three sigma, and so on.
Table 1. The percentage of measurements in the right column is expected to diverge from the measured result by at least the number of standard errors in the left column. (See this PDF from the Particle Data Group for a complete bluffer's guide to stats.)
Figure 2: Random distribution by number of standard deviations, σ.
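If you'd like to check the tail fractions in Table 1 for yourself, a few lines of Python will do it. This is just a sketch, assuming the measurements follow a normal distribution as in Figure 2; the function name is mine, not something from the statistics literature:

```python
import math

def tail_fraction(n_sigma: float) -> float:
    """Fraction of measurements expected to land more than n_sigma
    standard errors from the true value (two-sided, assuming a
    normal distribution)."""
    return math.erfc(n_sigma / math.sqrt(2))

for n in (1, 2, 3):
    print(f"{n} sigma: {100 * tail_fraction(n):.2f}% of measurements")
# 1 sigma: 31.73% of measurements
# 2 sigma: 4.55% of measurements
# 3 sigma: 0.27% of measurements
```

Those are the "32% (call it 30%)", "5%", and "about 0.3%" figures from the text.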
Now, back to the Queen and me. If a million people show up at Windsor Castle, then statistics says (in a broad, rule-of-thumb sense) that the uncertainty in that number is √1,000,000 = 1000. Using Table 1, if we make the offer some other day, she'll have more than 1,001,000 people or fewer than 999,000 on 32% of those days; more than 1,002,000 or fewer than 998,000 on 5% of those days; more than 1,003,000 or fewer than 997,000 on about 0.3% of those days, and so on.
Meanwhile, the statistical uncertainty in the number of people who visit me is √4 = 2 (unless mom's in town). The number of visitors I should expect is 4 ± 2, which means that about 30% of the time I'll get fewer than two visitors or more than six, and about 5% of the time either no one comes over or more than eight people show up. My 4 ± 2 means I have a 50% statistical uncertainty (again, in the rule-of-thumb sense, not the carefully calculated sense, which would change the numbers but not the conclusion) compared to the Queen's 0.1%.
Because it's safe to assume that I'm of average popularity (yes, I flatter myself), an average person can attract four people over for a beer. That is, four is the background noise level.
To get the statistical significance of the Queen's popularity, we take the number of her visitors, subtract the background to get the signal, Nsignal, and divide the signal by the uncertainty in the background: significance = Nsignal / √Nbackground.
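Putting the pieces together, the significance works out like this (again using the hypothetical visitor counts from the example):

```python
import math

n_queen = 1_000_000  # hypothetical visitors to Windsor Castle
n_background = 4     # hypothetical visitors to Casa Ransom (the background)

# Signal = measured count minus background; significance = signal
# divided by the statistical uncertainty in the background, sqrt(N).
n_signal = n_queen - n_background
significance = n_signal / math.sqrt(n_background)
print(significance)  # 499998.0
```

Roughly half a million sigma: by any standard, the Queen's popularity is statistically significant.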