Here are ten common statistics mistakes and errors. No advanced knowledge required to understand them!
1. Addition Rule for Probability
If the chance of contracting HIV from one exposure is 1 in 500, then someone who has been exposed to HIV 500 times will have HIV with 100% probability, correct? Right, and with 1,000 exposures he will have HIV with 200% probability. That is what one journalist implied in her article on HIV during the 1980s. Don't be her. The chances from repeated exposures can't simply be added together like this; a probability can never exceed 100%.
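For a rough idea of what the right answer looks like, here is a minimal sketch. It assumes every exposure is independent and carries the same 1-in-500 risk (a simplification), and the function name is made up for this example. Instead of adding the risks, you multiply the chances of escaping each exposure and take what's left over.

```python
def prob_at_least_once(p_single: float, exposures: int) -> float:
    """Chance of at least one infection over several independent exposures."""
    # (1 - p_single) is the chance of escaping one exposure.
    # Raising it to a power gives the chance of escaping all of them.
    return 1.0 - (1.0 - p_single) ** exposures


for n in (1, 500, 1000):
    print(n, round(prob_at_least_once(1 / 500, n), 3))
```

Under those assumptions, 500 exposures works out to roughly a 63% chance and 1,000 exposures to roughly 86%. High, but never 100%, and certainly never 200%.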
2. Precision and Accuracy
You're reading about Mt. Everest and it says, "8,850 m high (29,035.43 ft)." Pretty cool how Mt. Everest measured up so exactly in meters! Actually, what you're seeing here is a high degree of precision given for a conversion of a rough approximation (low accuracy) of Everest's height. It's easy to avoid: never introduce significant figures that aren't there! You'll find the same thing with recipes giving temperatures in both Fahrenheit and Centigrade (you probably don't have to set your oven to exactly 392°F if the recipe tells you to bake at 200°C).
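One way to keep a conversion honest is to round the result back to the number of significant figures the original measurement had. The sketch below is only an illustration: `round_sig` is a helper written for this example, and treating 8,850 m as having three significant figures is an assumption (the trailing zero is ambiguous).

```python
from math import floor, log10


def round_sig(x: float, sig: int) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    return round(x, sig - 1 - floor(log10(abs(x))))


FEET_PER_METER = 1 / 0.3048  # the international foot is defined as 0.3048 m

height_ft = 8850 * FEET_PER_METER
print(height_ft)                # full conversion, with digits the input never had
print(round_sig(height_ft, 3))  # rounded to match the assumed 3 significant figures
```

The same helper applied to the oven example, `round_sig(392, 2)`, gives 390, which is about as exact as a home oven dial gets anyway.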
3. Check your Assumptions
Statistical tests come with assumptions about when they apply and when they don't. Many tests, such as ANOVA, are relatively robust and can provide useful information even when, for example, the underlying data aren't approximately normally distributed. It's tempting to just plug a big ol' dataset into SPSS and start running regressions and tests, but a useful tip (mostly for those just starting out) is to plot the data first, look for outliers, check whether it looks roughly normal, etc. Another important point is to make sure you are aware of the (in)dependence of the variables you're comparing.
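A plot is the best first look, but a quick numeric screen for outliers can help too. The sketch below uses the common "1.5 times the interquartile range" rule of thumb, which is my choice for illustration and not something the tip above prescribes. The function name and the sample numbers are made up.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


sample = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up data with one odd value
print(iqr_outliers(sample))
```

A flagged value isn't automatically wrong. It is just a prompt to go back and find out what happened before running any tests on it.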
4. Ignore Summaries: Find out Exactly What was Asked
News articles have a terrible habit of summarizing poll and survey results to the point that it is impossible to evaluate them. A good poll or survey will provide exactly what questions were asked (and variations such as switching order in comparisons). Answers given by respondents can vary greatly depending on how a question is phrased.
5. Improper Emphasis on Mean or Median
One community, which will remain nameless (only because I can't remember what it is), issued some PR to potential home buyers describing what a great place it was to live and emphasized the mean income of the population. When taxes were set to be raised, the community then acted outraged and pointed out the low median income. High mean, low median? In any distribution that is skewed right, as we say, a few very high values pull the mean upward while leaving the median largely unaffected. Nonetheless, both are useful measures to have on a variable. You can avoid all of this by getting as much information as you can so that you see the whole picture.
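To see how far apart the two measures can drift, here is a tiny sketch with invented incomes: nine households at $30,000 and one at $1,000,000. The figures are hypothetical and chosen only to make the skew obvious.

```python
import statistics

# Hypothetical town: nine modest incomes and one very large one.
incomes = [30_000] * 9 + [1_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print("mean:  ", mean_income)    # pulled up by the single high earner
print("median:", median_income)  # the "typical" household
```

Here the mean comes out at $127,000 while the median stays at $30,000, so the same town can be described as rich or modest depending on which number gets quoted.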
6. Figuring out What Data is Relevant
When it comes to retailers, what's more important: their profit margin or their turnover? It happens that turnover is more important if you're interested in return on investment. Profit margins can be low but with a good rate of turnover, a very high ROI can be achieved.
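The arithmetic behind this can be sketched as follows. I'm assuming here that ROI means yearly profit divided by capital invested, profit margin means profit divided by sales, and turnover means yearly sales divided by capital invested. The post doesn't define these terms, and the two shops and their numbers are invented.

```python
def roi(profit_margin: float, turnover: float) -> float:
    """ROI as margin times turnover.

    (profit / sales) * (sales / capital) = profit / capital
    """
    return profit_margin * turnover


# Hypothetical grocer: thin 2% margin, capital turned over 12 times a year.
grocer = roi(0.02, 12)
# Hypothetical jeweler: fat 20% margin, capital turned over once every two years.
jeweler = roi(0.20, 0.5)

print(f"grocer ROI:  {grocer:.0%}")
print(f"jeweler ROI: {jeweler:.0%}")
```

With these made-up figures the low-margin grocer earns about 24% on capital against the jeweler's 10%, which is why margin alone is the wrong number to look at.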
7. Rigged Studies
There are so many ways to rig a study that it's important to rely on proper peer-reviewed research. One of my favorite examples is a story told by James Randi about the Duke University studies on ESP, where Dr. Joseph Rhine had filing cabinets full of experimental data that showed negative results, supposedly because the experiments "weren't working" on those days. So Dr. Rhine ignored them.
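To get a feel for why throwing out the "bad days" matters, here is a small simulation. It is not a reconstruction of the Duke experiments: the 25 guesses per session, the 1-in-5 chance of a lucky hit, and the 1,000 sessions are all assumptions picked for illustration. Every "subject" is guessing at random.

```python
import random


def run_session(rng, trials=25, p_chance=0.2):
    """Hit rate for one session of pure guessing."""
    hits = sum(rng.random() < p_chance for _ in range(trials))
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
sessions = [run_session(rng) for _ in range(1000)]

# File away every session that did no better than chance.
kept = [rate for rate in sessions if rate > 0.2]

all_rate = sum(sessions) / len(sessions)
kept_rate = sum(kept) / len(kept)

print(f"all sessions:  {all_rate:.3f}")
print(f"kept sessions: {kept_rate:.3f}")
```

Across all sessions the hit rate sits close to the 20% you'd expect from luck. Among the sessions that were kept, it is above chance by construction, even though nobody in the simulation has any ability at all.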
8. Misleading Graphs
The book How to Lie with Statistics by Darrell Huff is an excellent book, available at our store. It is a very easy layperson's read on how data can be presented misleadingly. One chapter is on fishy graphs, which can be fudged in a number of ways. Two key ways are shifting the baseline (so that the difference between compared values looks exaggerated because the graph zooms in on just that difference), and drawing 2D or 3D pictures where the comparison is made in one dimension (usually height) but is magnified in our brains by a power of 2 or 3 because the objects have been enlarged in every direction.
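Both tricks come down to simple arithmetic, sketched below. The function names and the example values (50 versus 55, with the axis starting at 45) are made up for illustration.

```python
def truncated_bar_ratio(a: float, b: float, baseline: float) -> float:
    """How many times taller bar b looks than bar a when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)


def apparent_ratio(true_ratio: float, dimensions: int) -> float:
    """Visual size ratio when a picture is scaled in every dimension."""
    return true_ratio ** dimensions


# Two values that differ by only 10%.
print(truncated_bar_ratio(50, 55, baseline=0))   # honest axis
print(truncated_bar_ratio(50, 55, baseline=45))  # axis starts at 45

# A quantity that doubled, drawn as a flat icon and as a solid object.
print(apparent_ratio(2, 2))  # area of a 2D icon
print(apparent_ratio(2, 3))  # volume of a 3D object
```

With the axis starting at 45, the second bar is drawn twice as tall as the first for a 10% difference. And an icon that is "twice as tall" covers four times the area, or eight times the volume if it is drawn as a solid.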
9. Correlation Does Not Equal Causation
Throughout the 1800s, consumption of alcohol in the United States rose in remarkably close step with the number of clergymen. What's the connection? Population increase. A good rule to remember is that correlation suggests there is likely to be some kind of "effect" or "relationship" between two variables. Whether that relationship is direct, indirect, or merely coincidental requires careful thought and further investigation. Here's a good article demonstrating this. Money quote: "Fowler is quick to note that a study of this kind does not prove that diet soda causes obesity. More likely, she says, it shows that something linked to diet soda drinking is also linked to obesity." What could that be?
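The clergymen-and-alcohol pattern is easy to reproduce with made-up numbers. In the sketch below, everything is invented: a population growing 3% a year for 100 years, and per-person rates of clergy and alcohol that wobble at random and have nothing to do with each other. Only the shared population growth links the two totals.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Invented population, growing 3% per year.
population = [1_000_000 * 1.03 ** year for year in range(100)]

# Independent per-person rates: each is just random noise around a constant.
clergy_rate = [rng.gauss(0.001, 0.0001) for _ in population]
alcohol_rate = [rng.gauss(8.0, 0.8) for _ in population]

# Totals are rate times population, so both inherit the population trend.
clergy_total = [p * r for p, r in zip(population, clergy_rate)]
alcohol_total = [p * r for p, r in zip(population, alcohol_rate)]

r_totals = pearson(clergy_total, alcohol_total)
r_rates = pearson(clergy_rate, alcohol_rate)

print(f"correlation of totals:           {r_totals:.2f}")
print(f"correlation of per-person rates: {r_rates:.2f}")
```

The totals come out strongly correlated, while the per-person rates show little or no relationship. Dividing by the suspected common factor (here, population) is one simple way to check whether a correlation survives.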
10. Forgetting Margin of Error
Finally, the last on our list of mistakes for consumers of statistics is failing to consider the error bars on estimates of population parameters. A news article or a blog post immediately gains a measure of credibility if it discloses the margin of error on a poll or survey (and more if it mentions when the gap between results is small enough that the margin of error could reverse them).
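For a sense of scale, here is a minimal sketch of the usual textbook margin of error for a poll percentage. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the normal approximation. Real polls often use more complicated designs, and the 52% and 1,000 respondents are invented numbers.

```python
from math import sqrt


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 52% support among 1,000 respondents.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)

print(f"{p_hat:.0%} plus or minus {moe:.1%}")
print("could be below 50%:", p_hat - moe < 0.5)
```

Under those assumptions the margin is about 3.1 percentage points, so a "52% to 48% lead" is within range of being reversed.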