Nothing comes for free: if you can cope with 400 words on statistics, we can trash a front page news story together. "Cocaine floods the playground," roared the front page of the Times last Friday. "Use of the addictive drug by children doubles in a year."
Doubles? Now that was odd, because the press release for this government survey said it found "almost no change in patterns of drug use, drinking or smoking since 2000". But the Telegraph ran with the story as well. So did the Mirror. Perhaps they had found the news themselves, buried in the report.
So I got the document. It's a survey of 9,000 children, aged 11 to 15, in 305 schools. The three-page summary said, again, there was no change in prevalence of drug use. I found the data tables, and for the question about using cocaine in the past year, 1% said yes in 2004, and 2% said yes in 2005. Except almost all the figures were 1%, or 2%. They'd all been rounded off. By asking around, I found that the actual figures were 1.4% for 2004 and 1.9% for 2005, not 1% and 2%. So it hadn't doubled. But if that alone was my story, this would be a pretty lame column, so read on.
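To see how much work the rounding is doing, here is a minimal sketch. The unrounded figures are the ones quoted above; the variable names are illustrative, and it assumes the published tables simply rounded to the nearest whole percent.

```python
# Unrounded prevalence figures quoted in the column (percent of children).
actual_2004 = 1.4
actual_2005 = 1.9

# Assumption: the published tables rounded to the nearest whole percent.
rounded_2004 = round(actual_2004)
rounded_2005 = round(actual_2005)

# Ratio of 2005 to 2004, first as printed in the tables, then as measured.
apparent_ratio = rounded_2005 / rounded_2004
actual_ratio = actual_2005 / actual_2004

print(f"From the rounded tables: x{apparent_ratio:.2f}")
print(f"From the actual figures: x{actual_ratio:.2f}")
```

On the rounded figures the rate looks as if it has doubled; on the real ones it has gone up by a little over a third.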
What we now have is an increase of 0.5 percentage points: out of 9,000 kids, about 45 more kids saying "yes" to the question. Presented with a small increase like this, you have to think: is it statistically significant? Well, I did the maths, and the answer is yes, it is, in that you get a p-value of less than 0.05. What does that mean? Well, sometimes you might throw "heads" five times in a row, just by chance. Let's imagine that there was definitely no difference in cocaine use between the two years, but you took the same survey 100 times: you might get a difference at least as big as the one we have seen here just by chance, but in fewer than five of your 100 surveys.
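The arithmetic can be sketched as a two-proportion z-test. This is one standard approach, not necessarily the calculation behind the figure above. It assumes 9,000 children in each year (the text gives only one sample size, so using it for both years is an assumption) and, crucially, that every child is an independent data point. The function and constant names are illustrative.

```python
import math


def two_proportion_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for a difference between two proportions."""
    # Pooled proportion under the null hypothesis of no real difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


N_PER_YEAR = 9000  # assumption: same sample size in 2004 and 2005

z, p_value = two_proportion_test(0.014, N_PER_YEAR, 0.019, N_PER_YEAR)
extra_children = (0.019 - 0.014) * N_PER_YEAR

print(f"extra children saying yes: {extra_children:.0f}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under those assumptions the p-value comes out below 0.05, which is all that "statistically significant" means here.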
But this is an isolated figure. To "data mine", and take it out of its real world context, and say it is significant, is misleading. The statistical test for significance assumes that every data point, every child, is independent. But, of course, here the data is "clustered". They are not data, they are real children, in 305 schools. They hang out together, they copy each other, they buy drugs off each other, there are crazes, epidemics, group interactions.
The increase of 45 kids taking cocaine could have been three major epidemics of cocaine use in three schools, or mini-epidemics in a handful of schools. This makes our result less significant. The small increase of 0.5 percentage points was only significant because it came from a large sample of 9,000 data points - like 9,000 tosses of a coin - but if they're not independent data points, then you have to treat it, in some respects, like a smaller sample, and so the results become less significant. As statisticians would say, you must "correct for clustering".
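One rough way to see what "treating it like a smaller sample" does is the textbook design-effect approximation, 1 + (m - 1) x ICC, where m is the average number of children per school and ICC is the intra-class correlation (how alike children in the same school are). This is a sketch, not the survey's own method. The ICC values below are hypothetical, since the column does not give one, and the sample size of 9,000 in each year is also an assumption.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def two_sided_p(p1, p2, n):
    """Two-sided p-value for two proportions, each from a sample of size n."""
    pooled = (p1 + p2) / 2
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p2 - p1) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


N_CHILDREN = 9000  # from the column; assumed to apply to both years
N_SCHOOLS = 305    # from the column
children_per_school = N_CHILDREN / N_SCHOOLS

results = {}
for icc in (0.0, 0.01, 0.02, 0.05):  # hypothetical values, for illustration
    deff = design_effect(children_per_school, icc)
    effective_n = N_CHILDREN / deff  # the "smaller sample" we really have
    results[icc] = two_sided_p(0.014, 0.019, effective_n)
    print(f"ICC={icc:.2f}  effective n={effective_n:6.0f}  p={results[icc]:.3f}")
```

With no clustering at all (ICC of zero) the result is the naive one. As the assumed ICC rises, the effective sample shrinks and the p-value climbs, and at the largest value tried here it passes 0.05.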
Then there is a final problem with the data. In the report, there are dozens of data points reported: on solvents, smoking, ketamine, cannabis, and so on. Standard practice in research is to say we only accept a finding as significant if it has a p-value of 0.05 or less. But as we said, a threshold of 0.05 means that for every 100 comparisons you do where nothing has really changed, about five will come up positive by chance alone. From this report you could have done dozens of comparisons, and some of them would indeed have been positive, but by chance alone, and the cocaine figure could be one of those. This is why statisticians do a "correction for multiple comparisons", which is particularly brutal on the data, and often reduces the significance of findings dramatically, just like correcting for clustering can.
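The scale of the problem is easy to sketch. The code below works out the chance of at least one false positive across k comparisons, and the stricter threshold that a Bonferroni correction would demand (one common correction among several; the column does not say which would apply here). The values of k are hypothetical stand-ins for "dozens", and the first formula assumes the comparisons are independent of each other, which is a simplification.

```python
ALPHA = 0.05  # the conventional significance threshold


def family_wise_error(alpha, k):
    """Chance of at least one false positive in k independent comparisons."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-comparison threshold that keeps the overall error rate near alpha."""
    return alpha / k


for k in (1, 12, 24, 36):  # hypothetical numbers of comparisons
    fwe = family_wise_error(ALPHA, k)
    threshold = bonferroni_threshold(ALPHA, k)
    print(f"k={k:2d}  P(at least one fluke)={fwe:.2f}  "
          f"Bonferroni threshold={threshold:.4f}")
```

With a couple of dozen comparisons, at least one chance "finding" becomes more likely than not, and the bar each individual result has to clear drops to a small fraction of 0.05.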
Mining is a dangerous profession - and data mining is just the same. This story went from an increase of 0.5 percentage points, which might be a gradual trend but could well be an entirely chance finding, to being a front page story in the Times about cocaine use doubling. You might not trust the press release, but if you don't know your science, you take a big chance when you delve under the bonnet of a study to find a story.