In today’s National Post, there is an interesting piece by Joseph Brean on the subject of spurious statistical correlations in science. It touches briefly on the decline effect, but not in a way that I find entirely encouraging.
Brean cites several papers, including one by Young and Karr in the journal Significance on the problem of irreproducible results, and another from last fall entitled ‘False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant’.
It’s provocative stuff. Brean quotes both groups of researchers as saying that there are simply too many ways to manipulate and mine the data to create the appearance of correlation, or a trend. One group says “modern academic psychologists have so much flexibility with numbers that they can literally prove anything.” Researchers might be engaging in hype, or fraud, or just self-delusion, but we wind up at the same (irreproducible) result regardless.
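To make the ‘prove anything’ complaint concrete, here is a toy simulation of my own (not from either paper). The data are pure noise, but a researcher who measures five outcomes and reports whichever one clears p < 0.05 will see far more than the nominal 5% of ‘significant’ findings:

```python
import math
import random

random.seed(1)

def z_test_p(a, b):
    # two-sided p-value for a two-sample z-test (normal approximation,
    # adequate at n = 30 per group)
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def experiment(n_outcomes):
    # the null is true: both "groups" come from the same distribution;
    # declare success if ANY measured outcome reaches p < 0.05
    for _ in range(n_outcomes):
        a = [random.gauss(0, 1) for _ in range(30)]
        b = [random.gauss(0, 1) for _ in range(30)]
        if z_test_p(a, b) < 0.05:
            return True
    return False

trials = 2000
honest = sum(experiment(1) for _ in range(trials)) / trials
flexible = sum(experiment(5) for _ in range(trials)) / trials
print(f"one pre-specified outcome: {honest:.1%} false positives")
print(f"best of five outcomes:     {flexible:.1%} false positives")
```

The arithmetic behind it is simple: five independent chances at p < 0.05 give a false-positive rate of roughly 1 − 0.95^5 ≈ 23%, and real researcher degrees of freedom (outlier rules, optional covariates, flexible stopping) multiply the chances further.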
Brean also quotes an e-mail he received from a blogger named Victor Ivrii. (See the blog here.) Ivrii is similarly scathing about the lack of proper controls on papers submitted with statistical arguments.
The reproducibility problem is an open secret in science, as Jonathan Schooler found some years ago when he couldn’t reproduce his own celebrated memory experiment. Colleagues told him that even trying to reproduce a result is a recipe for heartbreak.
As I say, Brean offers some provocative complaints about the lack of reproducibility, from a variety of respectable sources. However, the solutions being proposed are pretty conventional. So far everyone sees it as a problem of self-discipline and professionalism among scientists: they just need to be more rigorous and results will be reproducible. And I don’t dispute this perspective; I think scientists DO need to be more rigorous. I just think it’s not going to be enough, because decline is a real phenomenon.
For example, Young and Karr want to see papers submitted with two sets of data: one to be released with the paper, the other as a ‘holdout’ to be analyzed later. However, this would demand a vastly larger validation effort from journals. Checking each paper’s results against a second dataset would be far more work than the present widespread policy of looking over the paper in a general way and not checking the data at all. I can’t fault their logic in asking for this, but it won’t happen, except in rare cases. There is simply no way that reviewers can invest that much time in other people’s papers.
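To see why a holdout set is so effective, here is a sketch of my own (not Young and Karr’s actual procedure): mine fifty pure-noise variables for the one that best ‘predicts’ an outcome. The correlation looks impressive on the half of the data you mined, and collapses on the half you held out:

```python
import random

random.seed(3)

def corr(xs, ys):
    # Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sxy / (sx * sy)

# 50 candidate predictors, all pure noise, measured on 100 subjects;
# the first 50 subjects are the exploratory set, the rest are held out
outcome = [random.gauss(0, 1) for _ in range(100)]
predictors = [[random.gauss(0, 1) for _ in range(100)] for _ in range(50)]

# mine the exploratory half for the best-looking predictor
best = max(predictors, key=lambda p: abs(corr(p[:50], outcome[:50])))
r_explore = corr(best[:50], outcome[:50])
r_holdout = corr(best[50:], outcome[50:])
print(f"mined correlation, exploratory half: {r_explore:+.2f}")
print(f"same variable, holdout half:         {r_holdout:+.2f}")
```

Because the holdout half played no part in the mining, a correlation that was pure selection bias has no reason to reappear there, which is exactly why checking it takes real extra work.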
More disturbing, as I have said before, are the cases where the work was done rigorously, and the results were still not reproducible. It is my impression that the big pharmaceutical companies are not abandoning the search for antidepressants merely because of some bad statistical controls and data mining. They’re abandoning the field because even when everything is done well, they can’t get a workable drug.
Brean brings up the decline effect briefly near the end of the article, and cites Young and Karr as saying that it doesn’t really exist; it’s just a symptom of poor experimental controls:
‘They point out that the example used in the magazine — the “decline effect,” seen in studies of paranormal extra-sensory perception, in which an initially high success rate drops off steeply, later explained by the statistical concept of regression to the mean — was simply “wrong and therefore should not be expected to replicate.”’
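Their regression-to-the-mean point is easy to demonstrate with a toy ESP experiment of my own devising (not from the article): have ten thousand subjects guess coin flips, anoint the top scorers as psychics, and retest them. The ‘effect’ declines to chance on the second session, because the stars were selected for luck:

```python
import random

random.seed(2)

N, TRIALS = 10000, 100          # subjects, guesses per session

def session():
    # pure chance: each guess is right with probability 1/2
    return sum(random.random() < 0.5 for _ in range(TRIALS))

first = [session() for _ in range(N)]
# "discover" the apparent psychics: everyone who scored 60+ in session one
stars = [i for i, s in enumerate(first) if s >= 60]
second = [session() for _ in stars]

avg_first = sum(first[i] for i in stars) / len(stars)
avg_second = sum(second) / len(stars)
print(f"stars, session 1: {avg_first:.1f} / {TRIALS}")
print(f"stars, session 2: {avg_second:.1f} / {TRIALS}")
```

The decline here is an artifact of selection, which is Young and Karr’s whole argument: if the first result was a fluke, the drop-off is exactly what statistics predicts.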
The bad news: Showing the decline effect is real is going to mean overcoming this argument, and it is a very popular and widespread argument.
The good news: Given how lousy reproducibility is in the social sciences, there is plenty of incentive for people to listen to a new perspective.