SUMMARY: One can only speculate on the reasons why some might still wish to cling to the self-selection bias hypothesis in the face of all the evidence to date. It seems almost a matter of common sense that making articles more accessible to users also makes them more usable and citable -- especially in a world where most researchers know the frustration of arriving at a link to an article they would like to read, only to find that their institution does not subscribe and that they are invited to drop the article into a shopping cart and pay $30 at the check-out counter. The straightforward causal relationship is the default hypothesis, based on both plausibility and the cumulative weight of the evidence. Hence the burden of providing counter-evidence to refute it is now on the advocates of the alternative.
Jennifer Howard ("Is there an Open-Access Advantage?," Chronicle of Higher Education, October 19, 2010) seems to have missed the point of our article. It is undisputed that study after study has found that Open Access (OA) is correlated with a higher probability of citation. The question our study addressed was whether making an article OA causes the higher probability of citation, or whether the higher probability of citation causes the article to be made OA.
The latter is the "author self-selection bias" hypothesis, according to which the only reason OA articles are cited more is that authors do not make all articles OA: only the better ones, the ones that are also more likely to be cited.
The Davis et al. study tested this by making articles OA randomly -- 247 articles from 11 biology journals -- instead of letting the authors choose, self-selectively, whether or not to do so; and they found no increase in citations for the OA articles one year after publication (although they did find increased downloads).
But almost no one finds that OA articles are cited more a year after publication. The OA citation advantage only becomes statistically detectable after citations have accumulated for 2-3 years.
Even more important, Davis et al. did not test the obvious and essential control condition in their randomized OA experiment: they did not test whether there was a statistically detectable OA advantage for self-selected OA in the same journals and time-window. You cannot show that an effect is an artifact of self-selection unless you show that with self-selection the effect is there, whereas with randomization it is not. All Davis et al. showed was that there is no detectable OA advantage at all in their one-year sample (247 articles from 11 biology journals); randomness and self-selection have nothing to do with it.
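To make that logic concrete, here is a small simulation sketch (my own illustration in Python, not anything from the studies under discussion; the baseline citation rate, the dispersion, and the size of the assumed true advantage are all invented for the purpose). It simulates a world in which OA really does cause a 20% citation advantage, with group sizes like those in the Davis et al. sample and the low, highly skewed citation counts typical one year after publication:

    # Hypothetical illustration: a *true* 20% OA advantage can easily go
    # undetected in a small, young sample. All parameters are assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    n_oa, n_non = 247, 1372   # group sizes as in the 2008 Davis et al. sample
    base_mean = 1.5           # assumed mean citations per article at one year
    advantage = 1.20          # assume a true causal 20% OA citation advantage
    shape = 0.5               # negative-binomial shape: small value = very skewed counts

    def citations(mean, size):
        # overdispersed citation counts with the given mean
        p = shape / (shape + mean)
        return rng.negative_binomial(shape, p, size)

    trials, detected = 2000, 0
    for _ in range(trials):
        oa = citations(base_mean * advantage, n_oa)
        non_oa = citations(base_mean, n_non)
        _, pval = stats.mannwhitneyu(oa, non_oa, alternative="greater")
        detected += pval < 0.05

    print(f"chance of detecting a true 20% advantage: {detected / trials:.0%}")

The exact detection rate depends on the assumed parameters; the point is only that a real advantage of this size can quite easily fail to reach significance in a sample of this size and age. A null result in the randomized arm alone therefore cannot distinguish "the advantage is a self-selection artifact" from "the study is underpowered"; only the self-selected control condition, tested on the same sample and time-window, could do that.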
Davis et al. released their results prematurely. We are waiting*,** to hear what Davis finds after 2-3 years, when he completes his doctoral dissertation. But if all he reports is that he has found no OA advantage at all in that sample of 11 biology journals and that time interval, rather than an OA advantage for the self-selected subset and no OA advantage for the randomized subset, then again all we will have is a failure to replicate the positive effect that has now been reported by many other investigators, in field after field, often with far larger samples than Davis et al.'s.
*Note added October 31, 2010: Davis's dissertation turns out to have been posted on the same day as the present posting (October 20; thanks to Les Carr for drawing this to my attention on October 24!).
**Note added November 24, 2010: Phil Davis's results -- a replication of the OA download advantage and a non-replication of the OA citation advantage -- have since been published as: Davis, P. (2010) Does Open Access Lead to Increased Readership and Citations? A Randomized Controlled Trial of Articles Published in APS Journals. The Physiologist 53(6) December 2010.
Davis's results are welcome and interesting, and include some good theoretical insights, but insofar as the OA Citation Advantage is concerned, the empirical findings turn out to be just a failure to replicate the OA Citation Advantage in that particular sample and time-span -- exactly as predicted above. The original 2008 sample of 247 OA and 1372 non-OA articles in 11 journals one year after publication has now been extended to 712 OA and 2533 non-OA articles in 36 journals two years after publication. The result is a significant download advantage for OA articles but no significant citation advantage.
The only way to describe this outcome is as a non-replication of the OA Citation Advantage on this particular sample; it is most definitely not a demonstration that the OA Advantage is an artifact of self-selection, since there is no control group demonstrating the presence of the citation advantage with self-selected OA and the absence of the citation advantage with randomized OA across the same sample and time-span: There is simply the failure to detect any citation advantage at all.
This failure to replicate is almost certainly due to the small sample size as well as the short time-span. (Davis's a priori estimates of the sample size required to detect a 20% difference took no account of the fact that citations grow with time; and even that a priori criterion is not met by the self-selected subsample of 65.)
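As a rough illustration of the sample-size point (again my own back-of-the-envelope sketch, not Davis's calculation): if one assumes, purely for simplicity, that citation counts are roughly Poisson-distributed, then the standardized effect size corresponding to a fixed 20% relative advantage grows as the baseline citation count grows with time, and the number of articles per group needed to detect it shrinks accordingly. The baseline means below are invented, and real citation distributions are far more skewed than Poisson, so the true required samples would be larger still -- which only strengthens the point:

    # Hypothetical power sketch: the assumed baseline means are invented,
    # and the Poisson assumption understates the real skew of citation counts.
    from math import sqrt
    from statsmodels.stats.power import TTestIndPower

    relative_advantage = 0.20
    power_calc = TTestIndPower()

    for years, base_mean in [(1, 1.5), (2, 4.0), (3, 7.0)]:
        # Poisson assumption: sd ~ sqrt(mean), so Cohen's d for a fixed 20%
        # relative advantage grows as the baseline mean grows with time.
        d = relative_advantage * base_mean / sqrt(base_mean)
        n = power_calc.solve_power(effect_size=d, alpha=0.05, power=0.8)
        print(f"year {years}: assumed mean {base_mean}, d = {d:.2f}, "
              f"~{n:.0f} articles per group for 80% power")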
"I could not detect the effect in a much smaller and briefer sample than others" is hardly news! Compare the sample size of Davis's negative results with the sample-sizes and time-spans of some of the studies that found positive results:
Meanwhile, our study was similar to Davis et al.'s, except that it used a much larger sample, across many fields, and a much longer time window -- and, most important, we did have a self-selective matched-control subset, which did show the usual OA advantage. Instead of comparing self-selective OA with randomized OA, however, we compared it with mandated OA -- which amounts to much the same thing, because the point of the self-selection hypothesis is that the author picks and chooses what to make OA, whereas if the OA is mandatory (required), the author is not picking and choosing, just as the author is not picking and choosing when the OA is imposed randomly.
And our finding is that the mandated OA advantage is just as big as the self-selective OA advantage.
As we discussed in our article, if someone really clings to the self-selection hypothesis, there are some remaining points of uncertainty in our study that self-selectionists can still hope will eventually bear them out: compliance with the mandates was not 100% but 60-70%. So the self-selection hypothesis has a chance of being resurrected if one argues that it is now no longer a case of positive selection of the stronger articles for OA, but of a refusal to comply with the mandate for the weaker ones. One would have expected, however, that if this were true, the OA advantage would at least be weaker for mandated OA than for unmandated OA, since the percentage of total output that is self-archived under a mandate is almost three times the 5-25% that is self-archived self-selectively. Yet the OA advantage is undiminished at 60-70% mandate compliance in 2002-2006. We have since extended the window by three more years, to 2009; the compliance rate rises by another 10%, but the mandated OA advantage remains undiminished. Self-selectionists need not concede until the percentage reaches 100%, but their hypothesis gets more and more far-fetched...
The other way of saving the self-selection hypothesis despite our findings is to argue that there was a "self-selection" bias in terms of which institutions do and do not mandate OA: Maybe it's the better ones that self-select to do so. There may be a plausible case to be made that one of our four mandating institutions -- CERN -- is an elite institution. (It is also physics-only.) But, as we reported, we re-did our analysis removing CERN, and we got the same outcome. Even when the objection of eliteness is extended to Southampton ECS, removing that second institution did not change the outcome either. We leave it to the reader to decide whether it is plausible to count our remaining two mandating institutions -- University of Minho in Portugal and Queensland University of Technology in Australia -- as elite institutions, compared to other universities. It is a historical fact, however, that these four institutions were the first in the world to elect to mandate OA.
One can only speculate on the reasons why some might still wish to cling to the self-selection bias hypothesis in the face of all the evidence to date. It seems almost a matter of common sense that making articles more accessible to users also makes them more usable and citable -- especially in a world where most researchers know the frustration of arriving at a link to an article they would like to read, only to find that their institution does not subscribe and that they are invited to drop the article into a shopping cart and pay $30 at the check-out counter. The straightforward causal relationship is the default hypothesis, based on both plausibility and the cumulative weight of the evidence. Hence the burden of providing counter-evidence to refute it is now on the advocates of the alternative.
Davis, P.M., Lewenstein, B.V., Simon, D.H., Booth, J.G. & Connolly, M.J.L. (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337: a568
Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. & Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLoS ONE 5(10): e13636
Harnad, S. (2008) Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion. Open Access Archivangelism, July 31 2008