Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"
SUMMARY: Astronomy is unusual among research fields in that all research-active astronomers already have full online access to all relevant journal articles via institutional subscriptions (because astronomy has only a small closed circle of core journals). Many astronomy articles are also self-archived as preprints prior to peer review and publication, but usage all shifts to the published version as soon as it is available. Self-archiving, even where it is at or near 100%, has no effect at all on subscriptions or cancellations. The Open Access (OA) citation advantage hence reduces to merely an "Early Access Advantage" in astronomy, because all postprints are accessible to everyone. There is also the much-reported positive correlation between the citation counts of articles and the proportion of them that were self-archived. This is no doubt partly a self-selection effect or "Quality Bias" -- with the better articles more likely to be self-archived. But this is unlikely to be all or most of the source of the OA advantage even in astronomy -- let alone in most other fields, where the postprints are not all accessible to all active researchers. The most important component of the OA advantage in general is that OA removes the access and usage barriers that prevent the better work from having its full potential impact (Quality Advantage). In astronomy, where those access barriers hardly exist, there is still a measurable OA advantage, but mostly just because of Early Advantage (and self-selection). With all postprints accessible, Competitive Advantage is restricted to the prepublication phase; Usage Advantage (downloads) can be estimated: downloads are doubled by universal online accessibility. And the Quality Advantage no doubt persists (though it is difficult to estimate independently).
Michael Kurtz (Harvard-Smithsonian Center for Astrophysics) has provided some (as always) very interesting and informative data on the special case of research access and self-archiving practices in Astronomy. His data show that:
(1) In astronomy, where all active, publishing researchers already have online access to all relevant journal articles (a very special case!), researchers all use the versions "eprinted" (self-archived) in Arxiv first, because those are available first; but they all switch to using the journal version, instead of the self-archived one, as soon as the journal version is available.
That is interesting, but hardly surprising, in view of the very special conditions of astronomy: If I only had access to a self-archived preprint or postprint first, I'd use that, faute de mieux. And as soon as the official journal version was accessible -- assuming that it's equally accessible -- I'd use that instead.
But these conditions -- (i) open accessibility of the eprint before publication, (ii) in one longstanding central repository (Arxiv), for many and in some cases most papers, and (iii) open accessibility of the journal version of all papers upon publication -- are simply not representative of most other fields! In most other fields, (i') only about 15% of papers are available early as preprints or postprints, (ii') they are self-archived in distributed IRs and websites, not one central one (Arxiv), and (iii') the journal versions of many papers are not accessible at all to many of the researchers after publication.
That's a very different state of affairs (outside astronomy and some areas of physics).
(2) Kurtz's data showing that astronomy journals are not cancelled despite 100% OA are very interesting, but they too follow almost tautologically from (1): If virtually all researchers have access to the journal version, and virtually all of them prefer to use that rather than the eprint, it stands to reason that it is not being cancelled! (What is cause and what is effect there is another question -- i.e., whether preference is driving subscriptions or subscriptions are driving preference.)
(3) In astronomy, as indicated by Kurtz, there is a small, closed circle of core journals, and all active researchers worldwide already have access to all of them. But in many other fields there is not a closed circle of core journals, and/or not all researchers have access. Hence access to a small set of core journals is not a precondition for being an active researcher in many fields -- which does not mean that lacking that access does not weaken the research (and that is the point!).
(4) I agree completely that there is a component of self-selection Quality Bias (QB) in the correlation between self-archiving and citations. The question is (4a) how much of the higher citation count for self-archived articles is due to QB (as opposed to Early Advantage, Competitive Advantage, Quality Advantage, Usage Advantage, and Arxiv (Central) Bias)? And (4b) does self-selection QB itself have any causal consequences (or are authors doing it purely superstitiously, since it has no causal effects at all)? The effects of course need not be felt in citations; they could be felt in downloads (usage) or in other measures of impact (co-citations, influence on research direction, funding, fame, etc.).
The most important thing to bear in mind is that it would be absurd to imagine that somehow OA guarantees a quality-blind linear increment to the usage of any article, regardless of its quality. It is virtually certain that OA will benefit the better articles more, because they are more worth using and trying to build upon, hence more handicapped by access-barriers (which do exist in fields other than astro). That's QA, not QB. No amount of accessibility will help unciteable papers get used and cited. And most papers are uncited, hence probably unciteable, no matter how visible and accessible you may try to make them!
(5) I think we agree that the basic challenge in assessing causality here is that we have a positive correlation (between proportion of papers self-archived and citation-counts) but we need to analyze the direction of the causation. The fact that more-cited papers tend to be self-archived more, and less-cited papers less, is merely a restatement of the correlation, not a causal analysis of it: The citations, after all, come after the self-archiving, not before!
The only methodologically irreproachable way to test causality would be to randomly choose a (sufficiently large, diverse, and representative) sample of N papers at the time of acceptance for publication (postprints -- no previous preprint self-archiving) and randomly impose self-archiving on N/2 of them, and not on the other N/2. That way we have random selection and not self-selection. Then we count citations for about 2-3 years, for all the papers, and compare them.
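The logic of that randomized design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the paper identifiers and the `citations` mapping are hypothetical, and a real study would need a proper significance test rather than a bare difference of means.

```python
import random
from statistics import mean


def assign_randomly(paper_ids, seed=0):
    """Randomly split papers into a self-archived half and a control half.

    The split is made by shuffling, so no author or editor chooses which
    papers are self-archived (random selection, not self-selection).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def mean_citation_difference(citations, archived, control):
    """Mean citations of the self-archived group minus those of the control group.

    `citations` maps each paper id to its citation count after the
    observation window (about 2-3 years in the design described above).
    """
    archived_mean = mean(citations[p] for p in archived)
    control_mean = mean(citations[p] for p in control)
    return archived_mean - control_mean
```

Because the two halves differ only by the coin toss, a reliably positive difference could not be put down to authors picking their better papers for self-archiving.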
No one will do that study, but an approximation to it can be done (and we are doing it) by comparing (a) citation counts for papers that are self-archived in IRs that have a self-archiving mandate with (b) citation counts for papers in IRs without mandates and with (c) papers (in the same journal and year) that are not self-archived.
It is not a perfect method: there are problems with small Ns, short available time-windows, and admixtures of self-selection and imposed self-archiving even with mandates. But it is an approximation nonetheless. And other metrics -- downloads, co-citations, hub/authority scores, endogamy scores, growth-rates, funding, etc. -- can be used to triangulate and disambiguate. Stay tuned.
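The three-way comparison of (a), (b) and (c) might be organized along the following lines. This is a minimal sketch, not the analysis actually being run: the condition labels, the tuple layout and the function name are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical condition labels, matching groups (a), (b) and (c) above.
MANDATED = "mandated"            # (a) self-archived in an IR with a mandate
SELF_SELECTED = "self_selected"  # (b) self-archived in an IR without a mandate
NOT_ARCHIVED = "not_archived"    # (c) same journal and year, not self-archived


def mean_citations_by_condition(papers):
    """Return {(journal, year): {condition: mean citation count}}.

    `papers` is an iterable of (journal, year, condition, citations) tuples.
    Grouping by journal and year first means each comparison is made among
    papers of the same venue and age.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for journal, year, condition, citations in papers:
        buckets[(journal, year)][condition].append(citations)
    return {
        stratum: {cond: mean(counts) for cond, counts in by_cond.items()}
        for stratum, by_cond in buckets.items()
    }
```

If the mandated group (a) turned out to be cited about as much as the self-selected group (b), and both more than (c), that would be hard to explain by self-selection alone; the caveats about small Ns and mixed conditions apply to any such reading.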
Now some comments:
On Tue, 10 Oct 2006, Michael Kurtz wrote:
"Recently Stevan has copied me on two sets of correspondance concerning the OA citation advantage; I thought I would just briefly respond to both.
"Besides our IPM article: http://adsabs.harvard.edu/abs/2005IPM....41.1395K we have recently published two short papers, both with graphs you might find interesting.
"The preprint will appear in Learned Publishing http://adsabs.harvard.edu/abs/2006cs........9126H E-prints and Journal Articles in Astronomy: a Productive Co-existence
"and this is in the J. Electronic Publishing http://adsabs.harvard.edu/abs/2006JEPub...9....2H Effect of E-printing on Citation Rates in Astronomy and Physics
"There is a point I would like to emphasize from these papers. Figure 2 of the Learned Publishing paper shows that the number of ADS users who read the preprint version once the paper has been released drops to near zero. This shows that essentially every astronomer has subscriptions to the main journals, as ADS treats both the arXiv links and the links to the journals equally; also it shows that astronomers prefer the journals."
And it also shows how anomalous Astronomy is, compared to other fields, where it is certainly not true that every researcher has subscriptions to the main journals...
"Figure 5 of the J Electronic Publishing paper also shows that there is no effect of cost on the OA reads (and thus by extension citation) differential. Note in the plot that there is no change in slope for the obsolescence function of the reads (either of preprinted or non-preprinted) at 36 months. At 36 months the 3 year moving wall allows the papers to be accessed by everyone, this shows clearly that there is no cost effect portion of the OA differential in astronomy. This confirms the conclusion of my IPM article."
And it underscores again how unrepresentative astronomy is of research as a whole.
"Citations are probably the least sensitive measure to see the effects of OA. This is because one must be able to read the core journals in order to write a paper which will be published by them. It is really not possible for a person who has not been regularly reading journal articles on, say, nuclear physics, to suddenly be able to write one, and cite the OA articles which enabled that writing. It takes some time for a body of authors who did not previously have access to form and write acceptable papers."
That is true in astronomy, where the core journals are few and form a closed circle, and all active researchers have access to them. But it is not true of research as a whole, across disciplines (or around the world). Researchers in most fields are no doubt handicapped by having less than full access, but that does not prevent them from doing and publishing research altogether.
"Any statistical analysis of the causal/bias distinction must take into account the actual distribution of citations among articles. This is why I made the monte carlo analysis in the IPM paper. As a quick example for papers published in the Astrophysical Journal in 2003: The most cited 10% have 39% of all citations, and are 96% in the arXiv; the lowest cited 10% have 0.7% of all citations and are 29% in the arXiv. Showing the causal hypothesis is true will be very difficult under these conditions."
(i) Since all of the published postprints in all these journals are accessible to all research-active astronomers as of their date of publication, we are of necessity speaking here mostly about an Early Access effect (preprints). Most of the other components of the Open Access Advantage (Competitive Advantage, Usage Advantage, Quality Advantage) are minimized here by the fact that everything in astronomy is OA from the date of publication onward. The remaining components are either Arxiv-specific (the Arxiv Bias -- the tradition of archiving and hence searching in one central repository) or self-selection [Quality Bias] influencing who does and does not self-archive early, with their prepublication preprint.
Since most fields don't post pre-refereeing preprints at all, this comparison is mostly moot. For most fields, the question about citation advantage concerns the postprint only, and as of the date of acceptance for publication, not before.
(ii) In other fields too, there is the very same correlation between citation counts and percentage self-archived, but it is based on postprints, self-archived at publication, not pre-refereeing preprints self-archived much earlier. And, most important, it is not true in these fields that the postprint is accessible to all researchers via subscription: Many potential users cannot access the article at all if it is not self-archived -- and that is the main basis for the OA impact advantage.
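To put the Astrophysical Journal figures quoted above in proportion, the short calculation below compares the top and bottom citation deciles. It uses only the four percentages Kurtz gives; the variable names are mine, and the one assumption is that the two deciles contain the same number of papers, so that a ratio of citation shares is also a ratio of per-paper means.

```python
# Figures quoted by Kurtz for Astrophysical Journal papers published in 2003.
top_citation_share = 39.0     # % of all citations going to the most-cited 10%
bottom_citation_share = 0.7   # % of all citations going to the least-cited 10%
top_in_arxiv = 96.0           # % of the most-cited 10% that are in arXiv
bottom_in_arxiv = 29.0        # % of the least-cited 10% that are in arXiv

# Both groups are deciles (equal numbers of papers), so the ratio of their
# citation shares equals the ratio of their mean citations per paper.
citation_ratio = top_citation_share / bottom_citation_share
archiving_ratio = top_in_arxiv / bottom_in_arxiv

print(f"citations per paper, top vs bottom decile: {citation_ratio:.1f}x")
print(f"arXiv deposit rate, top vs bottom decile: {archiving_ratio:.1f}x")
# prints 55.7x and 3.3x
```

The citation gap between the deciles is thus far larger than the gap in self-archiving rates, which illustrates how skewed the distribution is and why any analysis has to be based on it, as Kurtz says.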
"Perhaps the journal which is most sensitive to cancellations due to OA archiving is Nuclear Physics B; it is 100% in arXiv, and is very expensive. I have several times seen librarians say that they would like to cancel it. One effect of OA on Nuclear Physics B is that its impact factor (as we measure it, I assume ISI gets the same thing) has gone up, just as we show in the J E Pub paper for Physical Review D. Whether Nuclear Physics B has been cancelled more than Nuclear Physics A or Physics Letters B must be well known at Elsevier."
It is an interesting question whether NPB is being cancelled, but if it is, it clearly is not because of self-archiving, nor because of astronomy's special "universal paid OA" to the published version: if NPB is being cancelled, it is for the usual reason, which is that it is not good enough to justify its share of the institution's journal budget.
Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA
Stevan Harnad
American Scientist Open Access Forum