Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"
As Eysenbach’s very long and remarkably intemperate response to my prior response mainly repeats prior (answered) points, I will respond only to his very few substantive points:
I had asked: “Does [Eysenbach] seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect [the within-journal citation advantage for self-archived articles]?”
GE: “the answer is ‘absolutely’… If high-author papers are overrepresented in self-archived papers, then this confounder alone will contribute to having a greater number of citations… Only if one statistically controls for all these confounders (there are several of them - see PLoS paper), and one STILL sees an open access citation advantage, then (and only then) one has a SOLID, defendable study.”
Here is Eysenbach’s list of confounders:
(1) number of authors: As Eysenbach says he is serious, we will now test this. We have the data. Eysenbach’s prediction is that partialling out the effect of the number of authors will make a dent in our huge, consistent citation advantage. Stay tuned...
(2) number of days since publication: This is relevant and feasible in a 1-year, 1-journal study like Eysenbach’s but neither relevant nor feasible for a sample of over a million articles ranging over 12 years, 12 disciplines, and hundreds of journals -- all showing exactly the same citation advantage for self-archived articles in every year and every discipline.
(3) article type: We are able to test this separately too (because we have ISI data on article type) but first let’s see whether partialling out author numbers makes a dent in our basic effect.
(4) country of the corresponding author: This is testable too, but first let’s see how the author-number ‘confounder’ pans out (we could look at the first-author's birth-sign too...).
(5) funding type: Data not available, and extremely far-fetched.
(6) subject area: Already tested and reported in our data, separately for 12 different disciplines: the self-archiving advantage is consistently present in all of them.
(7) submission track (PNAS has three different ways that authors can submit a paper): Not relevant to the journals we tested, which were all non-OA and pre-dated Open Choice.
(8) previous citation record of the first and last authors: This, as I noted, is -- along with the demonstration of how early the OA advantage emerges in PNAS – a potentially interesting variable in the fine-tuning of the OA advantage, but our own studies are concerned with estimating the generality and size of the OA advantage, not with its fine tuning.
(9) whether authors choosing the OA option in PNAS chose to do so for only their most important research (“they didn't”): Neither Eysenbach’s study nor ours can confirm causality or eliminate the possibility of self-selection bias.
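For readers unfamiliar with what "partialling out" a candidate confounder such as the number of authors involves, here is a minimal sketch using a first-order partial correlation. It is not the analysis used in either study: the function names (`pearson`, `partial_corr`), the variable names, and all the numbers are made up purely for illustration. It compares the raw correlation between self-archiving and citation counts with the correlation that remains once author count is held constant.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z partialled out."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up illustrative values, NOT data from either study.
self_archived = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = article was self-archived
n_authors = [1, 2, 3, 2, 3, 4, 5, 6]       # hypothetical author counts
citations = [2, 3, 5, 4, 6, 9, 10, 13]     # hypothetical citation counts

raw = pearson(self_archived, citations)
adjusted = partial_corr(self_archived, citations, n_authors)
print(f"raw correlation: {raw:.3f}")
print(f"with author count partialled out: {adjusted:.3f}")
```

If the adjusted value stays close to the raw one, the candidate confounder has not made a dent in the effect; if it collapses toward zero, the confounder accounts for most of it. With several confounders at once, a multiple regression would be the more usual tool than this single-covariate formula.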
GE: "the fact that we look at a immediate (gold-)OA article population in a longitudinal cohort study design takes care of the “arrow of causation” problem, because it makes sure that open access status comes first, then the citations are coming, not the other way round."
I'm afraid it's not quite that easy to take care of the "arrow of causation" problem, which is confounded (sic) with the problem of self-selection bias: For if authors are (contrary to their subjective reports) indeed self-selecting their better papers (or themselves!) for OA-gold (or for self-archiving) then that, and not the OA, could explain why their papers get more citations.
GE: “it is entirely possible that the articles in his sample (which he refers to as green-OA articles) were not “immediately” self-archived after publication, but 1 month, 6 months, or 12 months after original publication, therefore not really what Harnad refers to as green-OA, implying “immediate” deposition.”
This is actually a valid point of definition: OA should be defined as ‘immediate’ in order to rule out claims that delayed/embargoed access is Open Access. The point at which refereed research can and should begin to be used is when the final refereed draft is accepted for publication, and that is the point when it should be made freely accessible online. So a portion of the citation advantage for self-archived articles could well have come from self-archiving later than the publication date; technically speaking this should be called a ‘free access’ advantage, if we reserve the term "OA" for access that is free immediately. But surely nothing of substance rides on this: If there is a self-archiving advantage even for tardy self-archiving, that confirms, a fortiori, the self-archiving advantage of prompt (OA) self-archiving too!
GE: “I… made a conscientious decision to submit my paper to a gold-OA journal (PLoS) rather than publishing the study in an obscure scientrometrics journal and then self-archived [sic] it”
Actually, unless I am mistaken, I seem to recall correspondence from GE to the effect that it was first declined by Science (or was it Nature?) – not a gold-OA journal – before being submitted to PLoS Biology…
GE: “The visibility of an article published in a properly promoted OA journal site will always be better than a paper that is published in a toll-access journal site, even if it is self-archived. This is exactly why my study shows an advantage of gold-OA over green-OA, this is also why I personally chose the gold route to publish this paper in PLoS, and not the green route”
Let us not confound a journal's profile/impact level with its OA/non-OA status.
The visibility (and no doubt also the citation impact) of an article will always be better when it is published in a high-profile, high-impact journal, whether it is OA (like PLoS) or non-OA (like Science or Nature) rather than an obscure scientometrics journal (or an obscure OA journal). Its visibility and impact will be higher if self-archived in either case (except perhaps if the journal is both high-profile and optional-OA, which is partly what Eysenbach’s study has shown).
GE: “the PLoS paper is the first study which contains an analysis of both gold and green (thus focuses on “OA itself”), whereas the rest of the studies is actually focused on ‘green’”.
Because most of the existing data for within-journal OA/non-OA comparisons comes from the millions of articles published in the thousands of non-gold journals indexed by ISI and not just the thousands of articles published in the few journals that are as yet (like PNAS) optional-gold...
Stevan Harnad
American Scientist Open Access Forum