Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"
Gunther Eysenbach (GE) (in a letter in PLoS Biology today) wrote:
GE: "The introduction of the article and two accompanying editorials [1, 2, 3] already answer Harnad's questions why author, editors, and reviewers were critical of the methodology employed in previous studies, which all only looked at "green OA" (self-archived/online-accessible papers)"
I didn't ask why the author and editors were critical of prior self-archiving (green OA) studies; I asked why they said such studies were "surprisingly hard to find" and why the two biggest and latest of them were not even taken into account:
Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 56.
Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
And the reason all prior within-journal studies look only at "green OA" is that the majority of OA today is green; hence almost all OA/NOA impact comparisons are based on green OA (self-archiving) rather than on paid-OA (gold). To compare OA and NOA between rather than within journals would be to compare apples and oranges: see the critique of ISI's between-journal OA/NOA comparisons in:
Brody, T. and Harnad, S. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-Lib Magazine 10(6).
GE: " (hint 1: "confounding") (hint 2: arrow of causation: are papers online because they are highly cited, or the other way round?)."
I am afraid I don't see Eysenbach's point here at all: What exactly does he think is being confounded in within-journal comparisons of self-archived versus non-self-archived articles? The paid-OA effect? But among OA articles today there is almost zero within-journal paid-OA, because so few journals offer it! (Hajjem et al.'s within-journal comparisons were based on over a million articles, across 12 years and hundreds of journals, in 12 disciplines! Eysenbach's were based on 1492 articles, in 6 months, in one journal.)
And is Eysenbach suggesting that his failure to find any significant difference among author self-reports -- about their own article's quality and its causal role in their decision about whether or not to pay for OA (or to self-archive) in his sample of 237 authors -- is an objective test of the arrow of causation? (I agree that Eysenbach's failure to find a difference fails to support the hypothesis of a self-selection bias, but surely that won't convince those who are minded to hold that hypothesis! I would welcome rigorous causal evidence against the self-selection hypothesis as much as Eysenbach would, but author self-reports are alas not that evidence!)
GE: " The statement in the PLoS editorial has to be seen against this background. None of the previous papers in the bibliography mentioned by Harnad employed a similar methodology, working with data from a "gold-OA" journal."
Yes, almost all prior studies on the OA impact advantage are based on green OA, not gold, but so what? It is Eysenbach (and PLoS) who are focussed on gold-OA journals; the rest of the studies are focussed on OA itself. Only about 10% of the planet's peer-reviewed journals are gold today, and most of those are 100% gold, hence allow no within-journal comparisons. Very few journals as yet offer authors the "Open Choice" (optional paid gold) that would allow within-journal gold OA/NOA comparisons; and few authors are as yet taking those journals up on it (about 15% in this PNAS sample), compared to the far larger number of articles that are being self-archived (also 15% of the total, as it happens, though that percentage too is still far too small!). The difference in article sample sizes is about three orders of magnitude (c. 1500 articles in Eysenbach's study versus 1.5 million in Hajjem et al.'s).
GE: " The correct method to control for problem 1 (multiple confounders) is multivariate regression analysis, not used in previous studies."
Correct. But with the large, consistent within-journal OA/NOA differences found across all journals, all disciplines and all years, in samples three orders of magnitude larger than Eysenbach's, it is not at all clear that controls for those "multiple confounders" are necessary in order to demonstrate the reality, magnitude and universality of the OA advantage. That does not mean the controls are not useful, just that they are not yet telling us much that we don't already know.
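For readers who want to see concretely what such multivariate control amounts to, here is a minimal sketch in Python (statsmodels), with entirely hypothetical column names and a placeholder data file: citation counts regressed on OA status while candidate confounders are held constant. It illustrates the general technique only; it is not Eysenbach's actual model or data.

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per article; all column names here are hypothetical placeholders.
    df = pd.read_csv("articles.csv")

    # Ordinary least squares of citation counts on OA status plus candidate
    # confounders; the is_oa coefficient is then the OA effect adjusted for them.
    model = smf.ols(
        "citations ~ is_oa + n_authors + days_since_pub + is_review",
        data=df,
    ).fit()
    print(model.summary())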
GE: " Harnad's statement that "many [of the confounding variables] are peculiar to this particular... study" suggests that he might still not fully appreciate the issue of confounding. Does he suggest that in his samples there are no differences in these variables (for example, number of authors) between the groups? Did he even test for these? If he did, why was this not described in these previous studies?"
No, we did not test for "confounding effects" of the number of authors: what confounding effect of author numbers does Eysenbach expect in a sample of over a million articles, across a dozen disciplines and a dozen years, all showing the very same, sizeable OA advantage? Does he seriously think that partialling out the variance due to the number of authors would make a dent in that huge, consistent effect?
Not that Eysenbach's tentative findings on 1st-author/last-author differences in his one-journal sample of 1492 are not interesting; but those are merely minor differences in shading, compared to the whopping main effect, which is: substantially more citations (and downloads) for self-archived OA articles.
GE: " The correct method to address problem 2 (the "arrow of causation" problem) is to do a longitudinal (cohort) study, as opposed to a cross-sectional study. This ascertains that OA comes first and THEN the paper is cited highly, while previous cross-sectional studies in the area of "green OA" publishing (self-archiving) leave open what comes first -- impact or being online."
I agree completely that time-based studies are necessary to demonstrate causation, for those who think that the OA advantage might be based on self-selection bias (i.e., that high-impact studies tend to be preferentially self-archived, perhaps even after they have gained their high impact), but Eysenbach's author self-report data certainly don't constitute such a longitudinal cohort study! (Once there exist reliable deposit dates for self-archived articles, we will be able to do some time-based analyses on green OA too, but, frankly, by that time the outcome is likely to be a foregone conclusion.)
In the meanwhile, the fact that (a) the OA advantage does not diminish for younger articles (as one would expect if it were a post-hoc effect), that (b) OA increases downloads, and that (c) increased downloads in the first 6 months are correlated with increased citations later on -- plus the logic of the fact that (d) unaffordability reduces access and that (e) access is a necessary condition for citation -- all suggest that most of the scepticism about the OA self-archiving advantage is because of conflicting interests, not because of objective uncertainty.
GE: " Harnad - who usually carefully distinguishes between "green" and "gold" OA publishing -- ignores that open access is a continuum, much as publishing is a continuum"
I'm afraid I have no idea what Eysenbach means about OA being a continuum: time is certainly a continuum, and access certainly admits of degrees (access may be easier/harder, narrower/wider, cheaper/dearer, longer/shorter, earlier/later, partial/full) -- but Open Access does not admit of degrees (any more than pregnancy does). OA means immediate, permanent, full-text online access, free for all, now. And, by the way, green OA is certainly not a lesser degree of gold OA!
For the innocent reader, puzzled as to why this would even be an issue: please recall that OA (gold) journals, whether total or optional gold, need authors (and those gold journals with the gold cost-recovery model need paying authors/institutions). To attract authors, gold journals need to persuade them of the benefits of OA. So far so good. But there is another thing they have to persuade them of, implicitly or explicitly, and that is the benefits of gold OA over green OA. For if there are no benefits of gold over green, then surely it makes much more sense for authors to publish in their journal of choice, as they always did, and simply self-archive their own articles, rather than switching journals and/or paying for gold OA!
This theme alas keeps recurring, implicitly or explicitly, in the internecine green/gold squabbles, because green OA is indeed a rival to gold OA in gold OA journals' efforts to win over authors. This is regrettable, but a functional fact today, owing to the nature of OA and of the two means of providing it.
Is the effect symmetrical? Is gold OA likewise a rival to green OA? Here the answer is more complicated: no, an author who chooses gold OA (by publishing in an OA journal) is not at all a loss for green OA, because the article is nevertheless OA, and green OA's sole objective is 100% OA, as soon as possible, and nothing else. (Besides, a gold OA article too can be self-archived in the author's Institutional Repository if the author or institution wishes! All gold journals are, a fortiori, also green, in that they endorse author self-archiving.)
But there is a potential problem with gold from the standpoint of green. The problem is not with authors choosing gold. The problem is with gold publishers promoting gold as superior to green, or, worse, with gold publishers implying that green OA is not really OA, or not "fully" OA (along some imaginary OA "continuum").
"Free Access vs. Open Access" (thread started Aug 2003)
Why, you ask, would gold OA want to give the impression that green OA was not "really" OA or not "fully" OA? Because of the rivalry for authors that I just mentioned. The causal arrow is a one-way one insofar as competition for authors is concerned: green OA does not lose an author if that author publishes in a gold OA journal, whereas gold OA does lose an author if that author publishes in a green journal instead of a gold one. However, if gold portrays green as if it were not really or fully OA, and authors believe this, then green loses author momentum -- especially among the vast majority of authors who do not yet elect to publish gold. For there is today something still very paradoxical, indeed equivocal, about author behavior and motivation vis-a-vis OA:
Authors profess to want OA. Thirty-four thousand of them even signed the 2001 PLoS Open Letter threatening to boycott their journals if they did not provide (gold) OA (within 6 months of publication). (Most journals did not comply, and most authors did not follow through on their boycott threat: how could they? There were not enough suitable gold journals for them to switch to, and most authors clearly were not interested in switching journals, let alone paying for publication, then or now.)
Yet (and here comes the paradox): if those 34,000 signatories -- allegedly so desirous of OA as to be ready to boycott their journals if they did not provide it -- had simply gone on to self-archive all their papers, they would be well on the road to having the OA they allegedly desired so much! For the green road to 100% OA happens to be based on the (golden!) rule: Self-Archive Unto Others As You Would Have Them Self-Archive Unto You.
Why didn't (and don't) most authors do it (yet)? It is partly (let us state it quite frankly) straightforward foolishness and inconsistency on authors' part. They simply have not thought it through. This cannot be denied. Authors are in a state of self-induced "Zeno's Paralysis" regarding OA, from which FAQs have so far been powerless to free them -- so that it now looks as if self-archiving mandates from their institutions and/or their funders will be the only thing that can induce them to do what will give them what they so want and need.
But the confusion and inaction are partly also the fault of the promotional efforts of (well-meaning) OA advocates. Harold Varmus sent a mixed message with his 1999 "E-biomed" proposal (which led to PLoS, the PLoS Open Letter, PubMed Central, BioMed Central, and eventually the PLoS and BMC fleet of OA journals, including PLoS Biology). Was E-biomed a gold proposal, a green proposal, both, or neither? The fact is that it was an incoherent proposal -- a confused and confusing mish-mash of central self-archiving, publishing reform/replacement, and rival publishing -- and although it has undeniably led to genuine and valuable progress toward (what was eventually baptized by the BOAI as) OA, it has left a continuing legacy of confusion too.
And we are facing part of that legacy of confusion now, with PLoS thinking that the only (or best) way to reach 100% OA is to publish and promote gold OA journals. That is why PLoS Biology agreed to referee the Eysenbach paper, which seemed to show that it is gold OA alone that increases citation impact, not green self-archiving, which is (when you come right down to it) not even "real" OA at all!
That is also why PLoS Biology editorialised that they found it "surprisingly hard to find" evidence -- "solid evidence" -- that OA articles are read and cited more. And that is why PLoS Biology was happy to make an exception and publish the Eysenbach study, even though scientometrics is not the subject matter of PLoS Biology, but (I'll warrant) PLoS Biology would not have been happy to advertise in its pages the fact that green OA self-archiving was enough to get articles read and cited more!
So green OA does have a bit of an uphill battle against gold OA and the subsidies and support it has received (because gold OA is an attractive and understandable idea, whereas green OA requires a few more mental steps to dope out -- though not many, as none of this is rocket science!).
But, to switch metaphors, the green road to 100% OA (sic) is far wider, faster and surer than the golden road. (Every article can be self-archived, today, without its author having to renounce or switch journals, whereas most articles do not yet have a suitable OA journal to publish in today, even if their authors wished to switch journals, which most do not; and authors can be mandated to self-archive by their institutions and funders, but neither authors' choice of journals nor their publishers' choice of access-provision or cost-recovery model can be mandated by authors' institutions and funders.) Moreover, 100% OA really is beneficial to research and researchers; so the green road of self-archiving is bound to prevail, despite the extra obstacles. And the destination (100% OA) is exactly the same for both roads. (Indeed, I am pretty sure that even the fastest way to reach 100% gold OA -- i.e., not just 100% OA but also the conversion of all journals to gold -- is in fact to take the green road to 100% OA first.)
So gold is doing itself a disservice when it tries to devalue green. Read on:
GE: " and this study (and the priority claims in the editorial) was talking about the gold OA end of the spectrum."
Spectrum? Continuum? Degrees of OA?
GE: " Publishing in an open access journal is a fundamentally different process from putting a paper published in a toll-access journal on the Internet. In analogy, printing something on a flyer and handing it out to pedestrians on the street, and publishing an article in a national newspaper can both be called "publishing", but they remain fundamentally different processes, with differences in impact, reach, etc. A study looking at the impact of publishing a newspaper can not be replaced with a study looking at the impact of handing out a flyer to pedestrians, even though both are about "publishing"."
Oh dear! I have a feeling Eysenbach is going to tell us that making a published journal article accessible online free for all by self-archiving it is not OA after all, or not "full OA". If the journal doesn't do it for you, and/or you don't pay for it, it's not the real thing.
I wonder why Eysenbach would want to say that? Could it be because he is promoting an OA (gold) journal (his own)? Could that also have been the reason the PLoS editorial was so sanguine about Eysenbach's findings on the OA gold advantage, and so dismissive of any prior evidence of an OA green advantage?
GE: " Finally, Harnad says that "prior evidence derived from substantially larger and broader-based samples showing substantially the same outcome". I rebut with two points here[:] Regarding "larger samples" I think rigor and quality (leading to internal validity) is more important than quantity (or sample size)."
Even when all within-journal studies -- large and small, approximate and exact -- just keep producing exactly the same outcome, every time (OA increases impact)?
GE: " Going through the laborious effort to extract article and author characteristics for a limited number of articles (n = 1492) in order to control for these confounders provides scientifically stronger evidence than doing a crude, unadjusted analysis of a huge number of online accessible vs non-online accessible articles, leaving open many alternative explanations."
As I said, for those who doubt the causality and think the OA advantage is just a self-selection bias, Eysenbach's study will not convince them otherwise either. For those with eyes to see, the repeated demonstrations, in field after field, of exactly the same effect on incomparably larger samples will already have been demonstration enough. For those with eyes only for gold, evidence that green enhances citations will never be "solid evidence."
If Eysenbach and the editors had portrayed the latest PLoS findings as they should have -- namely, as yet another confirmation of the OA impact advantage, with some new details about its fine-tuning -- I would have had nothing but praise for it. But the self-interested spin and puffery that instead accompanied this work -- propagating the frankly false idea that this is the first "solid evidence" for the OA impact advantage and, worse, that it implies that self-archiving does not itself deliver that advantage -- could have been left uncontested only for lack, not of an ego, but of any real fealty to OA itself.
GE: " Secondly, contrary to what Harnad said, this study is NOT at all "showing substantially the same outcome". On the contrary, the effect of green-OA -- once controlled for confounders - was much less than what others have claimed in previous papers."
Let's be quite explicit about what, exactly, we are discussing here:
Eysenbach found that in a 6-month sample of 1492 articles in one 3-option journal (PNAS):
"While in the crude analysis self-archived papers had on average significantly more citations than non-self-archived papers (mean, 5.46 versus 4.66; Wilcoxon Z = 2.417; p = 0.02), these differences disappeared when stratified for journal OA status (p= 0.10 in the group of articles published originally as non-OA articles, and p = 0.25 in the group of articles published originally as OA).
"In a logistic regression model with backward elimination, which included original OA status and self-archiving OA status as separate independent variables as well as all potential confounders, self-archiving OA status did not remain a significant predictor for being cited. In a linear regression model, the influence of the covariate "article published originally as OA, without being self-archived" (beta = 0.250, p < 0.001) on citations remained stronger than self-archiving status (beta = 0.152, p = 0.02)."
To translate this into English (from an article with exceedingly user-unfriendly data displays, by the way, making it next to impossible to extract and visualize results from the tables by inspection!): first, the numbers:
NOA (Not OA): 1159 articles, 86.2% cited at least once
POA (Paid OA only): 176 articles, 94.3% cited at least once
SOA (Self-Archived OA only): 121 articles, 90.1% cited at least once
BOA (both POA and SOA): 36 articles, 97.2% cited at least once
The finding is that (in this PNAS sample, and with many other factors -- e.g., days since publication, number of authors, article type, country, funding, subject, etc. -- statistically isolated so as to be assessable independently): POA, SOA and BOA considered together, and POA considered alone, all have significantly more citations than NOA; but SOA considered alone ("stratified") does not. Also, when considered jointly (multiple regression), both POA and SOA increase citations, but POA is the stronger effect.
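For concreteness, here is a rough sketch in Python (statsmodels) of the kind of model being described -- "cited at least once" regressed on separate paid-OA and self-archived indicators plus a few candidate confounders. The variable names and data file are assumptions for illustration, not Eysenbach's actual dataset or specification.

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per PNAS article; column names are hypothetical placeholders.
    df = pd.read_csv("pnas_articles.csv")

    # Logistic regression of "cited at least once" (0/1) on separate paid-OA and
    # self-archived indicators, with a few candidate confounders held constant.
    logit = smf.logit(
        "cited ~ paid_oa + self_archived + n_authors + days_since_pub",
        data=df,
    ).fit()
    print(logit.summary())  # compare the paid_oa and self_archived coefficients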
Here are three simple hypotheses, in decreasing order of likelihood, as to why this small PNAS study may have found that the citation counts and their significance ordered themselves as they did: BOA>POA>>SOA>NOA
Hypothesis 1: The POA advantage might be unique to high-profile 3-option journals (POA, SOA, NOA) like PNAS (which are themselves a tiny minority among journals) and occurs because the POA articles are more visible than the SOA articles. (The POA + SOA = BOA articles do the best of all: redundancy enhances visibility.) So the POA authors do get something more for their money (but that something is not OA but high-profile POA in a high-profile journal) -- at least for the time being. This extra POA-over-SOA advantage will of course wash out as SOA and indexed, interoperable Institutional Repositories for self-archiving grow.
Hypothesis 2: The POA advantage might result at least in part from QB (self-selection Quality Bias), because the decision (by a self-selected 15% subset of PNAS authors) to pay for POA is influenced by the author's underlying sense of the potential importance (hence impact) of his article: simply asking authors about how important they think their article is, and whether that influenced their decision to pick POA or SOA or NOA, and failing to detect any significant difference among the authors, does not settle this matter, and certainly not on the basis of such a small and special sample. (But I think QB is just one of many contributors to the OA citation advantage itself, and certainly not the only determinant or even the biggest one.)
Hypothesis 3: The POA advantage might be either a small-sample chance result or a temporary side-effect of the 3-option journals in early days: a one-stop shopping advantage for PNAS articles, in a high-profile store, today. It needs to be tested for replicability and representativeness in larger samples of articles, journals, and time-bases.
(Note that Lawrence's 2001 finding, as well as Hajjem et al.'s 2005 finding, had been that the proportion of OA articles increases in the higher citation ranges, being lowest among articles with 0-1 citations.)
Eysenbach finds that with a logistic regression analysis separating the independent effects of POA, SOA and other correlates, SOA has no significant independent effect in his 1-year PNAS sample. Now let's test whether that replicates in larger samples, in terms of number of articles, journals, and time-base. (Failure to find a significant effect in a small sample is far less compelling than success in finding a significant effect in a small sample!)
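To illustrate that last point, here is a back-of-the-envelope power check in Python (statsmodels); the standardized effect size below is an assumed value chosen purely for illustration, not an estimate from the paper.

    from statsmodels.stats.power import TTestIndPower

    # With only ~121 self-archived-only (SOA) articles against ~1159 non-OA (NOA)
    # articles, a modest citation advantage can easily fail to reach significance.
    power = TTestIndPower().solve_power(
        effect_size=0.2,      # assumed small-to-modest standardized difference
        nobs1=121,            # SOA-only articles in the PNAS sample
        ratio=1159 / 121,     # NOA articles per SOA-only article
        alpha=0.05,
    )
    print(f"power to detect the assumed effect: {power:.2f}")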
GE: " Harnad, a self-confessed "archivangalist", co-creator of a self-archiving platform, and an outspoken advocate of self-archiving (speaking of vested interests) calls the finding that self-archived articles are... cited less often than [gold] OA articles from the same journal "controversial". In my mind, the finding that the impact of nonOA < greenOA < goldOA < green+goldOA is intuitive and logical: The level of citations correlates with the level of openness and accessibility."
I don't dispute that POA can add more citations, just as BOA can; maybe self-archiving in 10 different places will add still more. But what does this imply, right now, practically speaking? And, even more important, how likely is it that this sort of redundancy will continue to confer significant citation advantages once a critical mass of the literature is in interoperable Institutional Repositories (green SOA), rather than those repositories being few and far between, as they are now? It is indeed intuitive and logical that the baseline 15% of the literature as a whole that is being spontaneously self-archived somewhere, somehow on the Web, across all fields, has somewhat less visibility right now than the 15% of PNAS articles that PNAS is making OA for those authors who pay for it (POA). That's a one-stop shopping advantage for PNAS articles, against PNAS articles, in a high-profile store, today.
But the true measure of the SOA advantage today (at its 15% spontaneous baseline) is surely not to be found in PNAS but in the statistically far more numerous, hence far more representative full-spectrum of journals that do not yet offer POA. (I would be delighted if those journals took the Eysenbach findings as a reason for offering a POA option! But not at the expense of authors drawing the absurd conclusion -- not at all entailed by Eysenbach's PNAS-specific results -- that in the journals they currently publish in, SOA alone would not confer citation advantages at least as big as the ones we have been reporting.)
Regarding my self-confessed sin of archivangelizing, however, I do protest that my first and only allegiance is to 100% OA, and I evangelize the green road (and promote the self-archiving software) only because it is so resoundingly obvious that it is the fastest and surest road to 100% OA. (If empirical -- or logical -- evidence were ever to come out showing the contrary, I assure you I too would join the gold rush!)
GE: " Sometimes our egos stand in the way of reaching a larger common goal, and I hope Harnad and other sceptics respond with good science rather than with polemics and politics to these findings."
Well, first, let us not get carried away: There's precious little science involved here (apart from the science we are trying to provide Open Access to). The call to self-archive in order to enhance access and impact is so obvious and trivial that, as I noted, the puzzle is only why anyone would even have imagined otherwise.
But when it comes to polemics and politics (and possibly also egos), it might have kept things more objective if the results of Eysenbach's small but welcome study confirming the OA impact advantage had not been hyped with editorial salvos such as:
"solid evidence to support or refute... that papers freely available in a journal will be more often read and cited than those behind a subscription barrier... has been surprisingly hard to find..."
Or even the heavily-hedged: "As far as we are aware, no other study has compared OA and non-OA articles from the same journal and controlled for so many potentially confounding factors."
GE: " Unfortunately, in this area a lot more people have strong opinions and beliefs than those having the skills, time, and willingness to do rigorous research. I hope we will change this, and I reiterate a "call for papers" in that area [http://www.jmir.org/2006/2/e8/]"
May I echo that call, adding only that the rigorous research might perhaps be better placed in a journal specializing in scientometrics and in rigorously peer-reviewing it, rather than in The Journal of Medical Internet Research, or even PLoS Biology.
I close with some replies to portions of another version of Eysenbach's response which appeared in his blog.
GE: " Harnad's point that the PLoS paper is about the "citation advantage of open access" and that there have been "previous papers about the citation advantage of open access" (mostly his own studies, mostly not published in peer-reviewed journals) is as meaningful as saying "this paper is about a cancer treatment, and there are previous papers about cancer treatments, so this one doesn't add anything"."
That's not what I said. I said this: "[T]he only new knowledge from this small, journal-specific sample was (1) the welcome finding of how early the OA advantage can manifest itself, plus (2) some less clear findings about differences between first- and last-author OA practices, plus (3) a controversial finding that will most definitely need to be replicated on far larger samples in order to be credible: "The analysis revealed that self-archived articles are also cited less often than OA [sic] articles from the same journal."
And I do think all of this is as far away from rigorous oncological research as it is from rocket science!
GE: " The statement made by the reviewers and editors of the PLoS paper that this is the first study looking at the citation advantage of an open access/hybrid journal remains correct until somebody can show me a reference where this has been done before."
But who ever contested that far more modest and circumspect statement (which was certainly not the one the accompanying PLoS editorial made)? This is indeed "the first study looking at the citation advantage of an open access/hybrid journal"; indeed, it's the first such study of PNAS. But it's certainly not the first study looking at the citation advantage of OA in general, or OA self-archiving in particular, and looking at it within journals -- within many journals, and many articles.
GE: " In analogy, a small carefully designed cohort study showing a relationship between smoking and cancer with 1500 patients, obtaining through questionnaires and interviews additional variables which could account for the association and controlling for these confounders and still coming to the conclusion that there is a relation between smoking and cancer is scientifically stronger evidence than a quick-and-dirty uncontrolled cross-sectional study showing an association between smoking and cancer, even if this is done in a population of millions."
Indeed it would. And I forgot to add to my list (4) that Eysenbach had tested the hypothesis that the OA citation advantage is merely the result of a self-selection bias by asking 247 authors whether it was, and they replied that it wasn't...
Stevan Harnad
American Scientist Open Access Forum