Publishing Reform

Friday, May 26. 2006

The Epidemiology of OA

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

As Eysenbach’s very long and remarkably intemperate response to my prior response mainly repeats prior (answered) points, I will respond only to his very few substantive points:

I had asked: “Does [Eysenbach] seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect [the within-journal citation advantage for self-archived articles]?”

GE: “the answer is ‘absolutely’… If high-author papers are overrepresented in self-archived papers, then this confounder alone will contribute to having a greater number of citations… Only if one statistically controls for all these confounders (there are several of them - see PLoS paper), and one STILL sees an open access citation advantage, then (and only then) one has a SOLID, defendable study. ”

Here is Eysenbach’s list of confounders:

(1) number of authors: As Eysenbach says he is serious, we will now test this. We have the data. Eysenbach’s prediction is that partialling out the effect of the number of authors will make a dent in our huge, consistent citation advantage. Stay tuned...

(2) number of days since publication: This is relevant and feasible in a 1-year, 1-journal study like Eysenbach’s but neither relevant nor feasible for a sample of over a million articles ranging over 12 years, 12 disciplines, and hundreds of journals -- all showing exactly the same citation advantage for self-archived articles in every year and every discipline.

(3) article type: We are able to test this separately too (because we have ISI data on article type) but first let’s see whether partialling out author numbers makes a dent in our basic effect.

(4) country of the corresponding author: This is testable too, but first let’s see how the author-number ‘confounder’ pans out (we could look at the first-author's birth-sign too...).

(5) funding type: Data not available, and extremely far-fetched.

(6) subject area: Already tested and reported in our data, separately for 12 different disciplines: the self-archiving advantage is consistently present in all of them.

(7) submission track (PNAS has three different ways that authors can submit a paper): Not relevant to the journals we tested, which were all non-OA and pre-dated Open Choice.

(8) previous citation record of the first and last authors: This, as I noted, is -- along with the demonstration of how early the OA advantage emerges in PNAS -- a potentially interesting variable in the fine-tuning of the OA advantage, but our own studies are concerned with estimating the generality and size of the OA advantage, not with its fine tuning.

(9) whether authors choosing the OA option in PNAS chose to do so for only their most important research (“they didn't”): Neither Eysenbach’s study nor ours can confirm causality or eliminate the possibility of self-selection bias.

GE: "the fact that we look at a immediate (gold-)OA article population in a longitudinal cohort study design takes care of the “arrow of causation” problem, because it makes sure that open access status comes first, then the citations are coming, not the other way round."

I'm afraid it's not quite that easy to take care of the "arrow of causation" problem, which is confounded (sic) with the problem of self-selection bias: For if authors are (contrary to their subjective reports) indeed self-selecting their better papers (or themselves!) for OA-gold (or for self-archiving) then that, and not the OA, could explain why their papers get more citations.

GE: “it is entirely possible that the articles in his sample (which he refers to as green-OA articles) were not “immediately” self-archived after publication, but 1 month, 6 months, or 12 months after original publication, therefore not really what Harnad refers to as green-OA, implying “immediate” deposition.”

This is actually a valid point of definition: OA should be defined as ‘immediate’ in order to rule out claims that delayed/embargoed access is Open Access. The point at which refereed research can and should begin to be used is when the final refereed draft is accepted for publication, and that is the point when it should be made freely accessible online. So a portion of the citation advantage for self-archived articles could well have come from self-archiving later than the publication date; technically speaking this should be called a ‘free access’ advantage, if we reserve the term "OA" for access that is free immediately. But surely nothing of substance rides on this: If there is a self-archiving advantage even for tardy self-archiving, that confirms, a fortiori, the self-archiving advantage of prompt (OA) self archiving too!

GE: “I… made a conscientious decision to submit my paper to a gold-OA journal (PLoS) rather than publishing the study in an obscure scientrometrics journal and then self-archived [sic] it”

Actually, unless I am mistaken, I seem to recall correspondence from GE to the effect that it was first declined by Science (or was it Nature?), not a gold-OA journal, before being submitted to PLoS Biology…

GE: “The visibility of an article published in a properly promoted OA journal site will always be better than a paper that is published in a toll-access journal site, even if it is self-archived. This is exactly why my study shows an advantage of gold-OA over green-OA, this is also why I personally chose the gold route to publish this paper in PLoS, and not the green route”

Let us not confound a journal's profile/impact level with its OA/non-OA status.

The visibility (and no doubt also the citation impact) of an article will always be better when it is published in a high-profile, high-impact journal, whether it is OA (like PLoS) or non-OA (like Science or Nature) rather than an obscure scientometrics journal (or an obscure OA journal). Its visibility and impact will be higher if self-archived in either case (except perhaps if the journal is both high-profile and optional-OA, which is partly what Eysenbach’s study has shown).

GE: “the PLoS paper is the first study which contains an analysis of both gold and green (thus focuses on “OA itself”), whereas the rest of the studies is actually focused on ‘green’”.

Because most of the existing data for within-journal OA/non-OA comparisons comes from the millions of articles published in the thousands of non-gold journals indexed by ISI and not just the thousands of articles published in the few journals that are as yet (like PNAS) optional-gold...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 01:31

Wednesday, May 24. 2006

End of PLoS Exchange

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

PLoS seems to have concluded that it is not in their interest to host further public contributions to the Eysenbach debate from me -- and perhaps they are right (that it is not in their interest)...

From: plos AT plos.org
Date: May 19, 2006 3:13:07 PM EDT (CA)
To: Stevan Harnad harnad AT ecs.soton.ac.uk
Subject: e-Letter: Decision

Unfortunately we decided not to accept your e-Letter. Letters are published at the editors' discretion, and we publish only those that we believe will contribute substantially to the debate. Our editorial decisions about publishing letters are final, and are not open to appeal.

Below is appended the mercilessly compressed fragment that I had submitted to PLoS as a follow-up letter (responding to Eysenbach's PLoS letter responding to my PLoS letter responding to his PLoS article).

The full version of my reply of course appeared on AmSci and in my Archivangelism blog -- but, as Eysenbach's study showed, one gets still further visibility from appearing on the website of a high-profile, high-impact journal! My (valid) rebuttal to Eysenbach's suggestion that self-archiving is to OA publishing as handing out leaflets is to publishing in a newspaper was that we are talking about publications in the case of OA, not unpublished materials! But with a letter, it's more like handing out leaflets when it just appears in a Forum or a Blog, versus the website of a high-impact journal...

Never mind. The content is what matters, and time is on OA's side (even if it is much too dilatory!), because OA is (you've heard the song!): Optimal and Inevitable.

Confirming the Within-Journal OA Impact Advantage

Given the large within-journal OA citation impact advantages repeatedly found across all journals, disciplines and years in samples three orders of magnitude larger than Eysenbach's, it is not clear that controls for "multiple confounders" are needed to demonstrate the reality, magnitude and universality of the OA advantage. (This does not mean Eysenbach’s controls are not useful, just that they are not yet telling us much that we don't already know.)

Eysenbach (and PLoS) are focussed on gold-OA journals; most other OA impact studies are focused on OA itself. Only ~10% of journals are gold today. Few as yet offer authors "Open Choice" (allowing gold within-journal OA/NOA comparisons) and few authors are as yet choosing paid OA.

Regarding the “arrow of causation”: yes, “longitudinal cohort” data would demonstrate causation (for skeptics who think the OA advantage might be a self-selection bias) but Eysenbach's author self-reports certainly aren’t such data! Meanwhile: (a) the OA advantage does not diminish for younger articles; (b) OA increases downloads; (c) increased downloads in the first 6 months correlate with increased citations later; (d) unaffordability reduces access; (e) access is a necessary condition for citation.

About OA being a “continuum” or “spectrum”: Time is certainly a continuum, and access certainly admits of degrees (access may be easier/harder, narrower/wider, cheaper/dearer, longer/shorter, earlier/later, partial/full) -- but Open Access does not admit of degrees (any more than pregnancy does). OA is defined as: full-text online access, free for all.

Eysenbach likens self-archiving to “printing something on a flyer and handing it out to pedestrians on the street [instead of] publishing an article in a national newspaper." But it is published articles that are being self-archived.

NOA (Not OA): 1159 articles (86.2% cited at least once)
POA (Paid OA only): 176 (94.3%)
SOA (Self-Archived OA only): 121 (90.1%)
BOA (POA and SOA): 36 (97.2%)

In this PNAS sample, POA, SOA and BOA together, and POA alone, all have significantly more citations than NOA, but SOA alone ("stratified") does not; also, both POA and SOA increase citations, but POA does it more.
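As a rough check on the ordering, the group sizes and percentages listed above can be turned back into cited/uncited counts and crude odds ratios relative to NOA. This is only a minimal sketch: the cited counts are reconstructed by rounding the reported percentages, the names `groups`, `cited_counts` and `odds_ratios` are illustrative, and an unadjusted odds ratio on "cited at least once" is not the adjusted analysis reported in Eysenbach's paper.

```python
# Group sizes and "cited at least once" percentages as listed above.
groups = {
    "NOA": (1159, 86.2),
    "POA": (176, 94.3),
    "SOA": (121, 90.1),
    "BOA": (36, 97.2),
}


def cited_counts(n, pct):
    """Reconstruct (cited, uncited) counts from a group size and a rounded percentage."""
    cited = round(n * pct / 100)
    return cited, n - cited


def odds(cited, uncited):
    """Odds of an article being cited at least once."""
    return cited / uncited


counts = {name: cited_counts(n, pct) for name, (n, pct) in groups.items()}
baseline = odds(*counts["NOA"])
# Crude (unadjusted) odds ratio of each group against the NOA baseline.
odds_ratios = {name: odds(*c) / baseline for name, c in counts.items()}

for name in ("BOA", "POA", "SOA", "NOA"):
    cited, uncited = counts[name]
    print(f"{name}: {cited} cited, {uncited} uncited, "
          f"odds ratio vs NOA = {odds_ratios[name]:.2f}")
```

On these reconstructed counts the crude odds ratios fall in the order BOA, POA, SOA, NOA. With only 36 BOA articles (and a single uncited one), the BOA figure in particular should be read with caution.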

Three possible hypotheses explaining the BOA>POA>>SOA>NOA outcome:

H1: The POA advantage might be a multiple-archiving effect, maximal for high-profile, 3-option (POA, SOA, NOA) journals like PNAS because POA articles are more visible than SOA. (POA + SOA = BOA highest of all: redundancy helps!) As Institutional Repositories fill, this extra advantage will disappear.

H2: The POA advantage might arise in part from self-selection because the decision to pay for POA is influenced by the author's sense of the potential importance (hence impact) of his article. (But I think self-selection quality-bias is just one of many contributors to the OA advantage itself, not the only one or the biggest.)

H3: The POA advantage might be either a small-sample chance result or a temporary side-effect of the 3-option journals in early days: a one-stop shopping advantage for PNAS articles, in a high-profile store, today. It needs to be tested for replicability and representativeness in larger samples of articles, journals, and time-bases.

The true measure of the SOA advantage today is not found in PNAS but in the far more populous and representative full spectrum of journals not yet offering POA. (I’d be delighted if those journals took Eysenbach’s findings as a reason for offering a POA option! But not at the expense of authors wrongly inferring that for the journals they currently publish in, SOA alone would not confer citation advantages at least as big as the ones we have been reporting.)

Harnad & Brody (2004)
Brody et al (2005)
Hajjem et al (2005)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 13:52

Thursday, May 18. 2006

Confirming the Within-Journal OA Impact Advantage

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

Gunther Eysenbach (GE) (in a letter in PLoS Biology today) wrote:

GE: "The introduction of the article and two accompanying editorials [1, 2, 3] already answer Harnad's questions why author, editors, and reviewers were critical of the methodology employed in previous studies, which all only looked at "green OA" (self-archived/online-accessible papers)"

I didn't ask why the author and editors were critical of prior self-archiving (green OA) studies; I asked why they said such studies were "surprisingly hard to find" and why the two biggest and latest of them were not even taken into account:

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Association for Information Science and Technology (JASIST) 56.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

And the reason all prior within-journal studies look only at "green OA" is that the majority of OA today is green; hence almost all OA/NOA impact comparisons are based on green OA (self-archiving) rather than on paid-OA (gold). To compare OA and NOA between rather than within journals would be to compare apples and oranges: See critique of ISI's between-journal OA/NOA comparisons in:

Brody, T. and Harnad, S. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-Lib Magazine 10(6).

GE: " (hint 1: "confounding") (hint 2: arrow of causation: are papers online because they are highly cited, or the other way round?)."

I am afraid I don't see Eysenbach's point here at all: What exactly does he think is being confounded in within-journal comparisons of self-archived versus non-self-archived articles? The paid-OA effect? But among OA articles today there is almost zero within-journal paid-OA, because so few journals offer it! (Hajjem et al.'s within-journal comparisons were based on over a million articles, across 12 years and hundreds of journals, in 12 disciplines! Eysenbach's were based on 1492 articles, in 6 months, in one journal.)

And is Eysenbach suggesting that his failure to find any significant difference among author self-reports -- about their own article's quality and its causal role in their decision about whether or not to pay for OA (or to self-archive) in his sample of 237 authors -- is an objective test of the arrow of causation? (I agree that Eysenbach's failure to find a difference fails to support the hypothesis of a self-selection bias, but surely that won't convince those who are minded to hold that hypothesis! I would welcome rigorous causal evidence against the self-selection hypothesis as much as Eysenbach would, but author self-reports are alas not that evidence!)

GE: " The statement in the PLoS editorial has to be seen against this background. None of the previous papers in the bibliography mentioned by Harnad employed a similar methodology, working with data from a "gold-OA" journal."

Yes, almost all prior studies on the OA impact advantage are based on green OA, not gold, but so what? It is Eysenbach (and PLoS) who are focussed on gold-OA journals; the rest of the studies are focussed on OA itself. Only about 10% of the planet's peer-reviewed journals are gold today, and most of those are 100% gold, hence allow no within-journal comparisons. Very few journals as yet offer authors the "Open Choice" (optional paid gold) that would allow gold within-journal OA/NOA comparisons; and few authors are as yet taking those journals up on it (about 15% in this PNAS sample), compared to the far larger number that are self-archiving (also 15%, as it happens, though that percentage too is still far too small!). The difference in article sample sizes is about three orders of magnitude (c. 1500 articles in Eysenbach's study to 1.5 million in Hajjem et al's).

GE: " The correct method to control for problem 1 (multiple confounders) is multivariate regression analysis, not used in previous studies."

Correct. But with the large, consistent within-journal OA/NOA differences found across all journals, all disciplines and all years in samples three orders of magnitude larger than Eysenbach's, it is not at all clear that controls for those "multiple confounders" are necessary in order to demonstrate the reality, magnitude and universality of the OA advantage. That does not mean the controls are not useful, just that they are not yet telling us much that we don't already know.

GE: " Harnad's statement that "many [of the confounding variables] are peculiar to this particular... study" suggests that he might still not fully appreciate the issue of confounding. Does he suggest that in his samples there are no differences in these variables (for example, number of authors) between the groups? Did he even test for these? If he did, why was this not described in these previous studies?"

No, we did not test for "confounding effects" of number of authors: What confounding effects does Eysenbach expect from controlling for number of authors in a sample of over a million articles across a dozen disciplines and a dozen years all showing the very same, sizeable OA advantage? Does he seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect?
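For readers unfamiliar with the term, "partialling out" a variable can be sketched as follows: regress both the outcome (citations) and the predictor (self-archived or not) on the covariate (number of authors), then correlate what is left over. The data below are invented toy numbers chosen purely to show the two possible outcomes; they come from neither study, and the function names are illustrative.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def residuals(y, z):
    """Residuals of y after a least-squares straight-line fit on z."""
    my, mz = mean(y), mean(z)
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    return [(yi - my) - slope * (zi - mz) for zi, yi in zip(z, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / sqrt(sum((ai - ma) ** 2 for ai in a)
                      * sum((bi - mb) ** 2 for bi in b))


def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    return pearson(residuals(x, z), residuals(y, z))


# Toy data, invented for illustration only (not from any study).
authors = [1, 2, 3, 4, 5, 6, 7, 8]        # z: number of authors
self_archived = [0, 0, 0, 1, 0, 1, 1, 1]  # x: 1 = self-archived
noise = [1, -1, 1, -1, -1, 1, -1, 1]

# Scenario A: citations depend on author count only (pure confounding).
cites_confounded = [2 * z + e for z, e in zip(authors, noise)]
# Scenario B: as A, plus a genuine bonus of 3 citations per self-archived article.
cites_real = [c + 3 * x for c, x in zip(cites_confounded, self_archived)]

for label, cites in (("confounded", cites_confounded), ("real effect", cites_real)):
    raw = pearson(self_archived, cites)
    adjusted = partial_corr(self_archived, cites, authors)
    print(f"{label}: raw r = {raw:.2f}, "
          f"r with author count partialled out = {adjusted:.2f}")
```

In scenario A the raw correlation is sizeable but vanishes once author count is partialled out; in scenario B a clear correlation survives the adjustment. Which of the two patterns real citation data follow is an empirical question that a toy example cannot settle.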

Not that Eysenbach's tentative findings on 1st-author/last-author differences in his one-journal sample of 1492 are not interesting; but those are merely minor differences in shading, compared to the whopping main effect, which is: substantially more citations (and downloads) for self-archived OA articles.

GE: " The correct method to address problem 2 (the "arrow of causation" problem) is to do a longitudinal (cohort) study, as opposed to a cross-sectional study. This ascertains that OA comes first and THEN the paper is cited highly, while previous cross-sectional studies in the area of "green OA" publishing (self-archiving) leave open what comes first -- impact or being online."

I agree completely that time-based studies are necessary to demonstrate causation, for those who think that the OA advantage might be based on self-selection bias (i.e., that high-impact studies tend to be preferentially self-archived, perhaps even after they have gained their high impact), but Eysenbach's author self-report data certainly don't constitute such a longitudinal cohort study! (Once there exist reliable deposit dates for self-archived articles, we will be able to do some time-based analyses on green OA too, but, frankly, by that time the outcome is likely to be a foregone conclusion.)

In the meanwhile, the fact that (a) the OA advantage does not diminish for younger articles (as one would expect if it were a post-hoc effect), that (b) OA increases downloads, and that (c) increased downloads in the first 6 months are correlated with increased citations later on -- plus the logic of the fact that (d) unaffordability reduces access and that (e) access is a necessary condition for citation -- all suggest that most of the scepticism about the SOA advantage is because of conflicting interests, not because of objective uncertainty.

GE: " Harnad - who usually carefully distinguishes between "green" and "gold" OA publishing -- ignores that open access is a continuum, much as publishing is a continuum"

I'm afraid I have no idea what Eysenbach means about OA being a continuum: Time is certainly a continuum, and access certainly admits of degrees (access may be easier/harder, narrower/wider, cheaper/dearer, longer/shorter, earlier/later, partial/full) -- but Open Access does not admit of degrees (any more than pregnancy does). OA means immediate, permanent, full-text online access, free for all, now.

And, by the way, green OA is certainly not a lesser degree of gold OA!

For the innocent reader, puzzled as to why this would even be an issue:

Please recall that OA (gold) journals, whether total or optional gold, need authors (and those gold journals with the gold cost-recovery model need paying author/institutions). To attract authors, gold journals need to persuade them of the benefits of OA. So far so good. But there is another thing they have to persuade them of, implicitly or explicitly, and that is the benefits of gold OA over green OA. For if there are no benefits of gold over green, then surely it makes much more sense for authors to publish in their journal of choice, as they always did, and simply self-archive their own articles, rather than switching journals and/or paying for gold OA!

This theme alas keeps recurring, implicitly or explicitly, in the internecine green/gold squabbles, because green OA is indeed a rival to gold OA in gold OA journals' efforts to win over authors. This is regrettable, but a functional fact today, owing to the nature of OA and of the two means of providing it.

Is the effect symmetrical? Is gold OA likewise a rival to green OA? Here the answer is more complicated: No, an author who chooses gold OA (by publishing in an OA journal) is not at all a loss for green OA, because the article is nevertheless OA, and green OA's sole objective is 100% OA, as soon as possible, and nothing else. (Besides, a gold OA article too can be self-archived in the author's Institutional Repository if the author or institution wishes! All gold journals are, a fortiori, also green, in that they endorse author self-archiving.)

But there is a potential problem with gold from the standpoint of green. The problem is not with authors choosing gold. The problem is with gold publishers promoting gold as superior to green, or, worse, with gold publishers implying that green OA is not really OA, or not "fully" OA (along some imaginary OA "continuum").

"Free Access vs. Open Access" (thread started Aug 2003)

Why, you ask, would gold OA want to give the impression that green OA was not "really" OA or not "fully" OA? Because of the rivalry for authors that I just mentioned. The causal arrow is a one-way one insofar as competition for authors is concerned: green OA does not lose an author if that author publishes in a gold OA journal, whereas gold OA does lose an author if an author publishes in a green journal instead of a gold one. However, if gold portrays green as if it were not really or fully OA, and authors believe this, then it loses author momentum for green -- especially among that vast majority of authors who do not yet elect to publish gold. For there is today something still very paradoxical, indeed equivocal, about author behavior and motivation vis-a-vis OA:

Authors profess to want OA. Thirty-four thousand of them even signed the 2001 PLoS Open Letter threatening to boycott their journals if they did not provide (gold) OA (within 6 months of publication). (Most journals did not comply, and most authors did not follow through on their boycott threat: How could they? There were not enough suitable gold journals for them to switch to, and most authors clearly were not interested in switching journals, let alone paying for publication, then or now.)

Yet (and here comes the paradox): if those 34,000 signatories -- allegedly so desirous of OA as to be ready to boycott their journals if they did not provide it -- had simply gone on to self-archive all their papers, they would be well on the road to having the OA they allegedly desired so much! For the green road to 100% OA happens to be based on the (golden!) rule: Self-Archive Unto Others As You Would Have Them Self-Archive Unto You.

Why didn't (and don't) most authors do it (yet)? It is partly (let us state it quite frankly) straightforward foolishness and inconsistency on authors' part. They simply have not thought it through. This cannot be denied. Authors are in a state of self-induced "Zeno's Paralysis" regarding OA, from which FAQs have so far been powerless to free them -- so that it now looks as if self-archiving mandates from their institutions and/or their funders will be the only thing that can induce them to do what will give them what they so want and need.

But the confusion and inaction are partly also the fault of the promotional efforts of (well-meaning) OA advocates. Harold Varmus sent a mixed message with his 1999 "E-biomed" proposal (which led to PLoS, the PLoS Open letter, PubMed Central, Biomed Central, and eventually the PLoS and BMC fleet of OA journals, including PLoS Biology). Was E-biomed a gold proposal, a green proposal, both, or neither? The fact is that it was an incoherent proposal -- a confused and confusing mish-mash of central self-archiving, publishing reform/replacement and rival publishing -- and although it has undeniably led to genuine and valuable progress toward (what was eventually baptized by BOAI as) OA, it has left a continuing legacy of confusion too.

And we are facing part of that legacy of confusion now, with PLoS thinking that the only way (or the best) to reach 100% OA is to publish and promote gold OA journals. That is why PLoS Biology agreed to referee the Eysenbach paper, which seemed to show that OA gold is the only one that increases citation impact, not green self-archiving, which is (when you come right down to it) not even "real" OA at all!

That is also why PLoS Biology editorialised that they found it "surprisingly hard to find" evidence -- "solid evidence" -- that OA articles are read and cited more. And that is why PLoS Biology was happy to make an exception and publish the Eysenbach study, even though scientometrics is not the subject matter of PLoS Biology, but (I'll warrant) PLoS Biology would not have been happy to advertise in its pages the fact that green OA self-archiving was enough to get articles read and cited more!

So green OA does have a bit of an uphill battle against gold OA and the subsidies and support it has received (because gold OA is an attractive and understandable idea, whereas green OA requires a few more mental steps to dope out -- though not many, as none of this is rocket science!).

But, to switch metaphors, the green road to 100% OA (sic) is far wider, faster and surer than the golden road. (Every article can be self-archived, today, and without their authors' having to renounce or switch journals, whereas most articles do not yet have a suitable OA journal to publish in today, even if their authors wished to switch journals, which most do not; and authors can be mandated to self-archive by their institutions and funders, but neither authors' choice of journals nor their publishers' choice of access-provision or cost-recovery model can be mandated by authors' institutions and funders.) Moreover, 100% OA really is beneficial to research and researchers; so the green road of self-archiving is bound to prevail, despite the extra obstacles. And the destination (100% OA) is exactly the same for both roads. (Indeed, I am pretty sure that even the fastest way to reach 100% gold OA -- i.e., not just 100% OA but also the conversion of all journals to gold -- is in fact to take the green road to 100% OA first.)

So gold is doing itself a disservice when it tries to devalue green. Read on:

GE: " and this study (and the priority claims in the editorial) was talking about the gold OA end of the spectrum."

Spectrum? Continuum? Degrees of OA?

GE: " Publishing in an open access journal is a fundamentally different process from putting a paper published in a toll-access journal on the Internet. In analogy, printing something on a flyer and handing it out to pedestrians on the street, and publishing an article in a national newspaper can both be called "publishing", but they remain fundamentally different processes, with differences in impact, reach, etc. A study looking at the impact of publishing a newspaper can not be replaced with a study looking at the impact of handing out a flyer to pedestrians, even though both are about "publishing"."

Oh dear! I have a feeling Eysenbach is going to tell us that making a published journal article accessible online free for all by self-archiving it is not OA after all, or not "full OA". If the journal doesn't do it for you, and/or you don't pay for it, it's not the real thing.

I wonder why Eysenbach would want to say that? Could it be because he is promoting an OA (gold) journal (his own)? Could that also have been the reason the PLoS editorial was so sanguine about Eysenbach's findings on the OA gold advantage, and so dismissive of any prior evidence of an OA green advantage?

GE: " Finally, Harnad says that "prior evidence derived from substantially larger and broader-based samples showing substantially the same outcome". I rebut with two points here[:] Regarding "larger samples" I think rigor and quality (leading to internal validity) is more important than quantity (or sample size)."

Even when all within-journal studies -- large and small, approximate and exact -- just keep producing exactly the same outcome, every time (OA increases impact)?

GE: " Going through the laborious effort to extract article and author characteristics for a limited number of articles (n = 1492) in order to control for these confounders provides scientifically stronger evidence than doing a crude, unadjusted analysis of a huge number of online accessible vs non-online accessible articles, leaving open many alternative explanations."

As I said, for those who doubt the causality and think the OA advantage is just a self-selection bias, Eysenbach's study will not convince them otherwise either. For those with eyes to see, the repeated demonstrations, in field after field, of exactly the same effect on incomparably larger samples will already have been demonstration enough. For those with eyes only for gold, evidence that green enhances citations will never be "solid evidence."

If Eysenbach and the editors had portrayed the latest PLoS findings as they should have, namely, as yet another confirmation of the OA impact advantage, with some new details about its fine-tuning, I would have had nothing but praise for it. But the actual self-interested spin and puffery that instead accompanied this work -- propagating the frankly false idea that this is the first "solid evidence" for the OA impact advantage, and, worse, that it implies that self-archiving itself does not deliver the OA impact advantage -- would have required not the lack of an ego, but the lack of any real fealty to OA itself to have been allowed to stand uncontested.

GE: " Secondly, contrary to what Harnad said, this study is NOT at all "showing substantially the same outcome". On the contrary, the effect of green-OA -- once controlled for confounders - was much less than what others have claimed in previous papers."

Let's be quite explicit about what, exactly, we are discussing here:

Eysenbach found that in a 6-month sample of 1492 articles in one 3-option journal (PNAS):

"While in the crude analysis self-archived papers had on average significantly more citations than non-self-archived papers (mean, 5.46 versus 4.66; Wilcoxon Z = 2.417; p = 0.02), these differences disappeared when stratified for journal OA status (p= 0.10 in the group of articles published originally as non-OA articles, and p = 0.25 in the group of articles published originally as OA).

"In a logistic regression model with backward elimination, which included original OA status and self-archiving OA status as separate independent variables as well as all potential confounders, self-archiving OA status did not remain a significant predictor for being cited. In a linear regression model, the influence of the covariate "article published originally as OA, without being self-archived" (beta = 0.250, p < 0.001) on citations remained stronger than self-archiving status (beta = 0.152, p = 0.02)."

To translate this into English (from an article with exceedingly user-unfriendly data-displays, by the way, making it next to impossible to extract and visualize results from the tables by inspection!): First, the numbers:

NOA (Not OA): 1159 articles, 86.2% cited at least once
POA (Paid OA only): 176 articles, 94.3% cited at least once
SOA (Self-Archived OA only): 121 articles, 90.1% cited at least once
BOA (POA and SOA): 36 articles, 97.2% cited at least once

The finding is that (in this PNAS sample, and with many other factors -- e.g., days since publication, number of authors, article type, country, funding, subject, etc. -- statistically isolated so as to be assessable independently): POA, SOA and BOA considered together, and POA considered alone, all have significantly more citations than NOA; but SOA considered alone ("stratified") does not. Also, if considered jointly (multiple regression), both POA and SOA increase citations, but POA is the stronger effect.
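As a rough check on the figures listed above, the percentages can be turned back into approximate counts and each OA group compared with NOA. The Python sketch below does only that. The counts are reconstructed by rounding (an assumption; they are not copied from Eysenbach's tables), and the odds ratios are crude and unadjusted, so they illustrate the ordering of the groups rather than reproduce the regression results.

```python
# Group sizes and "cited at least once" percentages as listed above.
groups = {
    "NOA": (1159, 86.2),
    "POA": (176, 94.3),
    "SOA": (121, 90.1),
    "BOA": (36, 97.2),
}

def cited_counts(n, pct):
    """Reconstruct (cited, uncited) counts by rounding; an approximation."""
    cited = round(n * pct / 100)
    return cited, n - cited

def odds_ratio(group, baseline="NOA"):
    """Crude (unadjusted) odds of being cited, relative to the baseline group."""
    cited, uncited = cited_counts(*groups[group])
    base_cited, base_uncited = cited_counts(*groups[baseline])
    return (cited / uncited) / (base_cited / base_uncited)

for name in ("POA", "SOA", "BOA"):
    print(f"{name} vs NOA: crude odds ratio = {odds_ratio(name):.2f}")
```

On these reconstructed counts the crude odds of being cited at least once are ordered BOA > POA > SOA > NOA. Whether each difference is statistically significant is a separate question that this arithmetic does not answer.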

Here are three simple hypotheses, in decreasing order of likelihood, as to why this small PNAS study may have found that the citation counts and their significance ordered themselves as they did: BOA>POA>>SOA>NOA

Hypothesis 1: The POA advantage might be unique to high-profile 3-option journals (POA, SOA, NOA) like PNAS (which are themselves a tiny minority among journals) and occurs because the POA articles are more visible than the SOA articles. (The POA + SOA = BOA articles do the best of all: redundancy enhances visibility.) So the POA authors do get something more for their money (but that something is not OA but high-profile POA in a high-profile journal) -- at least for the time being. This extra POA-over-SOA advantage will of course wash out as SOA and indexed, interoperable Institutional Repositories for self-archiving grow.

Hypothesis 2: The POA advantage might result at least in part from QB (self-selection Quality Bias) because the decision (by a self-selected 15% subset of PNAS authors) to pay for POA is influenced by the author's underlying sense of the potential importance (hence impact) of his article: Simply asking authors about how important they think their article is, and whether that influenced their decision to pick POA or SOA or NOA, and failing to detect any significant difference among the authors, does not settle this matter, and certainly not on the basis of such a small and special sample. (But I think QB is just one of many contributors to the OA citation advantage itself, and certainly not the only determinant or even the biggest one.)

Hypothesis 3: The POA advantage might be either a small-sample chance result or a temporary side-effect of the 3-option journals in early days: a one-stop shopping advantage for PNAS articles, in a high-profile store, today. It needs to be tested for replicability and representativeness in larger samples of articles, journals, and time-bases.

(Note that Lawrence's 2001 as well as Hajjem et al's 2005 finding had been that the proportion of OA articles increases in the higher citation ranges, being lowest among articles with 0-1 citations.)

Eysenbach finds that with logistic regression analysis separating the independent effects of POA, SOA and other correlates, SOA has no significant independent effect in his 1-year PNAS sample. Now let's test whether that replicates in larger samples, in terms of number of articles, journals, and time-base. (Failure to find a significant effect in a small sample is far less compelling than success in finding a significant effect in a small sample!)
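The point about small samples can be made concrete with a back-of-the-envelope power calculation. The Python sketch below uses a normal approximation for a two-sided two-proportion z-test. That is a deliberate simplification and not the Wilcoxon or regression analysis Eysenbach actually ran. It also assumes, purely for illustration, that the observed "cited at least once" rates for SOA (90.1%) and NOA (86.2%) are the true rates. The function name `approx_power` and the 100-fold scaling are hypothetical choices.

```python
from statistics import NormalDist

def approx_power(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference, using the assumed true proportions.
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z = abs(p1 - p2) / se
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(z - z_crit) + std_normal.cdf(-z - z_crit)

# Assumption: treat the observed SOA and NOA rates as if they were the true rates.
small = approx_power(0.901, 121, 0.862, 1159)
# Hypothetical: the same rates with both groups 100 times larger.
large = approx_power(0.901, 12100, 0.862, 115900)

print(f"power at PNAS-sized groups: {small:.2f}")
print(f"power at 100x the sample:   {large:.2f}")
```

Under these assumptions a real difference of this size would be missed far more often than detected with only 121 SOA articles, whereas a sample two orders of magnitude larger would detect it almost every time.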

GE: " Harnad, a self-confessed "archivangalist", co-creator of a self-archiving platform, and an outspoken advocate of self-archiving (speaking of vested interests) calls the finding that self-archived articles are... cited less often than [gold] OA articles from the same journal "controversial". In my mind, the finding that the impact of nonOA < greenOA < goldOA < green+goldOA is intuitive and logical: The level of citations correlates with the level of openness and accessibility."

I don't dispute that POA can add more citations, just as BOA can; maybe self-archiving in 10 different places will add still more. But what does this imply, right now, practically speaking? And, even more important, how likely is it that this sort of redundancy will continue to confer significant citation advantages once a critical mass of the literature is in interoperable Institutional Repositories (green SOA) rather than few and far between, as now? It is indeed intuitive and logical that the baseline 15% of the literature as a whole that is being spontaneously self-archived somewhere, somehow on the Web, across all fields, has somewhat less visibility right now than the 15% of PNAS articles that PNAS is making OA for those authors who pay for it (POA). That's a one-stop shopping advantage for PNAS articles, against PNAS articles, in a high-profile store, today.

But the true measure of the SOA advantage today (at its 15% spontaneous baseline) is surely not to be found in PNAS but in the statistically far more numerous, hence far more representative, full spectrum of journals that do not yet offer POA. (I would be delighted if those journals took the Eysenbach findings as a reason for offering a POA option! But not at the expense of authors drawing the absurd conclusion -- not at all entailed by Eysenbach's PNAS-specific results -- that in the journals they currently publish in, SOA alone would not confer citation advantages at least as big as the ones we have been reporting.)

Regarding my self-confessed sin of archivangelizing, however, I do protest that my first and only allegiance is to 100% OA, and I evangelize the green road (and promote the self-archiving software) only because it is so resoundingly obvious that it is the fastest and surest road to 100% OA. (If empirical -- or logical -- evidence were ever to come out showing the contrary, I assure you I too would join the gold rush!)

GE: " Sometimes our egos stand in the way of reaching a larger common goal, and I hope Harnad and other sceptics respond with good science rather than with polemics and politics to these findings."

Well, first, let us not get carried away: There's precious little science involved here (apart from the science we are trying to provide Open Access to). The call to self-archive in order to enhance access and impact is so obvious and trivial that, as I noted, the puzzle is only why anyone would even have imagined otherwise.

But when it comes to polemics and politics (and possibly also egos), it might have kept things more objective if the results of Eysenbach's small but welcome study confirming the OA impact advantage had not been hyped with editorial salvos such as:

"solid evidence to support or refute... that papers freely available in a journal will be more often read and cited than those behind a subscription barrier... has been surprisingly hard to find..."
Or even the heavily-hedged:
"As far as we are aware, no other study has compared OA and non-OA articles from the same journal and controlled for so many potentially confounding factors."

GE: " Unfortunately, in this area a lot more people have strong opinions and beliefs than those having the skills, time, and willingness to do rigorous research. I hope we will change this, and I reiterate a "call for papers" in that area [http://www.jmir.org/2006/2/e8/]"

May I echo that call, adding only that the rigorous research might perhaps be better placed in a journal specializing in scientometrics and in rigorously peer-reviewing it, rather than in The Journal of Medical Internet Research, or even PLoS Biology.

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Association for Information Science and Technology (JASIST) 56.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

I close with some replies to portions of another version of Eysenbach's response which appeared in his blog.

GE: " Harnad's point that the PLoS paper is about the "citation advantage of open access" and that there have been "previous papers about the citation advantage of open access" (mostly his own studies, mostly not published in peer-reviewed journals) is as meaningful as saying "this paper is about a cancer treatment, and there are previous papers about cancer treatments, so this one doesn't add anything"."

That's not what I said. I said this:

"[T]he only new knowledge from this small, journal-specific sample was (1) the welcome finding of how early the OA advantage can manifest itself, plus (2) some less clear findings about differences between first- and last-author OA practices, plus (3) a controversial finding that will most definitely need to be replicated on far larger samples in order to be credible: "The analysis revealed that self-archived articles are also cited less often than OA [sic] articles from the same journal.""

And I do think all of this is as far away from rigorous oncological research as it is from rocket science!

GE: " The statement made by the reviewers and editors of the PLoS paper that this is the first study looking at the citation advantage of an open access/hybrid journal remains correct until somebody can show me a reference where this has been done before."

But who ever contested that far more modest and circumspect statement (which was certainly not the one the accompanying PLoS editorial made)? This is indeed "the first study looking at the citation advantage of an open access/hybrid journal"; indeed, it's the first such study of PNAS. But it's certainly not the first study looking at the citation advantage of OA in general, or OA self-archiving in particular, and looking at it within journals -- within many journals, and many articles.

GE: " In analogy, a small carefully designed cohort study showing a relationship between smoking and cancer with 1500 patients, obtaining through questionnaires and interviews additional variables which could account for the association and controlling for these confounders and still coming to the conclusion that there is a relation between smoking and cancer is scientifically stronger evidence than a quick-and-dirty uncontrolled cross-sectional study showing an association between smoking and cancer, even if this is done in a population of millions."

Indeed it would. And I forgot to add to my list (4) that Eysenbach had tested the hypothesis that the OA citation advantage is merely the result of a self-selection bias by asking 247 authors whether it was, and they replied that it wasn't...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 21:59

Tuesday, May 16. 2006

Within-Journal Demonstrations of the Open Access Impact Advantage

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

PLoS, Pipe-Dreams and Peccadillos

(Shorter version of this comment appears as a letter in PLoS Biology)

ABSTRACT: Eysenbach's (2006) study in PLoS Biology on 1492 articles published during one 6-month period in one journal (PNAS) found that the Open Access (OA) articles were more cited than the non-OA ones. The online bibliography on the OA citation advantage records a number of prior within-journal comparisons that found exactly the same effect: freely available articles are read and cited more. Eysenbach's further finding that the OA advantage (in this particular 6-month, 3-option, 1-journal PLoS/PNAS study) is greater for articles that have paid for OA publication than for those that have merely been self-archived will require replication on much larger samples, as most of the prior evidence for the OA advantage comes from self-archived articles and is based on sample sizes four orders of magnitude larger for both the number of articles and the number of journals tested.

I applaud and welcome the results of the Eysenbach (2006) study on 1492 articles published during one 6-month period in one journal (PNAS), showing that the Open Access (OA) articles were more cited than the non-OA ones. I also agree fully that the findings are unlikely to have been an artifact of PLoS’s “strong and vested interest in publishing results that so obviously endorse our existence,” nor of the fact that “the author of the article is also an editor of an open-access journal” (all quotes are from the PLoS Biology editorial by MacCallum & Parthasarathy, 2006).

However, I am less sure that PLoS’s and the author’s vested interests are not behind statements (in both the accompanying editorial and the article itself) along the lines that: “solid evidence to support or refute… that papers freely available in a journal will be more often read and cited than those behind a subscription barrier… has been surprisingly hard to find.” The online bibliography on ‘The effect of open access and downloads ('hits') on citation impact’ records a growing number of studies reporting precisely such evidence since 2001, including studies based on data from much larger samples of journals, disciplines and years than the PLoS study on PNAS -- and they all find exactly the same effect: freely available articles are read and cited more.

There can be disagreement about what evidence one counts as “solid,” but there can be little dispute that prior evidence derived from substantially larger and broader-based samples showing substantially the same outcome can hardly be described as “surprisingly hard to find.”

In fact, the only new knowledge from this small, journal-specific sample was (1) the welcome finding of how early the OA advantage can manifest itself, plus (2) some less clear findings about differences between first- and last-author OA practices, plus (3) a controversial finding that will most definitely need to be replicated on far larger samples in order to be credible: “The analysis revealed that self-archived articles are also cited less often than OA [sic] articles from the same journal.”

The latter (3) is a within-journal (one journal, PNAS) finding; the overwhelming majority of articles made OA (sic) through author self-archiving today (on which the prior large-sample OA citation advantage findings are based) do not appear in journals with a paid-OA option. Hence on the present evidence I have great difficulty in seeing this secondary advantage as any more than a paid-OA publisher’s pipe-dream at this point.

The following, however, is not a pipe-dream, but a peccadillo: “no other study has compared OA and non-OA articles from the same journal.” To be fair, this observation is hedged with “[a]s far as we are aware” (but the OA-advantage bibliography is surely public knowledge – or should be among advocates of public access to science) and the observation is further qualified with: “and [also] controlled for so many potentially confounding factors.”

But it has to be stated that of these “potentially confounding” variables -- “number of days since publication, number of authors, article type, country of the corresponding author, funding type, subject area, submission track (PNAS has three different ways that authors can submit a paper)… previous citation record of the first and last authors… [and] whether authors choosing the OA option in PNAS chose to do so for only their most important research (they didn't)” – many are peculiar to this particular short-interval, 3-option, single-journal PLoS study. And several of them (country, subject, year) had already been analyzed in papers that had been published before this 2006 article and were not taken into account despite the fact that both their preprints and their postprints had been freely accessible since well before publication, and that at least one of them (Brody et al. 2005) had been explicitly drawn to the author’s attention based on a preprint draft well before the article was submitted to PLoS.

Brody et al. (2005) had found that, alongside the OA citation advantage, more downloads in the first six months after publication are correlated with more citations 18 months later in physics; and Hajjem et al. (2005) had found higher citations for OA articles -- comparing always within the very same journal and year -- for 1,307,038 articles published across 12 years (1992-2003) in 10 disciplines (Biology, Psychology, Sociology, Health, Political Science, Economics, Education, Law, Business, Management).
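The logic of comparing "always within the very same journal and year" can be sketched in a few lines of Python. This is a hypothetical illustration of the general idea, not Hajjem et al.'s actual procedure. The records are invented toy values, and the function name `within_journal_year_ratios` is an assumption introduced here.

```python
from collections import defaultdict
from statistics import mean

# Invented toy records for illustration only: (journal, year, is_oa, citations).
records = [
    ("Journal A", 2001, True, 12), ("Journal A", 2001, False, 7),
    ("Journal A", 2001, False, 5), ("Journal A", 2002, True, 9),
    ("Journal A", 2002, False, 6), ("Journal B", 2001, True, 4),
    ("Journal B", 2001, True, 6), ("Journal B", 2001, False, 4),
]

def within_journal_year_ratios(rows):
    """Mean OA citations / mean non-OA citations, per (journal, year) cell."""
    cells = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, cites in rows:
        cells[(journal, year)][is_oa].append(cites)
    ratios = {}
    for key, cell in cells.items():
        # Skip cells lacking either group or with a zero non-OA mean.
        if cell[True] and cell[False] and mean(cell[False]) > 0:
            ratios[key] = mean(cell[True]) / mean(cell[False])
    return ratios

for key, ratio in sorted(within_journal_year_ratios(records).items()):
    print(key, round(ratio, 2))
```

Because each ratio is computed inside a single journal and year, differences between journals (prestige, field, citation norms) and between years (time available to accumulate citations) cannot drive the comparison.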

REFERENCES

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Association for Information Science and Technology (JASIST) 56.

Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS Biology 4(5).

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

MacCallum, C.J., and Parthasarathy, H. (2006) Open Access Increases Citation Rate. PLoS Biology 4(5).

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 15:18
