Open Access Archivangelism

Saturday, May 26. 2007

Craig et al.'s Review of Studies on the OA Citation Advantage

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: The thrust of Craig et al.'s critical review (which was proposed by the Publishing Research Consortium and conducted by the staff of three publishers) is that despite the fact that virtually all studies comparing the citation counts for OA and non-OA articles keep finding the OA citation counts to be higher, it has not been proved beyond a reasonable doubt that the relationship is causal.
   I agree: It is merely highly probable, not proved beyond a reasonable doubt. And I also agree that not one of the studies done so far is without some methodological flaw that could be corrected. But it is also highly probable that the results of the methodologically flawless versions of all those studies will be much the same as the results of the current studies. That's what happens when you have a robust major effect, detected by virtually every study, and only ad hoc methodological cavils and special pleading to rebut each of them with. Here is a common sense overview:
(1) Research quality is a necessary, but not a sufficient condition for citation impact: The research must also be accessible to be cited.
(2) Research accessibility is a necessary but not a sufficient condition for citation impact: The research must also be of sufficient quality to be cited.
(3) The OA impact effect is the finding that an article's citation counts are positively correlated with the probability that that article has been made OA: The more an article's citations, the more likely that that article has been made OA.
(4) That correlation has at least three (compatible) causal interpretations:
   (4a) OA articles are more likely to be cited.
   (4b) More-cited articles are more likely to be made OA.
   (4c) A third factor makes it more likely that certain articles will be both more cited and made OA.
(5) Each of these causal interpretations is probably correct, and hence a contributor to the OA impact effect:
   (5a) The better the article, the more likely it is to be cited, hence the more citations it gains if it is made more accessible (4a). (OA Article Quality Advantage, QA)
   (5b) The better the article, the more likely it is to be made OA (4b). (OA Article Quality Bias, QB)
   (5c) 10% of articles (and authors) receive 90% of citations. The authors of the better articles know they are better, and hence are more likely both to be cited and to make their articles OA, so as to maximize their visibility, accessibility and citations (4c). (OA Author QB and QA)
(6) In addition to QB and QA, there is an OA Early Access effect (EA): providing access earlier increases citations.
(7) The OA citation studies have not yet isolated and estimated the relative sizes of each of these (and other) contributing components. (OA also gives a Download Advantage (DA), and downloads are correlated with later citations; OA articles also have a Competitive Advantage (CA), but CA will vanish -- along with QB -- when all articles are OA).
(8) But the handwriting is on the wall as to the benefits of making articles OA, for those with eyes to see, and no conflicting interests to blind them.
   Given all of this, here is a challenge for Craig et al.: Instead of striving, like OJ Simpson's Dream Team, only to find flaws in the positive evidence for the OA impact differential, which is equally compatible with either interpretation (OA causes higher citations or higher citations cause OA), why don't Craig et al. do a simple study of their own? Since it is known that (in science) the top 10% of articles published receive 90% of the total citations made, why not test whether and to what extent the top 10% of articles published is over-represented among the c. 15% of articles that are being spontaneously made OA by their authors today? It is, after all, a logical possibility that all or most of the top 10% are already among the 15% that are being made OA: I think it's improbable; but it may repay Craig et al.'s effort to check whether it is so.
   For if it did turn out that all or most of the top-cited 10% of articles are already among the c.15% of articles that are already being made OA, then reaching 100% OA would be far less urgent and important than I have been arguing, and OA mandates would likewise be less important. I for one would no longer find it important enough to archivangelize if I knew it was just for the bottom 90% of articles, the top 10% of articles having already been self-archived, spontaneously and sensibly, by their top 10% authors without having to be mandated. But it is Craig et al. who think this is closer to the truth, not me. So let them go out and demonstrate it.

I've read Craig et al.'s critical review concerning the OA citation impact effect and will shortly write a short, mild review. But first here is Sally Morris's posting announcing Craig et al.'s review, on behalf of the Publishing Research Consortium (which "proposed" the review), followed by a commentary from Bruce Royan on diglib, a few remarks from me, then commentary by JWT Smith on jisc-repositories, followed by my response, and, last, a commentary by Bernd-Christoph Kaemper on SOAF, followed by my response.

Sally Morris (Publishing Research Consortium):
Craig, Ian; Andrew Plume, Marie McVeigh, James Pringle & Mayur Amin (2007) Do Open Access Articles Have Greater Citation Impact? A critical review of the literature. Journal of Informetrics.
A new, comprehensive review of recent bibliometric literature finds decreasing evidence for an effect of 'Open Access' on article citation rates. The review, now accepted for publication in the Journal of Informetrics, was proposed by the Publishing Research Consortium (PRC) and is available at its web site at www.publishingresearch.net. It traces the development of this issue from Steve Lawrence's original study in Nature in 2001 to the most recent work of Henk Moed and others.

Researchers have delved more deeply into such factors as 'selection bias' and 'early view' effects, and begun to control more carefully for the effects of disciplinary differences and publication dates. As they have applied these more sophisticated techniques, the relationship between open access and citation, once thought to be almost self-evident, has almost disappeared.

Commenting on the paper, Lord May of Oxford, FRS, past president of the Royal Society, said 'In December 2005, the Royal Society called for an evidence-based approach to the scholarly communications debate. This excellent paper demonstrates that there is actually little evidence of a citation advantage for open access articles.'

The debate will certainly continue, and further studies will continue to refine current work. The PRC welcomes this discussion, and hopes that this latest paper may be a catalyst for a new round of informed scholarly exchange.

Sally Morris on behalf of the Publishing Research Consortium

Bruce Royan wrote on diglib:

Sally claims that according to this article "the relationship between open access and citation, once thought to be almost self-evident, has almost disappeared."

Now I'm no Informetrician, but my reading of the article is that the authors reluctantly acknowledge that Open Access articles do have greater citation impact, but claim that this is less because they are Open Access per se, and more because:
- they are available sooner than more conventionally published articles, or
- they tend to be better articles, by more prestigious authors

Sally's point of view is understandable, since she is employed by a consortium of conventional publishers. It's interesting to note that the employers of the authors of this article are Wiley-Blackwell, Thomson Scientific, and Elsevier.

Even more interesting is that, though this article has been accepted for publication in the conventional "Journal of Informetrics", a pdf of it (described as a summary, but there are 20 pages in JOI format, complete with diagrams, references etc) has already been mounted on the web for free download, in what might be mistaken for an example of green route open access.

Could this possibly be in order to improve the article's impact?

Professor Bruce Royan,
Concurrent Computing Limited.

It is notoriously tricky (at least since David Hume) to "prove" causality empirically. The thrust of the Craig et al. critique is that despite the fact that virtually all studies comparing the citation counts for OA and non-OA articles keep finding the OA citation counts to be higher, it has not been proven beyond a reasonable doubt that the relationship is causal.

I agree: It is merely highly probable, not proven beyond a reasonable doubt, that articles are more cited because they are OA, rather than OA merely because they are more cited (or both OA and more cited merely because of a third factor).

And I also agree that not one of the studies done so far is without some methodological flaw that could be corrected.

But it is also highly probable that the results of the methodologically flawless versions of all those studies will be much the same as the results of the current studies. That's what happens when you have a robust major effect, detected by virtually every study, and only ad hoc methodological cavils and special pleading to rebut each of them with.

But I am sure those methodological flaws will not be corrected by these authors, because -- OJ Simpson's "Dream Team" of Defense Attorneys comes to mind -- Craig et al.'s only interest is evidently in finding flaws and alternative explanations, not in finding out the truth -- if it goes against their clients' interests...

Iain D. Craig: Wiley-Blackwell
Andrew M. Plume, Mayur Amin: Elsevier
Marie E. McVeigh, James Pringle: Thomson Scientific

Here is a preview of my rebuttal. It is mostly just common sense, if one has no conflict of interest, hence no reason for special pleading and strained interpretations:

(1) Research quality is a necessary, but not a sufficient condition for citation impact: The research must also be accessible to be cited.

(2) Research accessibility is a necessary but not a sufficient condition for citation impact: The research must also be of sufficient quality to be cited.

(3) The OA impact effect is the finding that an article's citation counts are positively correlated with the probability that that article has been made OA: The more an article's citations, the more likely that that article has been made OA.

(4) This correlation has at least three causal interpretations that are not mutually exclusive:

(4a) OA articles are more likely to be cited.

(4b) More-cited articles are more likely to be made OA.

(4c) A third factor makes it more likely that certain articles will be both more cited and made OA.

(5) Each of these causal interpretations is probably correct, and hence a contributor to the OA impact effect:

(5a) The better the article, the more likely it is to be cited, hence the more citations it gains if it is made more accessible (4a). (OA Article Quality Advantage, QA)

(5b) The better the article, the more likely it is to be made OA (4b). (OA Article Quality Bias, QB)

(5c) 10% of articles (and authors) receive 90% of citations. The authors of the better articles know they are better, and hence are more likely both to be cited and to make their articles OA, so as to maximize their visibility, accessibility and citations (4c). (OA Author QB and QA)

(6) In addition to QB and QA, there is an OA Early Access effect (EA): providing access earlier increases citations.

(7) The OA citation studies have not yet isolated and estimated the relative sizes of each of these (and other) contributing components. (OA also gives a Download Advantage (DA), and downloads are correlated with later citations; OA articles also have a Competitive Advantage (CA), but CA will vanish -- along with QB -- when all articles are OA).

(8) But the handwriting is on the wall as to the benefits of making articles OA, for those with eyes to see, and no conflicting interests to blind them.

I do agree completely, however, with erstwhile (Princetonian and) Royal Society President Bob May's slightly belated call for "an evidence-based approach to the scholarly communications debate."

John Smith (JS) wrote in jisc-repositories:

I wonder if we can come at this discussion concerning the impact of OA on citation counts from another angle? Assuming we have a traditional academic article of interest to only a few specialists there is a simple upper bound to the number of citations it will have no matter how accessible it is.

That is certainly true. It is also true that 10% of articles receive 90% of the citations. OA will not change that ratio; it will simply allow the usage and citations of those articles that were not used and cited because they could not be accessed to rise to what they would have been if the articles had been accessible.

JS: Also, the majority of specialist academics work in educational institutions where they have access to a wide range of paid for sources for their subject.

OA is not for those articles and those users that already have paid access; it is for those that do not. No institution can afford paid access to all or most of the 2.5 million articles published yearly in the world's 24,000 peer-reviewed journals, and most institutions can only afford access to a small fraction of them.

OA is hence for that large fraction (the complement of the small fraction) of those articles that most users and most institutions cannot access. The 10% of that fraction that merit 90% of the citations today will benefit from OA the most, and in proportion to their merit. That increase in citations also corresponds to an increase in scholarly and scientific productivity and progress for everyone.

JS: Therefore any additional citations must mainly come from academics in smaller institutions that do not provide access to all relevant titles for their subject and/or institutions in the poorer countries of the world.

It is correct that the additional citations will come from academics at the institutions that cannot afford paid access to the journals in which the cited articles appeared. It might be the case that the access denial is concentrated in the smaller institutions and the poorer countries, but no one knows to what extent that is true, and one can also ask whether it is relevant. For the OA problem is not just an access problem but an impact problem. And the research output of even the richest institutions is losing a large fraction of its potential research impact because it is inaccessible to the fraction to whom it is inaccessible, whether or not that missing fraction is mainly from the smaller, poorer institutions.

JS: Should it not be possible therefore to examine the citers to these OA articles where increased citation is claimed and show they include academics in smaller institutions or from poorer parts of the world?

Yes, it is possible, and it would be a good idea to test the demography of access denial and OA impact gain. But, again, one wonders: Why would one assign this question of demographic detail a high priority at this time, when the access and impact loss have already been shown to be highly probable, when the remedy (mandated OA self-archiving) is at hand and already overdue, and when most of the skepticism about the details of the OA impact advantage comes from those who have a vested interest in delaying or deterring OA self-archiving mandates from being adopted?

(It is also true that a portion of the OA impact advantage is a competitive advantage that will disappear once all articles are OA. Again, one is inclined to reply: So what?)

This is not just an academic exercise but a call to action to remedy a remediable practical problem afflicting research and researchers.

JS: However, even if this were done and positive results found there is still another possible explanation. Items published in both paid for and free form are indexed in additional indexing services including free services like OAIster and CiteSeer. So it may be that it is not the availability per se that increases citation but the findability? Those who would have had access anyway have an improved chance of finding the article. Do we have proof that the additional citers accessed the OA version (assuming there is both an OA and paid for version)?

Increased visibility and improved searching are always welcome, but that is not the OA problem. OAIster's usefulness is limited by the fact that it only contains the c. 15% of the literature that is being self-archived spontaneously (i.e., unmandated) today. CiteSeer is a better niche search engine because computer scientists self-archive a much higher proportion of their research. But the obvious benchmark today is Google Scholar, which is increasingly covering all cited articles, whether OA or non-OA. It is in vain that Google Scholar enhances the visibility of non-OA articles for those would-be users to whom they are not accessible. Those users could already have accessed the metadata of those articles from online indices such as Web of Science or PubMed, only to reach a toll-access barrier when it came to accessing the inaccessible full-text corresponding to the visible metadata.

JS: It is possible that my queries above have already been answered. If so a reference to the work will suffice as a response.

I am a supporter of OA but also concerned that it is not falsely praised. If it is praised for some advantage and that advantage turns out not to be there it will weaken the position of OA proponents.

Accessibility is a necessary (but not a sufficient) condition for usage and impact. There is no risk that maximising accessibility will fail to maximise usage and impact. The only barrier between us and 100% OA is a few keystrokes.

It is appalling that we continue to dither about this; it is analogous to dithering about putting on (or requiring) seat-belts until we have made sure that the beneficiaries are not just the small and the poor, and that seat-belts do not simply make drivers more safety-conscious.

JS: Even if the apparent citation advantage of OA turns out to be false it does not weaken the real advantages of OA. We should not be drawn into a time and effort wasting defence of it while there is other work to be done to promote OA.

The real advantage of Open Access is Access. The advantage of Access is Usage and Impact (of which citations are one indicator). The Craig et al. study has not shown that the OA Impact Advantage is not real. It has simply pointed out that correlation does not entail causation. Duly noted. I agree that no time or effort should be spent now trying to demonstrate causation. The time and effort should be used to provide OA.

Bernd-Christoph Kaemper (B-CK) wrote on SOAF:

Elsevier said that citation rates of their journals had gone up considerably because of the increased access through widespread online availability of their journals...

"Online availability clearly increased the IF [journal citation impact factor]. In the FUTON subcategory, there was an IF gradient favoring journals with freely available articles. ..."

I think it is quite obvious why sources available with open access will be used and cited more often than others...

So the usefulness of open access is a matter of daily experience, not so much of academic discussions whether there is any empirical proof for a citation advantage of open access that may be isolated by eliminating all possible confounders...

That open access leads to more visibility and thereby potentially more citations is trivial, but this relative open access advantage will vary from journal to journal...

Due to the multitude of possible confounding factors I would not believe any of the figures calculated by Stevan Harnad as the cumulated lost impact, or conversely, the possible gain.

I couldn't quite follow the logic of this posting. It seemed to be saying that, yes, there is evidence that OA increases impact, it is even trivially obvious, but, no, we cannot estimate how much, because there are possible confounding factors and the size of the increase varies.

All studies have found that the size of the OA impact differential varies from field to field, journal to journal, and year to year. The range of variation is from +25% to over +250%. But the differential is always positive, and mostly quite sizeable. That is why I chose a conservative overall estimate of +50% for the potential gain in impact if it were not just the current 15% of research that was being made OA, but also the remaining 85%. (If you think 50% is not conservative enough, use the lower-bound 25%: You'll still find a substantial potential impact gain/loss. If you think self-selection accounts for half the gain, split it in half again: there's still plenty of gain, once you multiply by 85% of total citations.)
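The arithmetic in the parenthesis above can be sketched as follows. This is only an illustration under the post's own simplifying assumptions: the function name `potential_gain` and its parameters are hypothetical, and the 85% figure is treated, as in the text, as the share of total citations attached to the not-yet-OA literature.

```python
def potential_gain(oa_advantage, non_oa_share=0.85, self_selection_discount=0.0):
    """Potential citation gain, as a fraction of current total citations.

    oa_advantage: assumed OA citation differential (0.50 means +50%).
    non_oa_share: assumed share of the literature not yet OA (c. 85% in the post).
    self_selection_discount: fraction of the differential attributed to
        self-selection rather than to OA itself (0.5 means "split it in half").
    """
    causal_advantage = oa_advantage * (1.0 - self_selection_discount)
    return causal_advantage * non_oa_share


# The three scenarios mentioned in the text (hypothetical labels).
scenarios = [
    ("conservative estimate, +50%", 0.50, 0.0),
    ("lower bound, +25%", 0.25, 0.0),
    ("lower bound, half due to self-selection", 0.25, 0.5),
]

for label, advantage, discount in scenarios:
    gain = potential_gain(advantage, self_selection_discount=discount)
    print(f"{label}: {gain:.4f} of current total citations")
```

Even the most pessimistic of these three scenarios leaves a gain of roughly a tenth of current total citations, which is the sense in which the text says there is "still plenty of gain".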

An interesting question that has since arisen (and could be answered by similar studies) is this:

Since it is known that (in science) the top 10% of articles published receive 90% of the total citations made (Seglen 1992), to what extent is the top 10% of articles published over-represented among the c. 15% of articles that are being spontaneously made OA by their authors today?

It is a logical possibility that all or most of the top 10% are already among the 15% that are being made OA: I rather doubt it; but it would be worth checking whether it is so. [Attention lobbyists against OA mandates! Get out your scissors here and prepare to snip an out-of-context quote...]

[snip]
If it did turn out that all or most of the top-cited 10% of articles are already among the c.15% of articles that are already being made OA, then reaching 100% OA would be far less urgent and important than I had argued, and OA mandates would likewise be less important. I for one would no longer find it important enough to archivangelize if I knew it was just for the bottom 90% of articles, the top 10% of articles having already been self-archived, spontaneously and sensibly, by their top 10% authors without having to be mandated.
[/snip]
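A check of this kind could be sketched roughly as below: rank articles by citation count, take the top 10%, and compare the OA share within that top decile with the OA share overall. The function name and the toy input are hypothetical placeholders invented to show the mechanics; they are not data from any study.

```python
def top_decile_oa_share(citations, is_oa, top_fraction=0.10):
    """Return (OA share among the top-cited articles, OA share overall).

    citations: citation count per article.
    is_oa: parallel list of booleans, True if the article was made OA.
    top_fraction: fraction of articles counted as "top-cited".
    """
    n = len(citations)
    k = max(1, round(n * top_fraction))
    # Indices of articles, most cited first.
    ranked = sorted(range(n), key=lambda i: citations[i], reverse=True)
    top = ranked[:k]
    top_share = sum(1 for i in top if is_oa[i]) / k
    overall_share = sum(1 for flag in is_oa if flag) / n
    return top_share, overall_share


# Toy input, made up purely for illustration: 20 articles, 3 of them OA (15%).
citations = [100, 80, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
is_oa = [True, False, True] + [False] * 16 + [True]

top_share, overall_share = top_decile_oa_share(citations, is_oa)
print(top_share, overall_share)
```

If the top-decile share came out close to 1.0 on real data, that would be the scenario described in the snipped passage; a share well below 1.0 would mean most top-cited articles are still not OA.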

The empirical studies of the relation between OA and impact have been mostly motivated by the objective of accelerating the growth of OA -- and thereby the growth of research usage and impact. Those who are persuaded that the OA impact differential is merely or largely a non-causal self-selection bias are encouraged to demonstrate that that is the case.

Note very carefully, though, that the observed correlation between OA and citations takes the form of a correlation between the number of OA articles, relative to non-OA articles, at each citation level. The more highly cited an article, the more likely it is OA. This is true within journals, and within and across years, in every field tested.

And this correlation can arise because more-cited articles are more likely to be made OA or because articles that are made OA are more likely to be cited (or both -- which is what I think is in reality the case). It is certainly not the case that self-selection is the default or null hypothesis, and that those who interpret the effect as OA causing the citation increase hence have the burden of proof: The situation is completely symmetric numerically; so your choice between the two hypotheses is not based on the numbers, but on other considerations, such as prima facie plausibility -- or financial interest.

Until and unless it is shown empirically that today's OA 15% already contains all or most of the top-cited 10% (and hence 90% of what researchers cite), I think it is a much more plausible interpretation of the existing findings that OA is a cause of the increased usage and citations, rather than just a side-effect of them, and hence that there is usage and impact to be gained by providing and mandating OA. (I can quite understand why those who have a financial interest in its being otherwise [Craig et al. 2007] might prefer the other interpretation, but clearly prima facie plausibility cannot be their justification.)

I also think that 50% of total citations is a plausible overall estimate of the potential gain from OA, as long as it is understood clearly that the 50% gain does not apply to every article made OA. Many articles are not found useful enough to cite no matter how accessible you make them. The 50% citation gain will mostly accrue to the top 10% of articles, as citations always do (though OA will no doubt also help to remedy some inequities and will sometimes help some neglected gems to be discovered and used more widely). In other words, the OA advantage to an article will be roughly proportional to that article's intrinsic citation value (independent of OA).

Other interesting questions: The top-cited articles are not evenly distributed among journals. The top journals tend to get the top-cited articles. It is also unlikely that journal subscriptions are evenly distributed among journals: The top journals are likely to be subscribed to more, and are hence more accessible.

So if someone is truly interested in these questions (as I am not!), they might calculate a "toll-accessibility index" (TAI) for each article, based on the number of researchers/institutions that have toll access to the journal in which that article is published. An analysis of covariance can then be done to see whether and how much the OA citation advantage is reduced if one controls for the article's TAI. (I suspect the answer will be: somewhat, but not much.)
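The point-estimate part of such an analysis of covariance can be sketched as an ordinary least-squares fit of citations on an OA indicator plus the TAI covariate, compared with the raw OA vs. non-OA difference in means. Everything here is an assumption for illustration: the TAI scale, the use of a log-citation outcome, and the numbers, which are constructed (not observed) so that OA articles also tend to have higher TAI. No significance test is included.

```python
from statistics import mean


def solve_linear(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def oa_effects(y, is_oa, tai):
    """Return (unadjusted OA difference, OA difference adjusted for TAI)."""
    oa_y = [v for v, o in zip(y, is_oa) if o]
    non_oa_y = [v for v, o in zip(y, is_oa) if not o]
    unadjusted = mean(oa_y) - mean(non_oa_y)
    design = [[1.0, 1.0 if o else 0.0, t] for o, t in zip(is_oa, tai)]
    _, adjusted, _ = ols(design, y)
    return unadjusted, adjusted


# Constructed example: outcome = 1 + 0.5*OA + 2*TAI exactly (hypothetical units).
is_oa = [False, False, False, False, True, True, True, True]
tai = [0.1, 0.2, 0.3, 0.4, 0.3, 0.4, 0.5, 0.6]
y = [1 + 0.5 * o + 2 * t for o, t in zip(is_oa, tai)]

unadjusted, adjusted = oa_effects(y, is_oa, tai)
print(unadjusted, adjusted)
```

In this constructed case the raw difference overstates the OA term because OA articles were given higher TAI values; how much the adjustment would matter on real data is exactly the open empirical question the paragraph raises.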

B-CK: Could we do a thought experiment? From a representative group of authors, choose a sample of authors randomly and induce them to make their next article open access. Do you believe they will see as much gain in citations compared to their previous average citation levels as predicted from the various current "OA advantage" studies where several confounding factors are operating? Probably not - but what would remain of that advantage? -- I find that difficult to predict or model.

From a random sample, I would expect an increase of around 50% or more in total citations, 90% of the increased citations going to the top 10%, as always.

B-CK: As I learned from your posting, you seem to predict that it will anyway depend on the previous citedness of the members of that group (if we take that as a proxy for the unknown actual intrinsic citation value of those articles), in the sense that more-cited authors will see a larger percentage increase effect.

I don't think it's just a Matthew Effect; I think the highest quality papers get the most citations (90%), and the highest quality papers are apparently about 10% (in science, according to Seglen).

B-CK: To turn your argument around, most authors happily going open access in expectation of increased citation might be disappointed because the 50% increase will only apply to a small minority of them.

That's true; but you could say the same for most authors going into research at all. There is no guarantee that they will produce the highest quality research, but I assume that researchers do what they do in the hope that they will, if not this time, then the next time, produce the highest quality research.

B-CK: That was the reason why I said that (as an individual author) I would rather not believe in any "promised" values for the possible gain.

Where there is life, and effort, there is hope. I think every researcher should do research, and publish, and self-archive, with the ambition of doing the best quality work, and having it rewarded with valuable findings, which will be used and cited.

My "promise", by the way, was never that each individual author would get 50% more citations. (That would actually have been absurd, since over 50% of papers get no citations at all -- apart from self-citation -- and 50% of 0 is still 0.)

My promise, in calculating the impact gain/loss that you doubted, was to countries, research funders and institutions. On the assumption that the research output of each roughly covers the quality spectrum, they can expect their total citations to increase by 50% or more with OA, but that increase will be mostly at their high-quality end. (And the total increase is actually about 85% of 50%, as the baseline spontaneous self-archiving rate is about 15%.)

B-CK: That doesn't mean though that there are not enough other reasons to go for open access (I mentioned many of them in my posting).

There are other reasons, but researchers' main motivation for conducting and publishing research is to make a contribution to knowledge that will be found useful by, used by, and built upon by other researchers. There are pedagogic goals too, but I think they are secondary, and I certainly don't think they are strong enough to induce researchers to make their publications OA if the primary reason was not reason enough to induce them.

(Actually, I don't think any of the reasons are enough to induce enough researchers to provide OA, and that's why Green OA mandates are needed -- and being provided -- by researchers' institutions and funders.)

B-CK: With respect to the toll accessibility index, I completely agree. The occasional good article in an otherwise "obscure" journal probably has a lot to gain from open access, as many people would not bother to try to get hold of a copy should they find it among a lot of others in a bibliographic database search, if it doesn't look from the beginning like a "perfect match" of what they are looking for.

You agree with the toll-accessibility argument prematurely: There are as yet no data on it, whereas there are plenty of data on the correlation between OA and impact.

B-CK: An interesting question to look at would also be the effect of open access on non-formal citation modes like web linking, especially social bookmarking. Clearly NPG is interested in Connotea also as a means to enhance the visibility of articles in their own toll access articles. Has anyone already tried such investigations?

Although I cannot say how much of it is due to citation links themselves and how much to other kinds of links, the University of Southampton, the first institution with a (departmental) Green OA self-archiving mandate, and also the one with the longest-standing mandate, also has a surprisingly high webmetric, university-metric and G-factor rank:

Stevan Harnad
American Scientist Open Access Forum

Bollen, J., Van de Sompel, H., Smith, J. and Luce, R. (2005) Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing and Management, 41(6): 1419-1440.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.

Craig, Ian; Andrew Plume, Marie McVeigh, James Pringle & Mayur Amin (2007) Do Open Access Articles Have Greater Citation Impact? A critical review of the literature. Journal of Informetrics.

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics 71: 203-215.
See critiques: 1 and 2.

Diamond, A. M., Jr. (1986) What is a Citation Worth? Journal of Human Resources 21: 200-215.

Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS Biology 4: 157.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

Hajjem, C. and Harnad, S. (2006) Manual Evaluation of Robot Performance in Identifying Open Access Articles. Technical Report, Institut des sciences cognitives, Universite du Quebec a Montreal.

Hajjem, C. and Harnad, S. (2006) The Self-Archiving Impact Advantage: Quality Advantage or Quality Bias? Technical Report, ECS, University of Southampton.

Hajjem, C. and Harnad, S. (2007) Citation Advantage For OA Self-Archiving Is Independent of Journal Impact Factor, Article Age, and Number of Co-Authors. Technical Report, Electronics and Computer Science, University of Southampton.

Hajjem, C. and Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Technical Report, Electronics and Computer Science, University of Southampton.

Harnad, S. & Brody, T. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals, D-Lib Magazine 10 (6) June

Harnad, S. (2005) Making the case for web-based self-archiving. Research Money 19(16).

Harnad, S. (2005) Maximising the Return on UK's Public Investment in Research. (Unpublished ms.)

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA. (Unpublished ms.)

Harnad, S. (2005) On Maximizing Journal Article Access, Usage and Impact. Haworth Press (occasional column).

Harnad, S. (2006) Within-Journal Demonstrations of the Open-Access Impact Advantage: PLoS, Pipe-Dreams and Peccadillos (LETTER). PLoS Biology 4(5).

Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Thompson, D., and Murray, S. S. (2006) Effect of E-printing on Citation Rates in Astronomy and Physics. Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006

Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn, G., Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and Murray, S. S. (2006) E-prints and Journal Articles in Astronomy: a Productive Co-existence. Learned Publishing.

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C. S., Demleitner, M., Murray, S. S. (2005) The Effect of Use and Access on Citations. Information Processing and Management, 41 (6): 1395-1402.

Kurtz, Michael and Brody, Tim (2006) The impact loss to authors and research. In, Jacobs, Neil (ed.) Open Access: Key strategic, technical and economic aspects. Oxford, UK, Chandos Publishing.

Lawrence, S. (2001) Online or Invisible? Nature 411 (6837): 521.

Metcalfe, Travis S (2006) The Citation Impact of Digital Preprint Archives for Solar Physics Papers. Solar Physics 239: 549-553

Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section (preprint)

Perneger, T. V. (2004) Relation between online 'hit counts' and subsequent citations: prospective study of research papers in the British Medical Journal. British Medical Journal 329:546-547.

Seglen, P. O. (1992) The skewness of science. Journal of the American Society for Information Science 43: 628-638.

Posted by Stevan Harnad in Methodology at 22:36 | Comments (0) | Trackbacks (0)

Thursday, March 15. 2007

Don't Count Your (Golden) Chickens Before Your (Green) Eggs Are Laid

SUMMARY: Michael Kurtz writes:
   Research is growing. (True, but that is independent of OA.)
   Publication costs are not a large percentage of total research costs. (True, but publication costs are already being paid in full, today, by subscription fees; any new publication charges today hence mean additional funds redirected from research -- or from elsewhere -- to double-pay, unless the existing subscription fees are redirected toward paying the publication charges.)
  OA enhances research progress. (True)
  Author publication charges are not new (in some fields). (True, but new OA publication charges, today, would be new, and additional, unless their payment was redirected from subscription savings.)
   Mandated Green OA could cause subscription collapse. (Possibly, but if it did, that would simultaneously release the subscription savings to be redirected to pay for Gold OA publication charges [probably reduced to just the cost of peer review]; and meanwhile we would already have 100% [Green] OA, either way.)
  In some fields, Central Repositories (CRs) like Arxiv have a larger Green OA percentage of total research output than Institutional Repositories (IRs). (A few fields do provide Green OA, unmandated, today, but most don't; that's why we need mandates; IRs that mandate Green reach 100% OA within about two years; institutions and funders mandate; "fields" do not; institutions wish to record, showcase, and maximize the impact of their own research output; "fields" do not; CRs can harvest from IRs; the locus of deposit for mandates should be researchers' own IRs.)

I would dearly love to adhere to my dictum "Hypotheses non Fingo," but with hypotheses being finged willy-nilly by others -- at the cost of neglecting or even discouraging tried-and-tested practical (and a-theoretical) action (i.e., Green OA mandates) -- I am left with little choice but to resort to counter-hypothesizing:

On Wed, 14 Mar 2007, Michael Kurtz [MK] wrote in the American Scientist Open Access Forum:

MK: "(A) THE CURRENT SITUATION. The quantity of scientific research has been increasing exponentially for several generations. This increase, roughly an order of magnitude during my lifetime (~4% per year, essentially the same as the growth in the global economy), has been mediated and enabled by the existing system for scientific communication, namely toll access journals and libraries."

Correct.

And another thing has happened in the past generation or so: The birth of the Net and Web, making it possible to supplement toll-access with author-provided free online access (Green OA).

That development has next to nothing to do with the growth in the number of articles, nor with the price of journals. It has to do with the possibility of supplementing toll access with free online access.

MK: "(B) THE CURRENT COSTS. Direct costs for journals are remarkably small, about 1% of the total research and development budget (1). This compares with other costs involved such as (2) unpaid refereeing and editing 1% and the non-acquisition costs of a library, 2%. Possible changes to the direct cost of journals, up or down, are likely to be smaller than the error in estimating the yearly inflation adjustments."

Correct, but irrelevant to the question of providing free online access for would-be users who cannot afford toll access.

Yes, if the money currently being spent on user-institution access-tolls were instead redirected to pay for author-institution publication charges, no more or less money would be spent, and online access would be free (Gold OA). But that is happening far too slowly, and does not depend only on the researcher community. Supplementing toll access with free online access (Green OA) is entirely in the hands of the research community.

Providing supplementary online access for free can be accelerated to 100% within a year or two through the adoption of research funder and university Green OA self-archiving mandates. That too is in the hands of the research community. Until it is done, research usage and impact continue to be lost, needlessly, daily.

MK: "(C) THE POSSIBLE BENEFIT OF OPEN ACCESS. The purpose of OA is to increase the amount and quality of research. The growth rate of research is currently ~4%; if OA is a massive success, it could perhaps increase this growth rate by 10%, which would be a yearly increment of 0.4% of total research. It may be expected that the greatest effect of OA would be in cross-disciplinary research, such as Nanotechnology."

(The quantitative estimates are still rather speculative. [Here are some more.] But let us agree that providing OA will indeed increase research productivity and progress.)
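The arithmetic quoted in (A) and (C) can be checked with a minimal sketch, assuming steady compound growth at the quoted ~4% per year. The function names are illustrative only and do not come from Kurtz's posting:

```python
import math


def years_to_multiply(factor, annual_rate):
    """Years of compound growth at annual_rate needed to grow by `factor`."""
    return math.log(factor) / math.log(1.0 + annual_rate)


def boosted_rate(annual_rate, relative_boost):
    """Growth rate after a relative increase (0.10 means 10% faster growth)."""
    return annual_rate * (1.0 + relative_boost)


base = 0.04  # ~4% per year, as quoted in (A) and (C)

# (A): how long does an order-of-magnitude increase take at ~4% per year?
tenfold_years = years_to_multiply(10, base)

# (C): a 10% relative boost to a 4% growth rate, as an absolute increment
extra = boosted_rate(base, 0.10) - base

print(f"tenfold growth takes about {tenfold_years:.0f} years")
print(f"extra yearly growth: {extra * 100:.1f} percentage points")
```

Under these assumptions, a tenfold increase takes somewhat under 60 years, and the 10% boost amounts to 0.4 percentage points per year, in line with the figures quoted.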

MK: "(D) THE RISK OF OPEN ACCESS. By substantially changing the economics of journal publishing OA risks the catastrophic financial collapse of some publishers. This is especially true for the mandated 100% green OA path."

If and when mandated 100% Green OA does cause subscriptions to be cancelled to unsustainable levels, the resultant user-institution subscription savings can be redirected to pay instead for author-institution publication charges (Gold OA).

Green OA mandates, by research institutions and funders are possible (indeed actual), and can grow institution by institution and funder by funder.

If Gold OA (with its attendant redirection of subscription funds) can be mandated at all, it certainly cannot be done institution by institution and funder by funder (with 24,000 journals, 10,000 institutions, and hundreds of public funders worldwide). Redirection, if it is to occur at all, has to be driven by Green OA mandates.

Pre-emptive redirection of funds (by an institution or a funder) toward Gold OA, without being preceded by 100% Green OA, is a waste of money, effort and time, today. (After 100% Green OA it is fine, as long as there is no double-paying, through redirection of research money instead of subscription money.)

MK: "(E) CURRENT GREEN MODELS. There are basically two types of Green repository: centralized, such as arXiv, and distributed, as the institutional repositories. Only arXiv has much of a track record. After more than 15 years arXiv only has more than half the refereed articles in the two subfields of High Energy Physics and Astrophysics; only HEP has more than 90%. It does not appear that there is any subfield of science where the existing institutional repositories contain more than half of the refereed literature."

It is completely irrelevant where the free online articles are located. (The IRs and CRs are all OAI-interoperable.) What matters is that 100% of articles should be free online. Spontaneous central archiving has not reached 100% in 15 years (where it is being done at all). The natural and optimal place for institutions to mandate the deposit of their own article output is in their own IRs. That covers all of research output space. Mandated IRs fill within two years. Research funder mandates should reinforce the institutional mandates. If CRs are desired, they can harvest from the IRs.

MK: "(F) CURRENT GOLD MODELS. Page charges have existed for decades as a method of financing journals; while their use has been in decline for some time several venerable titles use them, in whole or in part, and there are several new, page charge funded, OA journals. Direct subsidies, by scholarly organizations and funding agencies, have long been used to support scientific publishing. Nearly all technical reports series are funded in this manner."

Publication charges are currently being fully covered by subscriptions, but access is not open to all would-be users, hence research usage and impact (productivity and progress) are being needlessly lost.

There is no realistic way (nor is there a will) to redirect the subscription money currently being spent by 10,000 user-institutions worldwide for various subsets of 24,000 journals toward instead paying author-institution Gold OA publication charges. Hence the only money that can be redirected to pay for Gold OA today (by institutions or funders) is money that is currently being spent on research or other expenses, thereby effectively double-paying for publication (and at a time when subscription costs are already inflated).

Hence if the goal is 100% OA, the way to reach it is through institutions and funders mandating Green OA.

After that, redirect toward Gold OA to your heart's content. But to do so before that, or instead of that, is pure folly.

P.S. The journal affordability problem and the research accessibility problem are not the same problem. Green OA mandates will solve the research accessibility problem for sure. They may or may not cause unsustainable cancellations, but either way they will ease, though not solve, the journal affordability problem (by making the decision about which journal subscriptions to purchase from a limited serials budget into less of a life-or-death question, given that 100% Green OA is there as a safety net for accessing whatever an institution cannot afford). Green OA, if it causes cancellations, will also cause cost-cutting and downsizing (the IRs can take over the access-provision and archiving load, leaving the journals with peer-review management as their only service), making (post-Green) Gold OA more affordable than it would be today (pre-Green).

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 11:21 | Comments (0) | Trackbacks (0)

Sunday, January 21. 2007

The Open Access Citation Advantage: Quality Advantage Or Quality Bias?

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: Many studies have now reported the positive correlation between Open Access (OA) self-archiving and citation counts ("OA Advantage," OAA). But does this OAA occur because articles that are self-archived are more likely to be cited ("Quality Advantage": QA) or because articles that are more likely to be cited are more likely to be self-archived ("Quality Bias," QB)? The probable answer is both. Three studies [by Kurtz and co-workers in astrophysics, Moed in condensed matter physics, and Davis & Fromerth in mathematics] had attributed the OAA to QB [and to EA, the Early Advantage of self-archiving the preprint before publication] rather than QA. These three fields, however, happen to be among the minority of fields that (1) make heavy use of prepublication preprints and (2) have less of a postprint access problem than most other fields. Chawki Hajjem has now analyzed preliminary evidence based on over 100,000 articles from multiple fields, comparing self-selected self-archiving with mandated self-archiving to estimate the contributions of QB and QA to the OAA. Both factors contribute, and the contribution of QA is greater.

This is a preview of some preliminary data (not yet refereed), collected by my doctoral student at UQaM, Chawki Hajjem. This study was done in part by way of response to Henk Moed's replies to my comments on Moed's (self-archived) preprint:

Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Moed's study is about the "Open Access Advantage" (OAA) -- the higher citation counts of self-archived articles -- observable across disciplines as well as across years as in the following graphs from Hajjem et al. 2005 (red bars are the OAA):

FIGURE 1. Open Access Citation Advantage By Discipline and By Year.
Green bars are percentage of articles self-archived (%OA); red bars, percentage citation advantage (%OAA) for self-archived articles for 10 disciplines (upper chart) across 12 years (lower chart, 1992-2003). Gray curve indicates total articles by discipline and year.
Source: Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

The focus of the present discussion is the factors underlying the OAA. There are at least five potential contributing factors, but only three of them are under consideration here: (1) Early Advantage (EA), (2) Quality Advantage (QA) and (3) Quality Bias (QB -- also called "Self-Selection Bias").

Preprints that are self-archived before publication have an Early Advantage (EA): they get read, used and cited earlier. This is uncontested.

Kurtz, Michael and Brody, Tim (2006) The impact loss to authors and research. In, Jacobs, Neil (ed.) Open Access: Key strategic, technical and economic aspects. Oxford, UK, Chandos Publishing.

In addition, the proportion of articles self-archived at or after publication is higher in the higher "citation brackets": the more highly cited articles are also more likely to be the self-archived articles.

FIGURE 2. Correlation between Citedness and Ratio of Open Access (OA) to Non-Open Access (NOA) Articles.
The (OAc/TotalOAc)/(NOAc/TotalNOAc) ratio (across all disciplines and years) increases as citation count (c) increases (r = .98, N=6, p<.005). The more cited an article, the more likely that it is OA. (Hajjem et al. 2005)
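The ratio in the caption can be illustrated with a minimal sketch. The counts per citation bracket below are invented for illustration (they are not the Hajjem et al. data), and using the bracket index as the x-variable for the correlation is an assumption:

```python
def oa_noa_ratios(oa_counts, noa_counts):
    """Return (OAc/TotalOAc)/(NOAc/TotalNOAc) for each citation bracket c."""
    total_oa = sum(oa_counts)
    total_noa = sum(noa_counts)
    return [
        (oa / total_oa) / (noa / total_noa)
        for oa, noa in zip(oa_counts, noa_counts)
    ]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical article counts in six citation brackets, lowest to highest
oa_counts = [100, 80, 60, 40, 30, 20]
noa_counts = [900, 600, 400, 220, 140, 80]

ratios = oa_noa_ratios(oa_counts, noa_counts)
brackets = list(range(len(ratios)))  # assumption: bracket index as x
r = pearson_r(brackets, ratios)

for c, ratio in zip(brackets, ratios):
    print(f"bracket {c}: ratio = {ratio:.2f}")
print(f"r = {r:.2f}")
```

A ratio below 1 means OA articles are under-represented in that bracket relative to non-OA articles; a ratio above 1 means they are over-represented.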

The question, then, is about causality: Are self-archived articles more likely to be cited because they are self-archived (QA)? Or are articles more likely to be self-archived because they are more likely to be cited (QB)?

The most likely answer is that both factors, QA and QB, contribute to the OAA: the higher quality papers gain more from being made more accessible (QA: indeed the top 10% of articles tend to get 90% of the citations). But the higher quality papers are also more likely to be self-archived (QB).

As we will see, however, the evidence to date, because it has been based exclusively on self-selected (voluntary) self-archiving, is equally compatible with (i) an exclusive QA interpretation, (ii) an exclusive QB interpretation or (iii) the joint explanation that is probably the correct one.

The only way to estimate the independent contributions of QA and QB is to compare the OAA for self-selected (voluntary) self-archiving with the OAA for imposed (obligatory) self-archiving. We report some preliminary results for this comparison here, based on the (still small sample of) Institutional Repositories that already have self-archiving mandates (chiefly CERN, U. Southampton, QUT, U. Minho, and U. Tasmania).

FIGURE 3. Self-Selected Self-Archiving vs. Mandated Self-Archiving: Within-Journal Citation Ratios (for 2004, all fields).
S = citation counts for articles self-archived at institutions with (Sm) and without (Sn) a self-archiving mandate. N = citation counts for non-archived articles at institutions with (Nm) and without (Nn) a mandate (i.e., Nm = articles not yet compliant with mandate). Grand average of (log) S/N ratios (106,203 articles; 279 journals) is the OA advantage (18%); this is about the same as for Sn/Nn (27,972 articles, 48 journals, 18%) and Sn/N (17%); ratio is higher for Sm/N (34%), higher still for Sm/Nm (57%, 541 articles, 20 journals); and Sm/Sn = 27%, so self-selected self-archiving does not yield more citations than mandated; rather the reverse. (All six within-pair differences are significant: correlated sample t-tests.) (NB: preliminary, unrefereed results.)
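One plausible reading of "grand average of (log) S/N ratios" is sketched below. The caption does not spell out the exact procedure, so the per-journal pairing, the use of mean citation counts, and the back-transformation to a percentage are all assumptions, and the numbers are invented:

```python
import math


def mean_log_ratio_advantage(pairs):
    """Percentage advantage from the average of within-journal log ratios.

    pairs: (mean citations of self-archived articles,
            mean citations of non-archived articles), one pair per journal.
    Journals with a zero mean on either side are skipped (assumption).
    """
    logs = [math.log(s / n) for s, n in pairs if s > 0 and n > 0]
    mean_log = sum(logs) / len(logs)
    # Back-transform the mean log ratio and express it as a percentage
    return (math.exp(mean_log) - 1.0) * 100.0


# Hypothetical within-journal means: (self-archived, non-archived)
journals = [(6.0, 5.0), (3.0, 2.5), (12.0, 10.0)]
advantage = mean_log_ratio_advantage(journals)

print(f"OA advantage: {advantage:.0f}%")
```

Averaging logs rather than raw ratios keeps a journal where S/N = 2 from outweighing one where S/N = 0.5, since the two are symmetric on the log scale.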

Summary: These preliminary results suggest that both QA and QB contribute to OAA, and that the contribution of QA is greater than that of QB.

Discussion: On Fri, 8 Dec 2006, Henk Moed [HM] wrote:

HM: "Below follow some replies to your comments on my preprint 'The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section'...

"1. Early view effect. [EA] In my case study on 6 journals in the field of condensed matter physics, I concluded that the observed differences between the citation age distributions of deposited and non-deposited ArXiv papers can to a large extent - though not fully - be explained by the publication delay of about six months of non-deposited articles compared to papers deposited in ArXiv. This outcome provides evidence for an early view [EA] effect upon citation impact rates, and consequently upon ArXiv citation impact differentials (CID, my term) or Arxiv Advantage (AA, your term)."
SH: "The basic question is this: Once the AA (Arxiv Advantage) has been adjusted for the 'head-start' component of the EA (by comparing articles of equal age -- the age of Arxived articles being based on the date of deposit of the preprint rather than the date of publication of the postprint), how big is that adjusted AA, at each article age? For that is the AA without any head-start. Kurtz never thought the EA component was merely a head start, however, for the AA persists and keeps growing, and is present in cumulative citation counts for articles at every age since Arxiving began."
HM: "Figure 2 in the interesting paper by Kurtz et al. (IPM, v. 41, p. 1395-1402, 2005) does indeed show an increase in the very short term average citation impact (my terminology; citations were counted during the first 5 months after publication date) of papers as a function of their publication date as from 1996. My interpretation of this figure is that it clearly shows that the principal component of the early view effect is the head-start: it reveals that the share of astronomy papers deposited in ArXiv (and other preprint servers) increased over time. More and more papers became available at the date of their submission to a journal, rather than on their formal publication date. I therefore conclude that their findings for astronomy are fully consistent with my outcomes for journals in the field of condensed matter physics."

The findings are definitely consistent for Astronomy and for Condensed Matter Physics. In both cases, most of the observed OAA came from the self-archiving of preprints before publication (EA).

Moreover, in Astronomy there is already 100% "OA" to all articles after publication, and this has been the case for years now (for the reasons Michael Kurtz and Peter Boyce have pointed out: all research-active astronomers have licensed access as well as free ADS access to all of the closed circle of core Astronomy journals: otherwise they simply cannot be research-active). This means that there is only room for EA in Astronomy's OAA. And that means that in Astronomy all the questions about QA vs QB (self-selection bias) apply only to the self-archiving of prepublication preprints, not to postpublication postprints, which are all effectively "OA."

To a lesser extent, something similar is true in Condensed-Matter Physics (CondMP): In general, research-active physicists have better access to their required journals via online licensing than other fields do (though one does wonder about the "non-research-active" physicists, and what they could/would do if they too had OA!). And CondMP too is a preprint self-archiving field, with most of the OAA differential again concentrated on the prepublication preprints (EA). Moreover, Moed's test for whether or not a paper was self-archived was based entirely on its presence/absence in ArXiv (as opposed to elsewhere on the Web, e.g., on the author's website or in the author's Institutional Repository).

Hence Astronomy and CondMP are fields that are "biassed" toward EA effects. It is not surprising, therefore, that the lion's share of the OAA turns out to be EA in these fields. It also means that the remaining variance available for testing QA vs. QB in these fields is much narrower than in fields that do not self-archive preprints only, or mostly.

Hence there is no disagreement (or surprise) about the fact that most of the OAA in Astronomy and CondMP is due to EA. (Less so in the slower-moving field of maths; see: "Early Citation Advantage?")

SH: "The fact that highly-cited articles (Kurtz) and articles by highly-cited authors (Moed) are more likely to be Arxived certainly does not settle the question of cause and effect: It is just as likely that better articles benefit more from Arxiving (QA) as that better authors/articles tend to Arxive/be-Arxived more (QB)."
HM: "2. Quality bias. I am fully aware that in this research context one cannot assess whether authors publish [sic] their better papers in the ArXiv merely on the basis of comparing citation rates of archived and non-archived papers, and I mention this in my paper. Citation rates may be influenced both by the 'quality' of the papers and by the access modality (deposited versus non-deposited). This is why I estimated author prominence on the basis of the citation impact of their non-archived articles only. But even then I found evidence that prominent, influential authors (in the above sense) are overrepresented in papers deposited in ArXiv."

I agree with all this: The probable quality of the article was estimated from the probable quality of the author, based on citations for non-OA articles. Now, although this correlation, too, goes both ways (are authors' non-OA articles more cited because their authors self-archive more or do they self-archive more because they are more cited?), I do agree that the correlation between self-archiving-counts and citation-counts for non-self-archived articles by the same author is more likely to be a QB effect. The question then, of course, is: What proportion of the OAA does this component account for?

HM: "But I did more than that. I calculated Arxiv Citation Impact Differentials (CID, my term, or ArXiv Advantage, AA, your term) at the level of individual authors. Next, I calculated the median CID over authors publishing in a journal. How then do you explain my empirical finding that for some authors the citation impact differential (CID) or ArXiv Advantage is positive, for others it is negative, while the median CID over authors does not significantly differ from zero (according to a Sign test) for all journals studied in detail except Physical Review B, for which it is only 5 per cent? If there is a genuine 'OA advantage' at stake, why then does it for instance not lead to a significantly positive median CID over authors? Therefore, my conclusion is that, controlling for quality bias and early view effect, in the sample of 6 journals analysed in detail in my study, there is no sign of a general 'open access advantage' of papers deposited in ArXiv's Condensed Matter Section."

My interpretation is that EA is the largest contributor to the OAA in this preprint-intensive field (i.e., most of the OAA comes from the prepublication component) and that there is considerable variability in the size of the (small) residual (non-EA) OAA. For a small sample, at the individual journal level, there is not enough variance left for a significant OAA, once one removes the QB component too. Perhaps this is all that Henk Moed wished to imply. But the bigger question for OA concerns all fields, not just those few that are preprint-intensive and that are relatively well-heeled for access to the published version. Indeed, the fundamental OA and OAA questions concern the postprint (not the preprint) and the many disciplines that do have access problems, not the happy few that do not!
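For readers unfamiliar with the Sign test that Moed mentions, here is a minimal sketch of a two-sided exact sign test on author-level differentials. The data are invented, and this is a generic textbook version of the test, not a reconstruction of Moed's own computation:

```python
from math import comb


def sign_test_p(differentials):
    """Two-sided exact sign test of 'median differential = 0'.

    Zeros are dropped; under the null hypothesis, positive and negative
    signs are equally likely, so the sign count is Binomial(n, 0.5).
    """
    pos = sum(1 for d in differentials if d > 0)
    neg = sum(1 for d in differentials if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of a split at least as lopsided as the one observed
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical author-level CIDs: as many positive as negative
mixed = [0.3, -0.2, 0.1, -0.4, 0.5, -0.1, 0.2, -0.3]
# Hypothetical case where every author shows a positive differential
all_positive = [0.1] * 10

print(sign_test_p(mixed))
print(sign_test_p(all_positive))
```

The test looks only at the direction of each author's differential, not its size, so a small sample with a modest residual effect can easily fail to reach significance.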

The way to test the presence and size of both QB and QA in these non-EA fields is to impose the OA, preferably randomly, on half the sample, and then compare the size of the OAA for imposed ("mandated") self-archiving (Sm) with the size of the OAA for self-selected ("nonmandated") self-archiving (Sn), in particular by comparing their respective ratios to non-self-archived articles in the same journal and year (Sm/N vs. Sn/N).

If Sn/N > Sm/N then QB > QA, and vice versa. If Sn/N = 1, then QB is 0. And if Sm/N = 1 then QA is 0.
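The decision rule just stated can be written out as a minimal sketch. The function name and the tolerance are illustrative, and the handling of exact equality (which the rule above does not cover) is an assumption:

```python
def interpret(sn_over_n, sm_over_n, tol=1e-9):
    """Apply the Sn/N vs. Sm/N rule and return the conclusions it licenses."""
    notes = []
    if abs(sn_over_n - 1.0) <= tol:
        notes.append("QB is 0")  # self-selected archiving shows no advantage
    if abs(sm_over_n - 1.0) <= tol:
        notes.append("QA is 0")  # mandated archiving shows no advantage
    if sn_over_n > sm_over_n + tol:
        notes.append("QB > QA")
    elif sm_over_n > sn_over_n + tol:
        notes.append("QA > QB")
    else:
        # Assumption: the rule in the text says nothing about equal ratios
        notes.append("no difference between QA and QB detected")
    return notes


# Ratios taken from the FIGURE 3 caption: Sn/N = 17% and Sm/N = 34% advantage
print(interpret(1.17, 1.34))
```

With the preliminary FIGURE 3 values, the rule yields QA > QB.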

It is a first approximation to this comparison that has just been done (FIGURE 3, above) by my doctoral student, Chawki Hajjem, across fields, for self-archived articles in five Institutional Repositories (IRs) that have OA self-archiving mandates, for 106,203 articles published in 276 biomedical journals in 2004.

The mandates are still very young and few, hence the sample is still small; and there are many potential artifacts, including selective noncompliance with the mandate as well as disciplinary bias. But the preliminary results so far suggest that (1) QA is indeed > 0, and (2) QA > QB.

[I am sure that we will now have a second round from die-hards who will want to argue for a selective-compliance effect, as a 2nd-order last gasp for the QB-only hypothesis, but of course that loses all credibility as IRs approach 100% compliance: We are analyzing our mandated IRs separately now, to see whether we can detect any trends correlated with an IR's %OA. But (except for the die-hards, who will never die), I think even this early sample already shows that the OA advantage is unlikely to be only or mostly a QB effect.]

HM: "3. Productive versus less productive authors. My analysis of differences in Citation Impact differentials between productive and less productive authors may seem "a little complicated". My point is that if one selects from a set of papers deposited in ArXiv a paper authored by a junior (or less productive) scientist, the probability that this paper is co-authored by a senior (or more productive) author is higher than it is for a paper authored by a junior scientist but not deposited in ArXiv. Next, I found that papers co-authored by both productive and less productive authors tend to have a higher citation impact than articles authored solely by less productive authors, regardless of whether these papers were deposited in ArXiv or not. These outcomes lead me to the conclusion that the observed higher CID for less productive authors compared to that of productive authors can be interpreted as a quality bias."

It still sounds a bit complicated, but I think what you mean is that (1) mixed multi-author papers (ML, with M = More productive authors, L = less productive authors) are more likely to be cited than unmixed multi-author (LL) papers with the same number of authors, and that (2) such ML papers are also more likely to be self-archived. (Presumably MM papers are the most cited and most self-archived of multi-author papers.)

That still sounds to me like a variant on the citation/self-archiving correlation, and hence interpretable as either QA or QB or both. (Chawki Hajjem has also found that citation counts are positively correlated with the number of authors an article has: this could either be a self-citation bias or evidence that multi-authored papers tend to be better ones.)

HM: "4. General comments. In the citation analysis by Kurtz et al. (2005), both the citation and target universe contain a set of 7 core journals in astronomy. They explain their finding of no apparent OA effect in their study of these journals by postulating that "essentially all astronomers have access to the core journals through existing channels". In my study the target set consists of a limited number of core journals in condensed matter physics, but the citation universe is as large as the total Web of Science database, including also a number of more peripheral journals in the field. Therefore, my result is stronger than that obtained by Kurtz et al.: even in this much wider citation universe, I do not find evidence for an OA advantage effect."

I agree that CondMP is less preprint-intensive, less accessible and less endogamous than Astrophysics, but it is still a good deal more preprint-intensive and accessible than most fields (and I don't yet know what role the exogamy/endogamy factor plays in either citations or the OAA: it will be interesting to study, among many other candidate metrics, once the entire literature is OA).

HM: "I realize that my study is a case study, examining in detail 6 journals in one subfield. I fully agree with your warning that one should be cautious in generalizing conclusions from case studies, and that results for other fields may be different. But it is certainly not an unimportant case. It relates to a subfield in physics, a discipline that your pioneering and stimulating work (Harnad and Brody, D-Lib Mag., June 2004) has analysed as well at a more aggregate level. I hope that more case studies will be carried out in the near future, applying the methodologies I proposed in my paper."

Your case study is very timely and useful. However, robot-based studies based on much larger samples of journals and articles have now confirmed the OAA in many more fields, most of them not preprint-based at all, and with access problems more severe than those of physics.

Conclusions

I would like to conclude with a summary of the "QB vs. QA" evidence to date, as I understand it:

(1) Many studies have reported the OA Advantage, across many fields.

(2) Three studies have reported QB in preprint-intensive fields that have either no postprint access problem or markedly less than other fields (astrophysics, condensed matter, mathematics).

(3) The author of one of these three studies is pro-OA (Kurtz, who is also the one who drew my attention to the QA counterevidence); the author of the second is neutral (Moed); and the author of the third might (I think -- I'm not sure) be mildly anti-OA (Davis -- now collaborating with a publisher to do a 4-year [sic!] long-term study on QA vs QB).
Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Thompson, D., and Murray, S. S. (2006) Effect of E-printing on Citation Rates in Astronomy and Physics. Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006

Moed, H. F. (2006, preprint) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics, accepted for publication. See critiques: 1, 2
(4) So the overall research motivation for testing QB is not an anti-OA motivation.

(5) On the other hand, the motivation on the part of some publishers to put a strong self-serving spin on these three QB findings is of course very anti-OA and especially, now, anti-OA-self-archiving-mandate. (That's quite understandable, and no problem at all.)

(6) In contrast to the three studies that have reported what they interpret as evidence of QB (Kurtz in astro, Moed in cond-mat and Davis in maths), there are the many other studies that report large OA citation (and download) advantages, across a large number of fields. Those who have interests that conflict with OA and OA self-archiving mandates are ignoring or discounting this large body of studies, and instead just spinning the three QB reports as their justification for ignoring the larger body of findings.

This will all be resolved soon, and the outcome of our QA vs. QB comparison for mandated vs. self-selected self-archiving already heralds this resolution. I am pretty confident that the empirical facts will turn out to have been the following: Yes, there is a QB component in the OA advantage (especially in the preprinting fields, such as astro, cond-mat and maths). But that QB component is neither the sole factor nor the largest factor in the OA advantage, particularly in the non-preprint fields with access problems -- and those fields constitute the vast majority. That will be the outcome that is demonstrated, and eventually not only the friends of OA but the foes of OA will have no choice but to acknowledge the new reality of OA, its benefits to research and researchers, and its immediate reachability through the prompt universal adoption of OA self-archiving mandates.

Stevan Harnad & Chawki Hajjem
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 10:18

Monday, November 20. 2006

The Self-Archiving Impact Advantage: Quality Advantage or Quality Bias?

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: In astrophysics, Kurtz found that articles that were self-archived by their authors in Arxiv were downloaded and cited twice as much as those that were not. He traced this enhanced citation impact to two factors: (1) Early Access (EA): The self-archived preprint was accessible earlier than the publisher's version (which is accessible to all research-active astrophysicists as soon as it is published, thanks to Kurtz's ADS system). (Hajjem, however, found that in other fields, which self-archive only published postprints and do have accessibility/affordability problems with the publisher's version, self-archived articles still have enhanced citation impact.) Kurtz's second factor was: (2) Quality Bias (QB), a selective tendency for higher quality articles to be preferentially self-archived by their authors, as inferred from the fact that the proportion of self-archived articles turns out to be higher among the more highly cited articles. (The very same finding is of course equally interpretable as (3) Quality Advantage (QA), a tendency for higher quality articles to benefit more than lower quality articles from being self-archived.) In condensed-matter physics, Moed has confirmed that the impact advantage occurs early (within 1-3 years of publication). After article-age is adjusted to reflect the date of deposit rather than the date of publication, the enhanced impact of self-archived articles is again interpretable as QB, with articles by more highly cited authors (based only on their non-archived articles) tending to be self-archived more. (But since the citation counts for authors and for their articles are correlated, one would expect much the same outcome from QA too.) The only way to test QA vs. QB is to compare the impact of self-selected self-archiving with mandated self-archiving (and no self-archiving). (The outcome is likely to be that both QA and QB contribute, along with EA, to the impact advantage.)

Michael Kurtz's papers have confirmed that in astronomy/astrophysics (astro), articles that have been self-archived -- let's call this "Arxived" to mark it as the special case of depositing in the central Physics Arxiv -- are cited (and downloaded) twice as much as non-Arxived articles. Let's call this the "Arxiv Advantage" (AA).

Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Thompson, D., and Murray, S. S. (2006) Effect of E-printing on Citation Rates in Astronomy and Physics. Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006

Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn, G., Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and Murray, S. S. (2006) E-prints and Journal Articles in Astronomy: a Productive Co-existence (submitted to Learned Publishing)

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C. S., Demleitner, M., Murray, S. S. (2005) The Effect of Use and Access on Citations. Information Processing and Management, 41 (6): 1395-1402, December 2005

Kurtz analyzed AA and found that it consisted of at least 2 components:

(1) EARLY ACCESS (EA): There is no detectable AA for old articles in astro: AA occurs while an article is young (1-3 years). Hence astro articles that were made accessible as preprints before publication show more AA: This is the Early Access effect (EA). But EA alone does not explain why AA effects (i.e., enhanced citation counts) persist cumulatively and even keep growing, rather than simply being a phase-advancing of otherwise unenhanced citation counts, in which case simply re-calculating an article's age so as to begin at preprint deposit time instead of publication time should eliminate all AA effects -- which it does not.

(2) QUALITY BIAS (QB): (Kurtz called the second component "Self-Selection Bias" for quality, but I call it self-selection Quality Bias, QB): If we compare articles within roughly the same citation/quality bracket (i.e., articles having the same number of citations), the proportion of Arxived articles becomes higher in the higher citation brackets, especially the top 200 papers. Kurtz interprets this as resulting from authors preferentially Arxiving their higher-quality preprints (Quality Bias).

Of course the very same outcome is just as readily interpretable as resulting from Quality Advantage (QA) (rather than Quality Bias (QB)): i.e., that the Arxiving benefits better papers more. (Making a low-quality paper more accessible by Arxiving it does not guarantee more citations, whereas making a high-quality paper more accessible is more likely to do so, perhaps roughly in proportion to its higher quality, allowing it to be used and cited more according to its merit, unconstrained by its accessibility/affordability.)
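The bracket comparison described under (2) can be sketched in a few lines. This is only an illustrative sketch, not Kurtz's actual procedure: the function name, the bracket edges and the toy sample are all assumptions made up for the example.

```python
from collections import defaultdict

def archived_share_by_bracket(articles, bracket_edges):
    """Return {bracket_lower_bound: share of articles that are self-archived}.

    articles: iterable of (citation_count, is_archived) pairs (hypothetical data).
    bracket_edges: ascending lower bounds of the citation brackets, e.g. [0, 5, 20].
    """
    totals = defaultdict(int)
    archived = defaultdict(int)
    for citations, is_archived in articles:
        # Place the article in the highest bracket whose lower bound it reaches.
        lower = max(edge for edge in bracket_edges if citations >= edge)
        totals[lower] += 1
        if is_archived:
            archived[lower] += 1
    return {lower: archived[lower] / totals[lower] for lower in sorted(totals)}

# Hypothetical toy sample: (citations, self-archived?)
sample = [(1, False), (2, False), (3, True), (4, False),
          (6, True), (8, False), (12, True), (15, False),
          (25, True), (30, True), (40, True), (60, False)]

shares = archived_share_by_bracket(sample, [0, 5, 20])
print(shares)  # {0: 0.25, 5: 0.5, 20: 0.75}
```

A rising share across brackets is all that such a tabulation can show. It is silent on whether the better articles were preferentially Arxived (QB) or whether Arxiving raised the better articles into the higher brackets (QA).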

There is no way, on the basis of existing data, to decide between QA and QB. The only way to measure their relative contributions would be to control the self-selection factor: randomly imposing Arxiving on half of an equivalent sample of articles of the same age (from preprinting age to 2-3 years postpublication, reckoning age from deposit date, to control also for age/EA effects), and comparing also with self-selected Arxiving.
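The logic of such a controlled comparison can be sketched as follows. It is a minimal sketch under stated assumptions: the function names, the fixed seed and the idea of summarising the advantage as a ratio of mean citation counts are all illustrative choices, not a description of any study's actual method.

```python
import random
from statistics import mean

def assign_randomly(article_ids, seed=0):
    """Split articles at random into (imposed_archiving, control) halves."""
    ids = list(article_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]

def advantage(citations, group_a, group_b):
    """Ratio of mean citation counts, group_a relative to group_b.

    citations: mapping from article id to citation count (hypothetical data).
    """
    return mean(citations[i] for i in group_a) / mean(citations[i] for i in group_b)

# Hypothetical usage: ten articles of the same age, half randomly Arxived.
imposed, control = assign_randomly(range(10), seed=1)
```

Because the imposed group did not choose to be Arxived, any advantage it shows over the control group cannot be QB. The difference between that advantage and the one found for self-selected Arxiving would then estimate the QB component.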

We are trying an approximation to this method, using articles deposited in Institutional Repositories of institutions that mandate self-archiving (and comparing their citation counts with those of articles from the same journal/issue that have not been self-archived), but the sample is still small and possibly unrepresentative, with many gaps and other potential liabilities. So a reliable estimate of the relative size of QA and QB still awaits future research, when self-archiving mandates will have become more widely adopted.

Henk Moed's data on Arxiving in Condensed Matter physics (cond-mat) replicates Kurtz's findings in astro (and Davis/Fromerth's, in math):

Moed, H. F. (2006, preprint) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics, accepted for publication. See critiques: 1, 2.

Moed too has shown that in cond-mat the AA effect (which he calls CID "Citation Impact Differential") occurs early (1-3 years) rather than late (4-6 years), and that there is more Arxiving by authors of higher-quality (based on higher citation counts for their non-Arxived articles) than by lower-quality authors. But this too is just as readily interpretable as the result of QB or QA (or both): We would of course expect a high correlation between an author's individual articles' citation counts and the author's average citation count, whether the author's citation count is based on Arxived or non-Arxived articles. These are not independent variables.

(Less easily interpretable -- but compatible with either QA or QB interpretations -- is Moed's finding of a smaller AA for the "more productive" authors. Moed's explanations in terms of co-authorships between more productive and less productive authors, senior and junior, seem a little complicated.)

The basic question is this: Once the AA has been adjusted for the "head-start" component of the EA (by comparing articles of equal age -- the age of Arxived articles being based on the date of deposit of the preprint rather than the date of publication of the postprint), how big is that adjusted AA, at each article age? For that is the AA without any head-start. Kurtz never thought the EA component was merely a head start, however, for the AA persists and keeps growing, and is present in cumulative citation counts for articles at every age since Arxiving began. This non-EA AA is either QB or QA or both. (It also has an element of Competitive Advantage, CA, which would disappear once everything was self-archived, but let's ignore that for now.)
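The head-start adjustment can be sketched as follows: age is reckoned from the deposit date for Arxived articles and from the publication date for the rest, and citation counts are then compared at equal age. This is a hedged illustration only; the field names, dates and citation counts are invented for the example.

```python
from datetime import date

def age_in_years(start, today):
    """Whole years elapsed between two dates."""
    years = today.year - start.year
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years

def access_age(article, today):
    """Age reckoned from first accessibility: deposit date if the article
    was Arxived, otherwise publication date."""
    start = article.get("deposited") or article["published"]
    return age_in_years(start, today)

def mean_citations_by_age(articles, today):
    """Return {(age, arxived?): mean citation count} for equal-age comparison."""
    sums, counts = {}, {}
    for a in articles:
        key = (access_age(a, today), a.get("deposited") is not None)
        sums[key] = sums.get(key, 0) + a["citations"]
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical toy data (dates and counts are made up).
today = date(2006, 11, 20)
articles = [
    {"published": date(2004, 6, 1), "deposited": date(2003, 11, 1), "citations": 12},
    {"published": date(2003, 10, 1), "deposited": None, "citations": 6},
    {"published": date(2005, 3, 1), "deposited": date(2004, 9, 1), "citations": 8},
    {"published": date(2004, 8, 1), "deposited": None, "citations": 5},
]
by_age = mean_citations_by_age(articles, today)
adjusted_aa_at_3 = by_age[(3, True)] / by_age[(3, False)]
```

If the AA were nothing but a head start, the ratio at each age would fall to about 1 once age is reckoned this way. A ratio that stays above 1 is the non-EA component discussed above.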

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA. Preprint.

Moed's analysis, like Kurtz's, cannot decide between QB and QA. The fact that most of the AA comes in an article's first 3 years rather than its second 3 years simply shows that both astro and cond-mat are fast-developing fields. The fact that highly-cited articles (Kurtz) and articles by highly-cited authors (Moed) are more likely to be Arxived certainly does not settle the question of cause and effect: It is just as likely that better articles benefit more from Arxiving (QA) as that better authors/articles tend to Arxive/be-Arxived more (QB).

Nor is Arxiv the only test of the self-archiving Open Access Advantage. (Let's call this OAA, generalizing from the mere Arxiving Advantage, AA): We have found an OAA with much the same profile as the AA in 10 further fields, for articles of all ages (from 1 year old to 10 years old), and as far as we know, with the exception of Economics, these are not fields with a preprinting culture (i.e., they don't self-archive prepublication preprints but only postpublication postprints). Hence the consistent pattern of OAA across all fields and across articles of all ages is very unlikely to have been just a head-start (EA) effect.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

Is the OAA, then, QB or QA (or both)? There is no way to determine this unless the causality is controlled by randomly imposing the self-archiving on a subset of a sufficiently large and representative random sample of articles of all ages (but especially newborn ones) and comparing the effect across time.

In the meantime, here are some factors worth taking into account:

(1) Both astro and cond-mat are fields where it has been repeatedly claimed that the accessibility/affordability problem for published postprints is either nonexistent (astro) or less pronounced than in other fields. Hence the only scope for an OAA in astro and cond-mat is at the prepublication preprint stage.

(2) In many other fields, however, not only is there no prepublication preprint self-archiving at all, but there is a much larger accessibility/affordability barrier for potential users of the published article. Hence there is far more scope for OAA and especially QA (and CA): Access is a necessary (though not a sufficient) causal precondition for impact (usage and citation).

It is hence a mistake to overgeneralize the phys/math AA findings to OAA in general. We need to wait till we have actual data before we can draw confident conclusions about the degree to which the AA or the OAA are a result of QB or QA or both (and/or other factors, such as CA).

For the time being, I find the hypothesis of a causal QA (plus CA) effect, successfully sought by authors because they are desirous of reaching more users, far more plausible and likely than the hypothesis of an a-causal QB effect in which the best authors are self-archiving merely out of superstition or vanity! (And I suspect the truth is a combination of both QA/CA and QB.)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 17:47

Monday, November 13. 2006

Self-Archiving and Journal Subscriptions: Critique of PRC Study

SUMMARY: There is no evidence to date that Open Access (OA) self-archiving causes journal cancellations. The Publishing Research Consortium commissioned a survey of acquisitions librarian preferences to see whether they could predict such cancellations in the future using a "Share of Preference model," but the study has a glaring methodological flaw that invalidates its conclusion (that self-archiving will cause cancellations). The study consisted of asking librarians which of three hypothetical products -- A, B or C -- they preferred least and most, for a variety of hypothetical combinations of 6 properties with 3-4 possible values each:
      1. ACCESS DELAY: 24-months, 12-months, 6-months, immediate access
      2. PERCENTAGE OF JOURNAL'S CONTENT: 100%, 80%, 60%, 40%
      3. COST: 100%, 50%, 25%, 0%
      4. VERSION: preprint, refereed, refereed+copy-edited, published-PDF;
      5. ACCESS RELIABILITY: high, medium, low
      6. JOURNAL QUALITY: high, medium, low
No mention was made of OA self-archiving (in order to avoid "bias"); but, as a result, the model cannot make any prediction at all about the effects of self-archiving on cancellations. The questions on which it is based were about relative preferences for acquisition among competing "products" having different combinations of properties, and the model treated OA (0% cost) as if it were just one of those product properties. But self-archived articles are not products purchased by acquisitions librarians: they are papers given away by researchers, anarchically, and in parallel. Hence from the survey's "Share of Preference model" it is impossible to draw any conclusions about self-archiving causing cancellations by librarians, because the librarians were never asked what they would cancel, under what conditions; just what hypothetical products they would prefer over what. And of course they would prefer lower-priced, immediate products over higher-priced, delayed products! But if all articles in all journals were self-archived, the "Share of Preference model" does not give us the slightest clue about what journals librarians would acquire or cancel. Nor does it give us a clue as to what they would do between now (c. 15% self-archiving) and then (100% self-archiving). The banal fact that everyone would rather have something for free rather than paying for it certainly does not answer this question, or fill the gaping evidential gap about the existence, size, or timing of any hypothetical effect of self-archiving on cancellations. Nor does the study's one nontrivial finding: that librarians don't much care about the difference between a refereed author's draft and a published-PDF. (Let us hope that this study will be the last futile attempt to treat research as if it were done in order to generate or protect journal revenues. 
Even if valid evidence should eventually emerge that OA self-archiving does cause journal cancellations, it would be for the publishing community to adapt to that new reality, not for the research community to abstain from it, and its obvious benefits to research, researchers, their institutions, their funders, and the tax-paying public that funds the funders and for whose benefit the research is conducted.)

Self-Archiving and Journal Subscriptions:
Critique of Publishing Research Consortium Study

Stevan Harnad
The following is a critique of:

Chris Beckett and Simon Inger, Self-Archiving and Journal Subscriptions: Co-existence or Competition? An international Survey of Librarians' Preferences. Commissioned by the Publishing Research Consortium from Scholarly Information Strategies Ltd (SIS), a scholarly publishing consultancy. October 2006

Because there has so far been no detectable correlation between author self-archiving and journal cancellations, the Publishing Research Consortium commissioned a survey of acquisition librarians' preferences and attitudes about a number of hypothetical alternatives. From the responses a theoretical model was constructed, which predicted cancellations as more self-archived content becomes available. How did the study arrive at this prediction without any actual cancellation data?

The prediction was based on a rather simple methodological flaw: Librarians were given a series of hypothetical choices, each a choice among three hypothetical "products," A, B and C. The librarians were asked to pick which of the three product options they would prefer most and least. Each hypothetical product option consisted of a complicated combination of values on six properties, each property taking one of 3-4 possible values.

Presenting this array of hypothetical product options as choices to acquisition librarians (apart from being highly complicated and highly hypothetical, with many hidden assumptions) is specious, for among the potential properties of the hypothetical "product" options was the property that some of the options were free.

But a free self-archived journal article is not a product: It is not something that an acquisitions librarian decides whether or not to acquire. Open Access (OA) is not a product-acquisition issue at all: At best (or worst) it's a product cancellation issue.

Hence the only credible and direct hypothetical question one could have asked librarians about self-archived journal articles (and even then there would be no guarantee that librarians would actually do as they predicted they would do under the hypothetical conditions) would be about the circumstances under which they think they would cancel existing journals:

"Would you cancel journal X if 100% of its articles were accessible free online (80%? 60%? 40%?)? If they were accessible immediately (after 6 months? 12? 24?)?"

And even that question is laden with highly speculative and even indeterminate assumptions: How could librarians (or anyone) know what percentage of a journal was accessible for free, self-archived, for any particular journal?

And what about interactions between journal X and journal Y? (How to spend a given acquisitions budget -- what to acquire and what to cancel -- is presumably a comparative decision, and we are asking about the keep/cancel trade-offs.)

But what if 60% of all journals were free online (immediately? after 12 months?)? (Acquisition/cancellation decisions today are largely competitive ones: X gets cancelled in favour of Y. The rules of this trade-off game would presumably change if all journals were roughly on a par for their percentage of freely available online content or the length of the delay before it is freely available.)

Straightforward questions on what a librarian predicts they would cancel (in favour of what) under what hypothetical conditions (and how those conditions could be ascertained) might possibly have some weak predictive value. But such straightforward questions are not what this series of questions about preferences among hypothetical "product options" asked.

[Even straightforward hypothetical answers to straightforward hypothetical questions may not have any predictive value if the hypotheses are far-fetched or unfamiliar enough, if they have hidden or incoherent assumptions: I frankly don't believe there is a librarian alive who has a clue as to what they would keep or cancel if the self-archived versions of all journal articles were suddenly available free online today -- let alone what they would do as all journal contents gradually approached 100% availability, at various (uncertain) speeds, from a trajectory of increasing (but uncertain) free content (40% to 60% to 80%) and/or decreasing delay (24 months to 12 months to 6 months).]

And that's without mentioning intangibles such as any continuing demand for the paper edition, etc., nor how librarians could know the percentages available, how quickly the percentages would grow, and at what relative rate they would grow among more and less important journals, more and less expensive journals.

But it was not even these straightforward, if highly speculative, questions that were asked of librarians in this survey. Instead, they were asked to pick the most and least favoured option among three hypothetical "products," A, B and C, with a variety of complicated combinations of 6 hypothetical properties, which could each take 3-4 values:

      1. ACCESS DELAY: 24-months, 12-months, 6-months, immediate access
      2. PERCENTAGE OF JOURNAL'S CONTENT: 100%, 80%, 60%, 40%
      3. COST: 100%, 50%, 25%, 0%
      4. VERSION: preprint, refereed, refereed+copy-edited, published-PDF;
      5. ACCESS RELIABILITY: high, medium, low
      6. JOURNAL QUALITY: high, medium, low

In each case, products A, B and C were given some combination of the values on properties 1-6, and the librarian had to choose which of the 3 combinations they most and least preferred.
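To get a sense of how large the space of hypothetical "products" is, the combinations of the six properties listed above can simply be enumerated. This is a small sketch; the dictionary keys are illustrative names of my own, and only the property values come from the list above.

```python
from itertools import product
from math import prod

# The six properties and their possible values, as listed above.
PROPERTIES = {
    "access_delay": ["24 months", "12 months", "6 months", "immediate"],
    "content_share": ["100%", "80%", "60%", "40%"],
    "cost": ["100%", "50%", "25%", "0%"],
    "version": ["preprint", "refereed", "refereed+copy-edited", "published-PDF"],
    "access_reliability": ["high", "medium", "low"],
    "journal_quality": ["high", "medium", "low"],
}

# Every distinct hypothetical product is one value per property.
profiles = list(product(*PROPERTIES.values()))
n_profiles = len(profiles)
print(n_profiles)  # 2304 = 4 * 4 * 4 * 4 * 3 * 3
```

With 2304 possible products, no librarian can have been shown more than a small sample of the possible A/B/C triplets, which is why the conclusions had to be interpolated and extrapolated from samples.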

From samples of these combinations (interpolated and extrapolated within and between librarians) the survey concludes that:

PRC: A major study of librarian purchasing preferences has shown that librarians will show a strong inclination towards the acquisition [sic] of Open Access (OA) materials as they discover that more and more learned material has become available in institutional repositories.

(1) OA materials are not "acquired" (and it is both misleading and absurd to cast either the questions or the responses in an acquisitions context). Non-OA products are acquired, and the availability of OA versions of them might or might not induce cancellation in favour of other non-OA products under various circumstances (that are not even touched upon by this study or its methodology).

Why would the model assume arbitrary differential rates of OA growth among journals rather than roughly uniform growth across all journals in each field (apart from random fluctuations)? And if there were systematic differential OA growth within a field, wouldn't librarians' decisions depend very much on the field, and on which journal contents happen to become OA faster, rather than on any general predictions generated from this theoretical model?

(2) Nothing whatsoever was determined about what happens as more and more OA becomes available all round, nor about how availability would be ascertained, nor at what rate OA would grow and be ascertained. There were merely static questions about 3 hypothetical competing "products," some stipulated to be PP% OA within MM months.

PRC: Overall the survey shows that a significant number of librarians are likely to substitute OA materials for subscribed resources, given certain levels of reliability, peer review and currency of the information available. This last factor is a critical one -- resources become much less favoured if they are embargoed for a significant length of time.

The survey shows nothing whatsoever about libraries substituting OA material for anything, because free self-archived content is not something a subscriber institution (library) provides (by buying it in) but something an author institution provides, via its IR, by self-archiving it.

If the questions had been forthrightly put as pertaining to cancellation decisions under various hypothetical conditions, then at least we would have had librarians' speculations about what they think they would cancel under those hypothetical conditions. But instead we have inferences from a model based on least- and most-preferred "product" options having little or no bearing on any question other than the librarians' preferences for the hypothetical properties: They prefer journals with lower prices, whose content is higher quality, more reliable, more immediate, peer-reviewed, and preferably 100% of it. (Librarians don't much care whether the peer-reviewed article is the author's final draft or the publisher's PDF, as long as it's peer-reviewed: That is a genuine finding of this study!)

There is no way at all to interpolate or extrapolate from data like these to draw valid or even coherent conclusions about self-archiving and cancellations, with or without a "conjoint analysis" model.

PRC: One of the key benefits of the conjoint analysis approach used in this survey was the removal of bias by not referring, when testing different product configurations, to any named incarnations of content types, including subscription journals, licensed full-text (or aggregated) databases, or articles on OA repositories.

This "bias" was eliminated at the cost of making it a questionnaire about acquisitions among a variety of competing "products" when it should have been a questionnaire about cancellations under a variety of hypothetical OA conditions (many of them unascertainable, hence moot).

PRC: The survey tested librarians' preferences for a series of hypothetical and unnamed products frequently showing unfamiliar combinations of attributes -- such as a fully priced journal embargoed for 24 months, or content at 25% of the price but through an unreliable service. By taking this approach, the survey measured librarians' preferences for an abstract set of potential products thus avoiding any pre-conceived preferences for named products, such as journals, licensed full- text (aggregated) databases or content on OA repositories.

Indeed. But OA is not an alternative product for acquisition: it is a property that might or might not induce cancellation in favor of other products under certain hypothetical (and presumably competitive) conditions.

PRC: The data were abstracted into a "Share of Preference" model (or simulator) which has then been used to model real-life products and thus create predictions for librarians' real-life preferences for these products. It is therefore possible to go beyond the comparisons, in this work, of journals versus OA and to model other preferences, such as between OA and licensed full-text databases.

The "Share of Preference model" might be viable when the preference really concerns competing products for acquisition, with a variety of rival properties, but it fails completely when applied to free non-products, not for acquisition at all, but treated as if they were just another among the rival properties of products competing for acquisition.

We could have said a-priori that librarians (like all consumers) will prefer a higher quality product over a lower quality product, 100% of a product over 60% of a product, an immediate product over a delayed product, a lower-priced product over a higher-priced product. A "Share of Preference model" could give some rough rank orders for those various combinations.
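What such a model computes can be illustrated with a minimal sketch. Two loud assumptions: the part-worth utilities below are invented for the example (they are not the PRC study's estimates), and the logit rule used here is one common way that conjoint simulators turn utilities into shares, not necessarily the rule the PRC simulator used.

```python
from math import exp

# Hypothetical part-worth utilities (NOT taken from the PRC study).
PART_WORTHS = {
    "cost": {"100%": 0.0, "50%": 0.5, "25%": 0.8, "0%": 1.2},
    "access_delay": {"24 months": 0.0, "12 months": 0.4,
                     "6 months": 0.7, "immediate": 1.0},
}

def utility(profile):
    """Sum the part-worths of a product's property levels."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())

def share_of_preference(profiles):
    """Logit rule (assumed): share_i = exp(u_i) / sum_j exp(u_j)."""
    weights = {name: exp(utility(p)) for name, p in profiles.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Three hypothetical competing "products".
shares = share_of_preference({
    "A": {"cost": "100%", "access_delay": "24 months"},
    "B": {"cost": "50%", "access_delay": "12 months"},
    "C": {"cost": "0%", "access_delay": "immediate"},
})
```

The output is a ranking of relative preference among competing options: the free, immediate option C gets the largest share. Nothing in the computation represents a decision to cancel an existing subscription.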

It seems natural to add to such a "Share of Preference model" that consumers will prefer a free product over a priced product, except that we are talking here about acquisitions librarians, who do not "acquire" free products but merely buy or cancel priced journals. This study simply does not and cannot indicate under what OA conditions they will cancel what for what.

The following (mild) conclusions are the only ones that can be drawn:

PRC: There is a strong preference for content that has undergone peer review.

Yes, and librarians don't much care whether the peer-reviewed content is the publisher's PDF version or the author's final version -- except that the publisher's PDF is for sale and the author's final draft is not! Nor does the model tell us under what conditions, if both versions are available for a journal X, librarians would cancel the publisher's PDF (and in favour of what journal Y?). The question is never even raised. That's the question the study was designed to answer, but the method could not answer it. The survey might as well have asked the librarians directly, for X/Y pairs of hypothetical or actual journals -- rather than A/B/C triplets of hypothetical "products" -- banal questions such as:

"If 100% of X were immediately available for free online and Y was not, and your users needed X and Y equally, and you could not afford both, and you currently subscribed to X and not to Y, would you cancel X for Y?"

I suspect that it is because -- in the absence of any actual evidence of self-archiving causing cancellations -- a survey on hypothetical cancellations of journal X in favour of journal Y (or no journal at all) under various %OA and months-delay conditions would not have been very convincing or informative that the survey instead resorted to "Share of Preference" modelling. But I'm afraid the outcome is even less convincing.

PRC: How soon content is made available is a key determinant of content model preference in librarian's acquisition behaviour; delay in availability reduces the attractiveness of a product offering.

Yes, immediate access is preferable to delayed access. And, no doubt, if/when librarians are ever inclined to cancel a journal X because PP% of its articles are freely available, they are more likely to do so if that PP% is immediately available than if it is only available 24 months after publication. But we could have guessed that without this study. The question is: Under what circumstances are librarians going to cancel what, when? This study does not and cannot tell us. Relative preference models can only tell us that they are more likely to do it under these conditions than under those conditions (and we already knew all that).

Having said all this, it is important to state clearly that, although there is still no evidence at all of self-archiving causing cancellations, it is possible, indeed probable, that self-archiving will cause some cancellations, eventually. No one knows (1) how soon it will cause cancellations, nor (2) how many cancellations it will cause. That all depends on (a) how much demand there still is for the print edition and (b) for the journal's online edition at that time, (c) for how long that demand lasts, and (d) how quickly self-archiving grows and approaches 100%. (Perhaps someone should do a survey on people's predictions about those factors!)

But regardless of any of this -- and regardless also of the validity or invalidity of the present survey -- the possibility or probability of cancellation pressure is most definitely not the basis on which the research community should decide whether or not to self-archive and whether or not to mandate self-archiving. That decision must be based entirely on the benefits of OA self-archiving for research access, impact, productivity and progress -- definitely not on the basis of the possibility of revenue losses for publishers.

We do well to remind ourselves that these questions are not primarily about what is or is not good for the publishing industry. They are about what is and is not good for research, researchers, their institutions, their funders, and the tax-paying public that funds the funders. Research is supported and conducted and peer-reviewed and published for the sake of research progress and applications, not in order to support the publishing industry, or to protect it from risk.

And what is certain is that peer-reviewed research publishing can and will successfully adapt to Open Access: How can it fail to do so, when it is researchers who conduct the research, write the articles, perform the peer review, read, use, apply and cite the research, and, now, provide online access to it as well? Publishers are performing a valuable service (in implementing the peer review and in providing a paper and online edition) but it is publishing that must adapt to what is best for research in the online age, definitely not research that must adapt to what is best for publishing. And publishing can and will adapt.

Berners-Lee, T., De Roure, D., Harnad, S. and Shadbolt, N. (2005) Journal publishing and author self-archiving: Peaceful Co-Existence and Fruitful Collaboration

Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn, G., Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and Murray, S. S. (2006) E-prints and Journal Articles in Astronomy: a Productive Co-existence (submitted to Learned Publishing)

(I might add that Dr. Alma Swan is not the super-ennuated (sic) Proustian personage repeatedly cited in this PRC survey, but the cygnine author of a number of landmark surveys, one of them reporting the only existing evidence -- negative -- for a causal connection between OA self-archiving and cancellations.)

Swan, A. (2005) Open access self-archiving: An Introduction. JISC Technical Report.

On Thu, 16 Nov 2006, Simon Inger and Chris Beckett replied:

1. The methodology deployed and the entire point of conducting a conjoint survey at all:

We decided to undertake a conjoint survey because we felt that other attitudinal surveys of what future intentions might be were highly prone to being bogged down exactly because surveyees were asked in absolute terms to what extent they would like one scenario, and then another, without ever asking them to choose between them.

Simon and Chris are, I think, quite right that there is considerable danger of bias, in one direction or the other, when acquisitions librarians are asked to speculate about what they would do in hypothetical future scenarios.

But it is not at all clear that the method Simon and Chris used corrects for these biases, or merely changes the subject (from predicting cancellations under hypothetical conditions, to merely expressing product/property preferences under hypothetical conditions).

A survey that asks people if they like steak to eat, and then asks if they like chicken to eat, is not as powerful as a survey that asks them to choose between steak and chicken. Bring in another variable, such as, "how well done do you like your meat?" and you get a very different answer depending on whether the surveyee preferred steak or chicken in the first place. By combining these factors with others through a conjoint survey, you might just find out how bad the steak has to be before chicken tartare starts to command a market share! We hope this illustrates the whole purpose of the conjoint in applying it to the situation that publishing currently faces; it forces people to reveal the true underlying factors in their decision-making in a way that hasn't been done before.

The conjoint method is no doubt a good method for estimating or ranking relative product property preferences in general. But in the particular case of library journal acquisitions/cancellations, OA and self-archiving, as noted, the method not only does not remedy the possibility of bias, but it bypasses the question of cancellations altogether -- the question that I take it that (for lack of actual cancellation data) the survey was trying to answer.

2. Whether or not OA can be considered a product in any meaningful sense:

Can articles in Open Access repositories be considered a product and one that librarians may select instead of journals? Absolutely they can. Is the issue here that they are free via OA, or that they are not organised and packaged? If we were to stand on a street corner and give away mobile phones, they would be every bit as much as a product as one you paid for in a shop. Would we cause some people not to go into the shop and buy a mobile - sure we would. Would some people not trust the mobile we gave them and buy one anyway - yes they would. Would some people use our mobiles as a spare and buy another anyway - yes they would do that too. A survey might tell you in what proportions people would undertake these actions. But you can be certain that at least some of the people would use the mobile we gave them and postpone or cancel the acquisition of a paid-for phone. So we believe that articles via OA, even though they are free, are still very much a product. So perhaps they should not be considered as a product because they are not organised into product-shaped offerings, like journals are.

I'm afraid I cannot agree with this reasoning: The mobile phone analogy (as well as the meat analogy) begs the question, because in both cases the product and the client are unambiguous, and it is a straightforward quid pro quo: Would the client rather buy steak or chicken? mobile phone or home phone? The choice is a direct trade-off between (two) competing products. And I also agree that if one of them were free, that would not change anything: It would still be this versus that.

But that's not at all how it is with paid journals vs. self-archived OA content.

Let's start with an easy example: Suppose we weren't talking about anarchically self-archived articles, but about OA vs. non-OA journals. And to make it even simpler, let us suppose (as is the case, for example, with BioMed Central journal institutional "memberships"), that a library has a choice between two journals that are equated, somehow, in terms of readership, quality, subject-matter and usage-needs of institutional users, that there is only enough money to afford one of them, and that they differ in that one is subscription-based and the other is based on institutional "membership" fees (for publishing institutional articles).

That's an odd choice situation for an acquisitions librarian (since in one case the librarian is buying in the journal's content, and in the other the librarian is paying for the institution's own outgoing content), but perhaps librarians would intuit that they get better value for their institutional money from the second journal (especially if they consult with their institutional users, and they agree -- a detail not mentioned by the survey, which seems to assume subscription/cancellation decisions are all or mostly in the hands of the librarians!).

But that would be a prima-facie plausible prediction by librarians, about what they would prefer and do under those conditions. Even more plausible would be a least/most choice involving three equivalent journals, when the library can afford only two journals, and the third is an OA journal for which someone else (other than the library) pays the institutional OA charges, making it effectively "free" to the library. Under those conditions the librarian could realistically say they'd prefer to "cancel" the free (OA) journal (i.e., just let users download it for themselves, free, from the web) so they can use all available money saved for the other two journals.

(Of course, the tricky part is that a pure OA journal [e.g., BMC or PLoS] is not one that a library subscribes to anyway! Actually, most OA journals are available for subscription, and do not charge author-institutions for publication. Possibly, just possibly, the results of the PRC survey might have some predictive value as to whether that kind of OA journal is likely to be cancelled; but so far there is little actual evidence of that happening either, though it might! Keep your eyes on the longevity of the majority of the OA journals in DOAJ that do not charge for publication but make ends meet from subscriptions.)

But we have not yet come to the third option, the one that the survey was commissioned by PRC to test, and that is author self-archiving, and whether that will cause cancellations.

It is for author self-archiving that the question of the extra properties of percentage content, and length of embargo had to be introduced and varied in this study. Length of embargo is not the problem, but percentage content very much is, and so is the fact that all self-archived content is free. Here we are square in the middle of the profound difference between OA journals (a complete, quid-pro-quo product) and OA self-archiving (an anarchic process, applying to only a portion of content, and an unknown proportion at that, growing -- but again at an unknown rate -- across time).

With journals (including OA journals), it's journal X vs journal Y ("product" X vs. "product" Y): Shall I purchase X and cancel Y, or vice versa? Shall I purchase X and Y and cancel Z? These are presumably familiar, hence realistic acquisitions librarian questions (in consultation with users -- who were not surveyed in this survey!).

But what is the question with journals vs. anarchic self-archived content? What is it that a librarian is contemplating buying versus cancelling when what they are really faced with is a choice between a journal and a distributed, anarchic and uncertain percentage of its contents (with no indication of how it is even knowable what that percentage is)?

But let's overlook that and agree that if it were a question of buying vs. cancelling journal X based on some estimate of the percentage of its contents that is available for free in self-archived form, librarians could dream up a hypothetical preference from a combination of properties such as journal quality, journal price, percentage free content, and embargo length.

But that would be journal X vs. not-X, or journal X vs. Y. What is the librarian's conjecture as to their preference when all journals have PP% of their content self-archived? That's not a journal vs. journal acquisition/cancellation question any more: It's asking librarians to second-guess the OA future: Are we to infer from the conjoint preference data that they would cancel all journals under those conditions (second-guessing their users on how long they might, for example, continue to value the paper edition?).

The analogy with chicken and steak would be whether conjoint chicken/steak or mobile/home-phone property preferences predict whether and when people would stop paying for food or phones altogether because they were somehow miraculously available free with a certain probability (and/or delay) for a certain percentage of the potential calls and time. We know that if it were all free, immediately and with certainty, everyone would prefer that. But do conjoint preferences tell us one bit more than that? (And again we leave out the parties of the second part -- the institutional users -- as well as the paper edition and how they might feel about it, and for how long...)

That may be so, for now, but at the same time we are aware of organisations that are building products which combine the power of OAI-PMH (and the crawling power of Google); existing abstracting & indexing databases; publisher operated link servers; and library operated link servers: to build an organised route to OA materials - a route that would allow a non-subscriber of a journal article to be directed to the free OA repository version instead. Once these products exist we are sure our research indicates that some librarians at least will actually switch to OA versions for some of their information needs, while others will continue to purchase the journal product for a whole raft of reasons and others will provide, i.e. acquire, both options.

Let me quickly agree about what I would not have contested from the very outset:

(1) Without the conjoint survey, I would already have agreed that everyone prefers to have something for free rather than paying for it.

(2) I also happen to believe, personally, that once 100% OA self-archiving has been reached -- but I don't know how soon it will be reached, nor how soon after it is reached this will happen -- there will be cancellation pressure that will lead to downsizing and a transition to OA publishing.

But it is still a fact that there is as yet no evidence of cancellation pressure, and I do not at all see how the conjoint preference study tells us any more than we already know (and don't know) about whether and when and how much cancellation pressure will ever be caused by self-archiving.

(I have to add that I profoundly doubt that in the OA world libraries and librarians will mediate in any way between users and the refereed journal article literature. Library mediation will be as supererogatory as it is with what users do with Google today.)

3. The issue of bias:

The whole Open Access debate evokes an emotional response from publishers, librarians and researchers on both sides of the debate. At the same time, so does the word "cancellation". For that matter, so does the phrase "serials crisis". We wanted to avoid using all of these phrases in the research so as not to cloud people's judgement in favour of their beliefs alone. This is one way of avoiding one type of bias. Specifically the type of bias we sought to eliminate was an emotional bias, not a bias for or against OA per se. It can be equally well argued that another survey should be done with these words actually mentioned. The results may well be different. But no more or less valid than ours - such a survey would be measuring a different thing. It is up to each individual reader of the report to decide which kind of response and hence survey they would prefer.

I think the attempt to avoid all of these emotional (and notional) biases was a commendable one, and it would have been successful too, if the conjoint-preference method had been amenable to analysing the anarchic phenomenon of author self-archiving and its likely effect on librarian acquisition/cancellation. But it is not, because anarchic, blanket self-archiving is simply not an acquisition/cancellation matter.

Acquisition/cancellation concerns what to buy, retain and cancel from among a finite set of products using a finite acquisitions budget. It is a competitive matter: competition between products. Anarchic self-archiving is gradual and uncertain, but it generates only an all-or-none cancellation question, and one that is in no way addressed by the conjoint preferences method.

(I am sure, by the way, that librarians could have been polled -- directly and unemotionally -- about how much journal content they thought would have to be self-archived before they would no longer need to purchase journals at all -- but I don't think their speculations on that would have been very informative.)

I do think, though, that one indirect finding on this question did emerge from the conjoint method (and it surprised me, considering how strident some librarians have been in the opposite direction in the past!): It does seem that librarians are surprisingly indifferent to the difference between an author's refereed final draft and the publisher's PDF. That's very interesting (and it's progress: in librarian awareness and understanding of what researchers really do and don't need!).

4. The statement of apparently obvious or banal findings:

The critique states that some of the findings are obvious and banal. "The fact that everyone would like something for free rather than paying for it", for example. In fact the survey shows that not everyone would prefer that. Even in a completely like for like situation. Possibly because people are suspicious of free things.

Agreed. (But that's hardly very surprising either! Nor informative about whether and when self-archiving causes cancellations.)

Much more important, however, is how the decision becomes qualified by other factors - and to what extent they are qualified. (Would you like free raw chicken for dinner or paid-for cooked chicken?) Look closely and the results show that the lure of "free" has only so much pulling power, and a combination of other factors pull more potently against it. So in themselves the importance of each of the attributes has limited value - it is in combination that their true meaning comes through.

I think what you are saying here is that in varying the combination of 6 properties, each with 3-4 possible values, you found a complex preferential structure. But it still doesn't tell us whether and when self-archiving will cause cancellations.
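To give a rough sense of how large that preferential structure can be, here is a minimal sketch that counts the distinct hypothetical "products" in a full factorial design. The only figures taken from the discussion are "6 properties" and "3-4 possible values"; the per-property level counts below, and the function name, are illustrative assumptions, not the survey's actual design.

```python
from math import prod

def count_profiles(levels_per_property):
    """Number of distinct product profiles when every combination of levels is allowed."""
    return prod(levels_per_property)

# Assumed level counts: the text only says 6 properties with 3-4 values each.
fewest = count_profiles([3] * 6)            # all six properties have 3 levels
most = count_profiles([4] * 6)              # all six properties have 4 levels
mixed = count_profiles([3, 3, 3, 4, 4, 4])  # an arbitrary illustrative mix

print(fewest, most, mixed)
```

Under these assumptions the design space runs from several hundred to a few thousand profiles, which is why conjoint surveys typically present respondents with only a small subset of them. None of that changes the point that a preference ranking over profiles is not a cancellation forecast.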

5. The validity of inferring cancellation behaviour from the findings:

So, can we infer cancellation behaviour from the results? Yes, we can. Because it is unrealistic to expect that everyone that expresses a preference for acquiring a product that looks very much like content on OA repositories would still continue to acquire a paid-for version. Some will, of that we have very little doubt. But likewise some won't. To that end I think we can infer cancellation will occur. It may be after someone has provided an organisational layer on top of the repositories. It may be after improved librarian awareness of the alternative has occurred. And it may require way more than 15% of the material to be available on OA.

For those (like me) who happen to think that 100% OA self-archiving is likely eventually to cause cancellations, downsizing, and a transition to the OA cost-recovery model, but that there is as yet no evidence of this, and that it is a matter of complete uncertainty how fast the self-archiving will grow, how soon the cancellation pressure will be felt, and how strong the cancellation pressure will be -- this study did not provide any new information.

For those empiricists (with whom I have some sympathy too), who simply say there is no evidence at all yet that self-archiving causes cancellations -- and that even in the few fields where self-archiving has been at or near 100% for some years there is still no such evidence -- it is likewise true that this study has not provided any new evidence: neither about whether there will be cancellations, nor, if so, about when and how much.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 00:52

Saturday, October 14. 2006

The Special Case of Astronomy

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: Astronomy is unusual among research fields in that all research-active astronomers already have full online access to all relevant journal articles via institutional subscriptions (because astronomy has only a small closed circle of core journals). Many astronomy articles are also self-archived as preprints prior to peer review and publication, but usage all shifts to the published version as soon as it is available. Self-archiving, even where it is at or near 100%, has no effect at all on subscriptions or cancellations. The Open Access (OA) citation advantage hence reduces to merely an "Early Access Advantage" in astronomy, because all postprints are accessible to everyone. There is also the much-reported positive correlation between the citation counts of articles and the proportion of them that were self-archived. This is no doubt partly a self-selection effect or "Quality Bias" -- with the better articles more likely to be self-archived. But this is unlikely to be all or most of the source of the OA advantage even in astronomy -- let alone in most other fields, where the postprints are not all accessible to all active researchers. The most important component of the OA advantage in general is that OA removes the access and usage barriers that prevent the better work from having its full potential impact (Quality Advantage). In astronomy, where those access barriers hardly exist, there is still a measurable OA advantage, but mostly just because of Early Advantage (and self-selection). With all postprints accessible, Competitive Advantage is restricted to the prepublication phase; Usage Advantage (downloads) can be estimated: downloads are doubled by universal online accessibility. And the Quality Advantage no doubt persists (though it is difficult to estimate independently).

Michael Kurtz (Harvard-Smithsonian Center for Astrophysics) has provided some (as always) very interesting and informative data on the special case of research access and self-archiving practices in Astronomy. His data show that:

      (1) In astronomy, where all active, publishing researchers already have online access to all relevant journal articles (a very special case!), researchers all use the versions "eprinted" (self-archived) in Arxiv first, because those are available first; but they all switch to using the journal version, instead of the self-archived one, as soon as the journal version is available.

That is interesting, but hardly surprising, in view of the very special conditions of astronomy: If I only had access to a self-archived preprint or postprint first, I'd use that, faute de mieux. And as soon as the official journal version was accessible -- assuming that it's equally accessible -- I'd use that.

But these conditions -- (i) open accessibility of the eprint before publication, (ii) in one longstanding central repository (Arxiv), for many and in some cases most papers, and (iii) open accessibility of the journal version of all papers upon publication -- are simply not representative of most other fields! In most other fields, (i') only about 15% of papers are available early as preprints or postprints, (ii') they are self-archived in distributed IRs and websites, not one central one (Arxiv), and (iii') the journal versions of many papers are not accessible at all to many of the researchers after publication.

That's a very different state of affairs (outside astronomy and some areas of physics).

      (2) Kurtz's data showing that astronomy journals are not cancelled despite 100% OA are very interesting, but they too follow almost tautologically from (1): If virtually all researchers have access to the journal version, and virtually all of them prefer to use that rather than the eprint, it stands to reason that it is not being cancelled! (What is cause and what is effect there is another question -- i.e., whether preference is driving subscriptions or subscriptions are driving preference.)

      (3) In astronomy, as indicated by Kurtz, there is a small, closed circle of core journals, and all active researchers worldwide already have access to all of them. But in many other fields there is not a closed circle of core journals, and/or not all researchers have access. Hence access to a small set of core journals is not a precondition for being an active researcher in many fields -- which does not mean that lacking that access does not weaken the research (and that is the point!).

      (4) I agree completely that there is a component of self-selection Quality Bias (QB) in the correlation between self-archiving and citations. The question is (4a) how much of the higher citation count for self-archived articles is due to QB (as opposed to Early Advantage, Competitive Advantage, Quality Advantage, Usage Advantage, and Arxiv (Central) Bias)? And (4b) does self-selection QB itself have any causal consequences (or are authors doing it purely superstitiously, since it has no causal effects at all)? The effects of course need not be felt in citations; they could be felt in downloads (usage) or in other measures of impact (co-citations, influence on research direction, funding, fame, etc.).

The most important thing to bear in mind is that it would be absurd to imagine that somehow OA guarantees a quality-blind linear increment to the usage of any article, regardless of its quality. It is virtually certain that OA will benefit the better articles more, because they are more worth using and trying to build upon, hence more handicapped by access-barriers (which do exist in fields other than astro). That's QA, not QB. No amount of accessibility will help unciteable papers get used and cited. And most papers are uncited, hence probably unciteable, no matter how visible and accessible you may try to make them!

      (5) I think we agree that the basic challenge in assessing causality here is that we have a positive correlation (between proportion of papers self-archived and citation-counts) but we need to analyze the direction of the causation. The fact that more-cited papers tend to be self-archived more, and less-cited papers less is merely a restatement of the correlation, not a causal analysis of it: The citations, after all, come after the self-archiving, not before!

The only methodologically irreproachable way to test causality would be to randomly choose a (sufficiently large, diverse, and representative) sample of N papers at the time of acceptance for publication (postprints -- no previous preprint self-archiving) and randomly impose self-archiving on N/2 of them, and not on the other N/2. That way we have random selection and not self-selection. Then we count citations for about 2-3 years, for all the papers, and compare them.
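The logic of that randomized design can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the seeds, and the choice of a one-sided permutation test on mean citation counts are all hypothetical choices of mine, and the tiny citation lists at the end are made-up numbers used solely to exercise the code, not data.

```python
import random
from statistics import mean

def assign_randomly(paper_ids, seed=0):
    """Randomly split papers into a self-archived half and a control half."""
    rng = random.Random(seed)
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])

def permutation_p_value(treated, control, n_perm=2000, seed=1):
    """One-sided permutation test: the share of random relabellings whose
    difference in mean citations is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return hits / n_perm

# Step 1: at acceptance, impose self-archiving on a random N/2 of the sample.
archived, control = assign_randomly(range(1000))

# Step 2 (2-3 years later): compare citation counts. Made-up counts for illustration.
p = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4])
print(len(archived), len(control), p)
```

Because the split is made by the random number generator rather than by the authors, any citation difference between the two halves cannot be attributed to self-selection.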

No one will do that study, but an approximation to it can be done (and we are doing it) by comparing (a) citation counts for papers that are self-archived in IRs that have a self-archiving mandate with (b) citation counts for papers in IRs without mandates and with (c) papers (in the same journal and year) that are not self-archived.

This is not a perfect method: there are problems with small Ns, short available time-windows, and admixtures of self-selection and imposed self-archiving even with mandates. But it is an approximation nonetheless. And other metrics -- downloads, co-citations, hub/authority scores, endogamy scores, growth-rates, funding, etc. -- can be used to triangulate and disambiguate. Stay tuned.

Now some comments:

On Tue, 10 Oct 2006, Michael Kurtz wrote:

"Recently Stevan has copied me on two sets of correspondence concerning the OA citation advantage; I thought I would just briefly respond to both.

"Besides our IPM article: http://adsabs.harvard.edu/abs/2005IPM....41.1395K we have recently published two short papers, both with graphs you might find interesting.

"The preprint will appear in Learned Publishing http://adsabs.harvard.edu/abs/2006cs........9126H E-prints and Journal Articles in Astronomy: a Productive Co-existence

"and this is in the J. Electronic Publishing http://adsabs.harvard.edu/abs/2006JEPub...9....2H Effect of E-printing on Citation Rates in Astronomy and Physics

"There is a point I would like to emphasize from these papers. Figure 2 of the Learned Publishing paper shows that the number of ADS users who read the preprint version once the paper has been released drops to near zero. This shows that essentially every astronomer has subscriptions to the main journals, as ADS treats both the arXiv links and the links to the journals equally; also it shows that astronomers prefer the journals."

And it also shows how anomalous Astronomy is, compared to other fields, where it is certainly not true that every researcher has subscriptions to the main journals...

"Figure 5 of the J Electronic Publishing paper also shows that there is no effect of cost on the OA reads (and thus by extension citation) differential. Note in the plot that there is no change in slope for the obsolescence function of the reads (either of preprinted or non-preprinted) at 36 months. At 36 months the 3 year moving wall allows the papers to be accessed by everyone, this shows clearly that there is no cost effect portion of the OA differential in astronomy. This confirms the conclusion of my IPM article."

And it underscores again, how unrepresentative astronomy is of research as a whole.

"Citations are probably the least sensitive measure to see the effects of OA. This is because one must be able to read the core journals in order to write a paper which will be published by them. It is really not possible for a person who has not been regularly reading journal articles on, say, nuclear physics, to suddenly be able to write one, and cite the OA articles which enabled that writing. It takes some time for a body of authors who did not previously have access to form and write acceptable papers."

In astronomy -- where the core journals are few and a closed circle, and all active researchers have access to them. But this is not true of research as a whole, across disciplines (or around the world). Researchers in most fields are no doubt handicapped for having less than full access, but that does not prevent them from doing and publishing research altogether.

"Any statistical analysis of the causal/bias distinction must take into account the actual distribution of citations among articles. This is why I made the monte carlo analysis in the IPM paper. As a quick example for papers published in the Astrophysical Journal in 2003: The most cited 10% have 39% of all citations, and are 96% in the arXiv; the lowest cited 10% have 0.7% of all citations and are 29% in the arXiv. Showing the causal hypothesis is true will be very difficult under these conditions."

(i) Since all of the published postprints in all these journals are accessible to all research-active astronomers as of their date of publication, we are of necessity speaking here mostly about an Early Access effect (preprints). Most of the other components of the Open Access Advantage (Competitive Advantage, Usage Advantage, Quality Advantage) are minimized here by the fact that everything in astronomy is OA from the date of publication onward. The remaining components are either Arxiv-specific (the Arxiv Bias -- the tradition of archiving and hence searching in one central repository) or self-selection [Quality Bias] influencing who does and does not self-archive early, with their prepublication preprint.

Since most fields don't post pre-refereeing preprints at all, this comparison is mostly moot. For most fields, the question about citation advantage concerns the postprint only, and as of the date of acceptance for publication, not before.

(ii) In other fields too, there is the very same correlation between citation counts and percentage self-archived, but it is based on postprints, self-archived at publication, not pre-refereeing preprints self-archived much earlier. And, most important, it is not true in these fields that the postprint is accessible to all researchers via subscription: Many potential users cannot access the article at all if it is not self-archived -- and that is the main basis for the OA impact advantage.

"Perhaps the journal which is most sensitive to cancellations due to OA archiving is Nuclear Physics B; it is 100% in arXiv, and is very expensive. I have several times seen librarians say that they would like to cancel it. One effect of OA on Nuclear Physics B is that its impact factor (as we measure it, I assume ISI gets the same thing) has gone up, just as we show in the J E Pub paper for Physical Review D. Whether Nuclear Physics B has been cancelled more than Nuclear Physics A or Physics Letters B must be well known at Elsevier."

It is an interesting question whether NPB is being cancelled, but if it is, it clearly is not because of self-archiving, nor because of astronomy's special "universal paid OA" OA to the published version: if NPB is being cancelled, it is for the usual reason, which is that it is not good enough to justify its share of the institution's journal budget.

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 16:05

Monday, October 9. 2006

Critique of EPS/RIN/RCUK/DTI "Evidence-Based Analysis of Data Concerning Scholarly Journal Publishing"

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: This Report on UK Scholarly Journals was commissioned by RIN, RCUK and DTI, and conducted by EPS, but its questions, answers and interpretations are clearly far more concerned with the interests of the publishing lobby than with those of the research community.

The Report's two relevant overall findings are correct and stated very fairly in their summary form:
[1] "Overall, [self-archiving] of articles in open access repositories seems to be associated with both a larger number of citations, and earlier citations for the items deposited....The reasons for this [association] have not been clearly established - there are many factors that influence citation rates... Consistent longitudinal data over a period of years... would fill this gap."

[2] "There is no evidence as yet to demonstrate any relationship (or lack of relationship) between subscription cancellations and repositories... Proving or disproving a [causal] link between availability in self-archived repositories and cancellations will be difficult without long and rigorous research."
The obvious empirical and practical conclusion to draw from the findings -- that (1) all the self-archiving evidence to date is positive for research and that (2) none of the self-archiving evidence to date is negative for publishing -- would have been that the research community should now apply and extend these findings -- by applying and extending self-archiving (through self-archiving mandates) to all UK research output, along with consistent, rigorous longitudinal studies over a period of years, to test (1) whether the positive effect on citations continues to be present (and why) and (2) whether the negative effect on subscriptions continues to be absent.

But instead, the two overall findings are hedged with volumes of special pleading, based mostly on wishful thinking, to the effect that (1') the observed relationship between self-archiving and citations may not be causal, and that (2') there may exist an as-yet-unobserved causal relationship between self-archiving and cancellations after all.

Even that would be alright, if this Report's conclusions were coupled with a clear endorsement of the proposed self-archiving mandates, so that the competing hypotheses can be put to a rigorous long-term test. But the only test the commissioners of this Report seem to be interested in conducting is "Open Option" publishing, i.e., authors paying publishers to make their article OA for them, instead of self-archiving it for themselves. This would certainly be a nice way to hold author self-archiving and institution/funder self-archiving mandates at bay for a few years more, while at the same time protecting publishers from undemonstrated risk of revenue loss. But it would also leave global unmandated self-archiving to continue to languish at the current spontaneous 15% rate that the self-archiving mandates had been meant to drive up to 100%. And it would leave research unprotected from its demonstrated risk of impact loss. The option of having to pay to provide OA is certainly not likely to enhance the unmandated rate of uptake by authors (though I'm sure publishers would have no quarrel with funder mandates to provide OA coupled with the funds to pay publishers' asking price for paid OA, as provided by the Wellcome Trust).

The long-term test will nevertheless be conducted, because five out of seven UK Research Councils have already mandated self-archiving (an eighth, CCLRC, will soon merge with one of the mandators, PPARC). Their citation rates and their cancellation rates can then be compared with those for the two that have not mandated self-archiving (and whose authors hence only do it spontaneously, by "self-selection"). Alas this will be mostly comparing apples and oranges (e.g. MRC vs AHRC), and it will needlessly be depriving the oranges of several more years of potential growth enhancement. My guess is that all the other councils -- except possibly the paradoxical EPSRC (which evidently thinks, with the publishing lobby, that there's still some sort of pertinent pretesting to be done for a few more years here) -- will come to their senses long before that, unpersuaded by Reports like this one.

UK scholarly journals: 2006 baseline report
An evidence-based analysis of data concerning scholarly journal publishing.
Prepared on behalf of the Research Information Network, Research Councils UK and the UK Department of Trade and Industry.
By Electronic Publishing Services Ltd
In association with Professor Charles Oppenheim and LISU at Loughborough University Department of Information Science

This is a rather long and repetitious report, but it does contain a few nuggets. It is obviously biassed, but biassed in a restrained way, meaning it does not really try to conceal its biases, nor does it overstate biassed conclusions. It also (reluctantly, but in most cases candidly) acknowledges its own weaknesses.

(The Report was commissioned by RIN, RCUK and DTI, but it is glaringly obvious that the questions, answers and interpretations have been slanted toward the interests of the publishing lobby rather than those of the research community -- possibly because the research community has no lobby in this matter, apart from the OA movement itself! Nevertheless, there has been considerable circumspectness, at least in the summary and conclusion passages, with weak points and gaps usually pointed out explicitly rather than denied or concealed, and with the overall preoccupation with publishing interests rather than research interests very open too.)

Some quotes and comments:

Whilst some evidence does suggest that [self-archiving in] repositories [is] an important new factor in the journal cancellation decision process, and one which is growing in significance, there is no research reporting actual or even intended journal subscription cancellation as a consequence of the growth of OA self-archived repositories.

So far, this sounds fair and reasonable. (In fact, this is the gist of the Report! The rest is mostly special pleading.)

Subscriptions are reported to have been declining over a period of 10+ years, but for a number of reasons. Proving or disproving a link between availability in self-archived repositories and cancellations will be difficult without long and rigorous research. In this connection, the outcome of research recently announced by the Research Councils UK (RCUK) with the co-operation of Macmillan, Blackwell and Elsevier, will be eagerly awaited, even though a report is not due until late 2008.

With evidence of self-archiving's benefits to research mounting, and zero evidence yet of any negative effect at all on publisher revenue, publishers nevertheless seem quite willing to wait (and keep research waiting too), trying to fend off self-archiving and its potential benefits to research for a long time to come yet, in order to keep trying to find some evidence of negative causal effects on publisher revenue (or, failing that, to deny positive causal effects on research impact).

Note that whereas a link between OA self-archiving and subscription decline has not yet been "proved or disproved" (not for want of looking!) -- and it is for that reason that we are hearing these calls for "long and rigorous research" -- the vast preponderance of the evidence we do have has already "proved" a "link" between OA self-archiving and citation counts (a link that is almost certainly causal, despite the wishful thinking of some who have a vested interest in its all turning out to be merely a-causal self-selection and superstition on the part of authors).

The question that the research community accordingly needs to ask itself is whether self-archiving's evidence-based benefits to research should be held in abeyance still longer, and meanwhile interpreted by default as a-causal, in order to buy still more time to try to "prove/disprove" hypothetical subscription declines for which there is no evidence whatsoever to date, even in fields where self-archiving has been near 100% for years.

(Researchers should also go on to ask themselves whether the research benefits should be held in abeyance even if they are causally linked to a subscription decline: Is research impact to be sacrificed in the service of publisher revenue? Are we conducting and funding research in order to generate -- or to safeguard -- publisher revenue?)

There is no evidence as yet to demonstrate any relationship (or lack of relationship) between subscription cancellations and repositories. Work in this field would need sufficient, representative and balanced samples, and the collaboration of all stakeholders, including especially research institutions and publishers. Any such study will need to be maintained over a fairly extended period, with regular reports, since it seems likely that the position could change with time if the contents of self-archiving repositories become progressively more comprehensive.

This would be fine, if proposed as an extended research project to be conducted after self-archiving mandates are in place, to analyze their long-term effects on subscriptions.

But this would be an exceedingly self-serving suggestion on the part of the publishing community (and a methodologically empty one) if meant as a "pilot" study that must somehow be conducted before adopting self-archiving mandates. (And it would be exceedingly self-defeating of the research community to even consider accepting such a pre-emptive suggestion as a precondition, before adopting self-archiving mandates.)

There is some consistency in results that show more citations for articles self-archived in repositories as distinct from the same or similar articles available [only via journal] subscription (although there have also been a few contradictory results). Overall, deposit of articles in open access repositories seems to be associated with both a larger number of citations, and earlier citations for the items deposited.

This is a fair summary -- except that immediately after stating it, this "association" is about to be deconstructed (much as the "association" between cigarette-smoking and lung cancer was deconstructed for years and years by the tobacco industry, claiming that only correlation had been demonstrated, and not causation). Read on:

The reasons for this [association] have not been clearly established - there are many factors that influence citation rates, including the reputation of the author, the subject-matter of the article, the self-citation rate, and, of course, how important or influential the repository is in its own right. The little existing evidence suggests that a possible [sic] reason for increased citation counts is not that the materials were free, or that they appeared more rapidly, but that authors put their best work into OA format. This research was limited to one discipline, however [astronomy], and more extensive evidence is required to validate this finding.

This (important) study by Kurtz et al. in astronomy, however, is not what the vast majority of the evidence (no longer little!) shows. Moreover, as noted, this a-causal interpretation -- only one of the possible interpretations of the astronomy evidence -- also happens to be the interpretation that the publishing community prefers for all the self-archiving evidence, in all fields. The alternative interpretation is that the relationship is causal: that the OA advantage is not merely an arbitrary whim on the part of the better authors to make their work OA, to no causal effect at all (why on earth would they be doing it at all then?): They do it because making their work OA increases its accessibility, uptake, downloads, usage, applications, citations, impact -- exactly as the correlational evidence shows, without exception, in field after field.

(NB: The only methodologically unexceptionable way to demonstrate causation here, by the way, is to select a large enough random sample of articles, divide them in half randomly, mandate half of them to be self-archived and half not, and then compare their respective citation counts after a few years. No one is likely to do quite that study -- any more than it was likely that a large random sample of people would be divided in half randomly, with half mandated to smoke and half not! But we are in the process of doing an approximation to that causal study, by comparing the citation counts of articles in the IRs of the (few) institutions that have already mandated self-archiving with the average for other articles in the same journals/years in which those articles appeared, but that have not been self-archived; we will also compare the size of the OA advantage for mandated and comparable non-mandated self-archiving. [We do not believe for a moment that these data are necessary to demonstrate causation, as causation is a virtual certainty anyway, but we are ready to play the game, in order to try to cut short the absurd delay in doing the obvious: mandating self-archiving universally.])
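(For concreteness, here is a minimal Python sketch of the idealized randomized design just described. It is an illustration only, not the method of any study cited here: the function names `randomize_halves` and `permutation_p_value`, the seeds, and the choice of a one-sided permutation test are all assumptions made for the example.)

```python
import random
from statistics import mean


def randomize_halves(article_ids, seed=0):
    """Shuffle the sample and split it into two equal-sized arms.

    Returns (mandated_to_self_archive, not_self_archived).
    Hypothetical helper: the names and the seed are illustrative.
    """
    rng = random.Random(seed)
    ids = list(article_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:2 * half]


def permutation_p_value(oa_counts, non_oa_counts, n_perm=2000, seed=0):
    """One-sided permutation test on the difference in mean citations.

    Estimates how often a random relabelling of the pooled articles
    gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(oa_counts) - mean(non_oa_counts)
    pooled = list(oa_counts) + list(non_oa_counts)
    n_oa = len(oa_counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_oa]) - mean(pooled[n_oa:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

Because assignment to the two arms is random, a small p-value here could not be put down to authors self-selecting their better work; that is the whole point of randomizing. A permutation test is used in the sketch only because it makes no assumption about the shape of the citation distribution.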

Although quite a lot of evidence has been collected regarding the quantitative effect of OA on citation counts (whether in the form of OA journals or as self-archived articles), much of it is scattered, uses inconsistent methods and covers different subject areas.

Yet, despite this scatter, methodological inconsistency and diversity, virtually all of it keeps showing exactly the same consistent pattern: A citation (and download) advantage for the OA articles. (No amount of special pleading can make that stubborn pattern go away!)

Consistent longitudinal data over a period of years to measure IF trends in a representative range of journals would fill this gap

There is no gap! There is a growing body of studies, across all fields and all journals, that keeps showing exactly the same thing: the OA advantage (in article citations and article downloads: this is not about journal impact factors, especially because comparing different journals is comparing apples and oranges).

(There seems to be a confusion here between the existence of the correlation itself, between self-archiving and citation counts -- this is found consistently, over and over -- and the question of the causal relation, which will not be answered by longitudinal data (we have longitudinal data already!) but by comparing mandated and unmandated self-archiving: if they both show the OA advantage, then the effect is causal and self-selection bias is a minor component.)

e.g., studying a range of journals that were toll-access and went OA (or vice versa). In the short-term, more data in different disciplines measuring the impact on citation counts of articles in hybrid journals or articles that are available in both forms versus articles that are only available in one of the forms will improve the evidence base.

No, the question about the reality and causality of the OA advantage will not be settled by OA journal vs. non-OA journal comparisons; that can always be dismissed as comparing apples with oranges, and, failing that, can always be attributed to self-selection bias (i.e., choosing to publish one's better work in an OA journal)!

And if we wait for the uptake of hybrid Open Choice -- i.e., paying the journal to self-archive the published PDF for you -- these "longitudinal" studies are likely to take till doomsday (and any positive outcome can still be dismissed as self-selection bias in any case!).

What is needed is precisely the data already being gathered, on huge samples, across all disciplines, comparing citation counts for self-archived versus non-self-archived articles within the same journal and year. The result has been a consistent, high OA Advantage (which has elicited a lot of special pleading about causality).
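(A minimal Python sketch of this within-journal, within-year comparison, for illustration only. The record layout with keys `journal`, `year`, `is_oa` and `citations`, the function name, and the use of a ratio of means as the summary are assumptions for the example, not a description of how any of the cited studies were computed.)

```python
from collections import defaultdict
from statistics import mean


def oa_advantage_by_journal_year(articles):
    """Ratio of mean OA citations to mean non-OA citations per cell.

    articles: iterable of dicts with the (hypothetical) keys
    'journal', 'year', 'is_oa' and 'citations'.
    Returns {(journal, year): mean_oa / mean_non_oa}.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for a in articles:
        cell = groups[(a["journal"], a["year"])]
        cell[bool(a["is_oa"])].append(a["citations"])

    ratios = {}
    for key, cell in groups.items():
        # Skip cells that lack one of the two groups, or whose
        # non-OA mean is zero (the ratio would be undefined).
        if cell[True] and cell[False] and mean(cell[False]) > 0:
            ratios[key] = mean(cell[True]) / mean(cell[False])
    return ratios
```

Keeping the comparison inside each journal/year cell is what avoids the apples-and-oranges objection: OA and non-OA articles are only ever compared with others from the same journal and the same year.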

So we will look at the mandated subset of the self-archived papers, to try to show that the OA advantage is not (only, or mostly) a self-selection effect (Quality Bias [QB]).

(There is undoubtedly a non-zero self-selection [QB] component in the OA advantage, but there are many other components as well, including a Quality Advantage [QA], an Early Access Advantage [EA], a Competitive Advantage [CA, which will, like QB, vanish once all articles are OA], and a Usage (Download) Advantage [UA]. At 100% OA, there will no longer be any QB or CA (or Arxiv Advantage [AA]), but EA, QA and UA will still be going strong. EA and UA components have already been confirmed by the Kurtz study in astronomy. QA is implied by the repeated finding of a positive correlation between citation count and the proportion of those articles with that citation count that are OA. The mandate study will try to show that this correlation is causal, i.e., QA, not QB.)

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA.
The whole area of the relationship between citation counts and scholarly communication channels is confused because of problems associated with quality bias [QB] (e.g., if scholars tend to self-archive only their best work, as suggested by Kurtz et al. [in astronomy]; alternatively, it may be that only the best journals are OA). In other words, differences in citation counts and IFs may simply reflect the quality of the materials under study rather than having anything to do with the channel by which the material is made available.

First, the issue is article citation counts, not journal Impact Factors (IFs).

Second, this is all special pleading. The biggest OA effects are based on comparing articles within the same journal/year. The size of the effect is indeed correlated with the quality of the article, because no amount of accessibility will generate citations for bad articles, whereas good articles benefit the most from a level playing field, with all affordability/accessibility barriers removed: that is the Quality Advantage [QA]. The idea that the Quality Advantage is merely a Quality (Self-Selection) Bias [QB], i.e., that the advantage is merely correlational, not causal, is of course a logical possibility, but it is also highly improbable (and would imply that accessibility/affordability barriers count for nothing in usage and citations, and that the better work is being made OA by its authors for purely superstitious reasons, because doing so has no effect at all!).

Overall, we concur with Craig's introduction that "the problems with measuring and quantifying an Open Access advantage are significant. Articles cannot be OA and non-OA at the same time."

They need not be. It is sufficient if we take a large enough sample of articles that are OA and non-OA from the same journals and years. Randomly imposing the self-archiving would be the only way to equate them completely (and our ongoing study on mandated self-archiving will approximate this).

(The analysis by Craig, commissioned by Blackwell Publishing, has not, so far as I know, been published.)

"Further, the variation of citation counts between articles can be extremely high, so making controlled comparisons of OA vs. non-OA articles nigh on impossible" [Craig, Blackwell Publishing]

(The way Analysis of Variance works is to compare variation between and within putatively different populations, to determine the probability that they are in reality the same population. The published comparisons show that the OA/non-OA differences are highly significant, despite the high variance.)
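(To make the between/within comparison concrete, here is a minimal Python sketch of the one-way F statistic computed by hand. The function name `one_way_f` is hypothetical, and the sketch is the textbook calculation, not necessarily the procedure used in the published comparisons, which may well transform the highly skewed counts first.)

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way analysis of variance.

    groups: list of lists of citation counts, e.g. [oa, non_oa].
    F is the between-group mean square divided by the
    within-group mean square.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual articles around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

High variance among articles inflates the denominator, so it makes a real difference harder to detect, not impossible: with a large enough sample the between-group term can still dominate.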

It would of course be absurd to try to compare citation counts for OA and non-OA articles having the same citation counts. But we can compare OA and non-OA article counts among articles having the same citation counts, in the same journals -- and what we find is a strong positive correlation between the citation count and the proportion of articles that are OA (just as Lawrence reported in 2001, but not only in computer science, but across all 12 disciplines studied so far, and with much bigger sample sizes):

Source 4.8: Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
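(A minimal Python sketch of the tabulation described above, i.e., the share of articles that are OA at each citation level. The bin edges, the `(citations, is_oa)` input layout and the function name are illustrative assumptions, not the binning used in the study cited.)

```python
from collections import defaultdict


def oa_share_by_citation_bin(articles, bin_edges=(0, 1, 2, 4, 8, 16)):
    """Proportion of OA articles within each citation-count bin.

    articles: iterable of (citations, is_oa) pairs, citations >= 0.
    bin_edges: lower edges of the bins (illustrative values); the
    last bin is open-ended.
    Returns [(lower_edge, n_articles, share_oa), ...] in ascending order.
    """
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [n_total, n_oa]
    for citations, is_oa in articles:
        # Largest lower edge that does not exceed the citation count
        lower = max(e for e in bin_edges if e <= citations)
        counts[lower][0] += 1
        counts[lower][1] += 1 if is_oa else 0
    return [(edge, n, n_oa / n) for edge, (n, n_oa) in sorted(counts.items())]
```

If the OA share rises from bin to bin, that is the positive correlation in question; by itself, of course, such a table cannot say whether the correlation reflects a Quality Advantage or a Quality Bias.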

Note that the appendix to the Report under discussion here states, in connection with the above study, which it cites:

"Harnad is THE advocate of OA and, thus, whilst expert in the field, is inevitably biased."

There is a bit of irony in the fact that in connection with another of the studies it cites:

Source 4.9: Harnad, S, Brody, T, Oppenheim, C et al, Comparing the impact of open access versus non open access articles in the same journals, D-Lib Magazine, 10,(6), 2004,

the appendix of the Report goes on to say:

"Harnad is THE exponent of OA, but, thus, potentially less objective."

Ironic (or, shall we say, conflicted, since this Report aspires to be a neutral one as between the interests of the research community and the publisher community), because the sole named collaborator on the Report is also a co-author of the above-cited study!

Let us agree that we all have views on the underlying issues, but that reliable data speak for themselves, qua data, and our data (and those of others) keep showing the same consistent OA Advantage. The disagreement is only on the interpretation: whether or not the consistent correlations are causal. And here, allegiances are tugging on both sides: Those favouring causality tend to come from the research community, those favouring a-causality tend to come from the publishing community. (Let us hope that the data from mandated self-archiving will soon settle the matter objectively.)

"[since] any Open Access advantage appears to be partly [sic] dependent on self-selection, the more articles that are {self-}archived... you'd expect to see any Open Access advantage reduce." [Craig, Blackwell Publishing]

Note that Craig carefully says "partly" -- and that we agree that self-selection is one of the many potential contributors to the OA advantage.

We also agree, of course, that once 100% OA is reached, the OA citation advantage -- in the form of an advantage of OA over concurrent non-OA articles -- will be reduced: indeed it will vanish! With all articles OA, there can no longer be either a Competitive Advantage [CA] or a Self-Selection Advantage (Quality Bias, QB) of OA over (non-existent) non-OA.

But the Quality Advantage [QA] will remain. (Higher quality articles will be used and cited more than they would have been if they had not been OA: this is not a competitive advantage but an absolute one.) And the Early Advantage [EA] as well as the Usage (Download) Advantage [UA] will remain too (as already shown by Kurtz's findings in Astronomy).

"Authors self-archiving in the expectant belief that each and every paper they archive will receive an Open Access advantage of several hundred percent are going to be sorely disappointed." [Craig, Blackwell Publishing]

This too is correct, but who on earth thought that OA would guarantee that all work would be used, whether or not it was any good? OA levels the playing field so merit can rise to the top, unconstrained by accessibility or affordability handicaps. But bad remains bad, and let's hope that researchers will continue to avoid trying to build on weak or invalid findings, whether or not they are OA.

The OA advantage is an average effect, not an automatic bonus for each and every OA article; moreover, the OA advantage is highly correlated with quality: The higher the quality, the higher the advantage. It is this effect that is open to the a-causal interpretation that the Quality Advantage [QA] is merely a Quality Bias [QB] (Self-Selection). But, equally (and, in my view, far more plausibly) it is open to the causal interpretation that OA causes wider usage and citation precisely because it removes all accessibility/affordability constraints that are currently limiting uptake and usage. That does not mean everything will be used more, regardless of quality ("usefulness"): But it will allow users (who are quite capable of exercising self-selection too!) to access and use the better work, selectively.

In addition, since the distribution of citations is not gaussian -- a small percentage of articles receives most of the citations and more than half of articles receive no citations at all -- it is almost axiomatic that the OA advantage will be strongest in the high-quality range.

Finally, it is worth noting that all researchers in the field are agreed that if the vast majority of scholarly publications become available in OA form, no citation advantage to OA will be measurable.

It is a tautology that with 100% OA, the OA/NOA ratio is undefined! But EA will still be directly measurable, and it will be possible to infer UA and QA indirectly (UA by comparing downloads for articles of the same age, before and after OA for the same articles, and QA by doing the same with citations; the Kurtz study used such methods in Astronomy). But by that time (100% OA), not many people will still have any interest in the a-causal hypothesis.

Thus, what OA advantage there is will prove to be temporary if OA does become the standard mode of publication.

This, however, is simply incorrect. At 100% OA, the Competitive Advantage (CA) will be gone; the Self-Selection Advantage (Quality Bias, QB) will be gone; the method of comparing citation counts for OA and non-OA articles within the same journal and year will be gone. So much is true by definition.

But (as Kurtz has shown in Astronomy), the Early Advantage and the Usage Advantage will still be there. And the Quality Advantage will still be there too; and that was what this was all about: Not just a horse-race for who can make his articles OA first, so as to reap the competitive advantage before 100% OA is reached (though that's not a bad idea!); not a guarantee that, no matter how bad your work, you can increase your citations by making it OA; but a guarantor that with access-barriers removed, quality will have the best chance to have its full potential impact, to the benefit of research productivity and progress itself, as well as the authors, institutions and funders of the high quality work.

(There is a bit of a [lurid] analogy here with saying that if only we can get everyone to smoke, it will be clear that smoking has no differential effects on human health! Perhaps the converse is a better way to look at it: if only we could get everyone to stop smoking, smoking will no longer have a differential effect on human health!)

(PS: OA is not a "mode of publication": OA publication is a mode of publication. OA itself is a mode of access-provision, which can be done in two ways, via OA publication or via OA self-archiving of non-OA publications.)

Self archived articles

It is this area that has been most studied, with numerous key publications. Most of these are focussed on the citation advantage of self-archived articles rather than of OA journals. Craig, in an as yet unpublished review, provides an excellent overview of the evidence collected to date. Lawrence (Source 4.13) is significant because it was the first major paper that identified a citation advantage for OA self-archived articles, and it has been widely cited ever since. However, it was based on too small-scale a study to support general conclusions. Harnad et al. (Source 4.9) provides a useful summary of the state of play of OA advantage studies, while Hajjem et al. (Source 4.8) is fairly typical of the many articles produced by Harnad claiming that self-archiving leads to higher citation counts.

Let us be clear: The many OA vs. non-OA studies, ours and everyone else's, across more than a dozen different disciplines, many of them based on large-scale samples, all show the very same consistent pattern of positive correlation between OA and citation counts. Those are data, and they are not under dispute. The only "claim" under dispute is that that consistent correlation is causal...

Antelman (Source 4.1) is arguably the most carefully constructed study of the question. Articles in four disciplines were evaluated, and in each case it was found that open access articles had greater citation counts than non-open access articles.

One wonders why this particular small-scale study (of about 2000 articles in 4 fields) was singled out, but in any event, it shows exactly the same pattern as all the other studies (some of them based on hundreds of thousands of articles instead of just a few thousand, in three times as many fields).

Eysenbach challenges the notion that OA "green" articles (i.e., those in repositories) are more effective than OA "gold" (i.e., those published in OA journals, such as those produced by Public Library of Science) in obtaining high citation counts. It is this part of his paper that produced a furious response from Harnad, much of it focused on particular details.

The issue was not about OA green (self-archived) articles producing higher citation counts than OA gold (OA-journal)! No one had claimed one form of OA was more effective than the other in generating the OA Advantage before the Eysenbach study: It was Eysenbach who claimed to have shown gold was more effective than green -- indeed that green was only marginally effective at all!

And I think anyone reading the exchanges will see that all the fury is on the Eysenbach side. All I do is point out (rather patiently) where Eysenbach is overstating or misstating his case:

Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology eletters (May 16, 2006) [1] [2] [3] [4]

Eysenbach's study does find the OA advantage, as many others before it did. It certainly doesn't show that the gold OA advantage is bigger than the green OA advantage, in general. It simply shows that for the 1500-article sample in the one journal tested, Proceedings of the National Academy of Sciences (PNAS), a very high impact journal, both paid OA (gold) and green OA (free) increased citation counts over non-OA, but gold increased them more than green. That result is undisputed. Its extrapolation to other journals is disputed:

The likely explanation of the PNAS result is very simple: PNAS is not a randomly chosen, representative journal: it is a very high-impact, very high-visibility, interdisciplinary journal, one of very few like it (along with Nature and Science). Articles whose authors pay for OA are immediately accessible at PNAS's own high-visibility website -- a website that probably has higher visibility than any single institution's IR today. So PNAS articles made freely accessible at PNAS's website get a bigger OA advantage than PNAS articles made freely accessible by being self-archived in the author's own IR.

The reason it definitely does not follow from this that the gold OA advantage is bigger than the green OA advantage is very simple: Most journals are not PNAS, and do not have the visibility or average impact of PNAS! Hence Eysenbach's valid finding for one very high-impact journal simply does not generalize to all, most, or even many journals. Hence it is not a gold/green effect at all, but merely a very high-end special case.

Apart from the spurious gold/green advantage, Eysenbach did confirm, yet again, (1) the OA advantage itself, and confirmed it (2) within a very short time range. These are both very welcome results (though they did not warrant being touted, as they were by both the author and the accompanying PLoS editorial, as either the first "solid evidence" of the OA advantage -- they certainly were not that -- or a demonstration that gold OA generates more citations than green OA: the very same method has to be tried on middle- and low-ranking journals too, before drawing that conclusion!). (Nor are the PLoS/PNAS results any more exempt from the methodological possibility of self-selection bias [QB] than any of the many prior demonstrations of the OA advantage, as authors self-choose to pay PNAS for gold OA as surely as they self-choose to self-archive for green OA!)

The fury on Eysenbach's part came from my pointing out that his and PLoS's claim to primacy for demonstrating the OA advantage (and their claim of having demonstrated a general gold-over-green advantage) was unfounded (and might have been due to both PLoS's and Eysenbach's zeal to promote publication in gold journals: Eysenbach is the editor of one too, but not a high-end one like PNAS or PLoS): Eysenbach's was just the latest in a long (and welcome) series of confirmations of the OA advantage (beginning with Lawrence 2001), the prior ones having been based on far larger samples of articles, journals and fields (and there was no demonstration at all of a general gold over green advantage: just the one non-representative, hence non-generalisable special case of PNAS).

Both authors believe that OA produces a citation advantage, but Eysenbach has presented evidence that casts doubt on Harnad's notion that the "green" route is the preferred route to getting that increased impact.

Green may not be the preferred route to OA for editors of gold journals, but it is certainly the preferred route for the vast majority of authors, who either have no suitable gold journal to publish in, or lack the funds (or the desire) to pay the journal to do what they can do for free for themselves. The only case in which paid gold OA may bring even more citations than free green OA (even though both increase citations) is in the very highest quality journals, such as PNAS, today -- but that high-end reasoning certainly does not generalise to most journals, by definition. (And it will vanish completely when OA self-archiving is mandated, and the harvested IR contents become the locus classicus to access the literature for those whose institutions are not subscribed to the journal in which a particular article appeared -- whether or not it is a high-end journal.)

(There is also a conflation of the (less interesting) question of (1) whether green or gold generates a greater OA citation advantage [answer, for high-end journals like PNAS, gold does, but in general there is no difference] with the (far more important) question of (2) whether green or gold can generate more OA [answer: green can generate far more OA, far more quickly and easily, not just because it does not cost the author/institution anything, but because it can be mandated without needing either to find the extra funds to pay for it or to constrain the author's choice of which journal to publish in].)

However, despite the intuitive attractiveness of the hypothesis that OA will lead to increased citations because of easier availability, the one systematic study of the reasons for the increased citations - by Kurtz (Source 4.12) - showed that in the field of astronomy at least, the primary reason was not that the materials were free, or that they appeared more rapidly, but that authors put their best work into OA format, and this was the reason for increased citation counts.

Astronomy is an interesting but anomalous field: It differs from most other fields in that:

(1) Astronomy consists of a small, closed circle of journals.

(2) Virtually all research-active astronomers (so I am told by the author) have institutional access to all those journals.

(3) For a number of years now, that full institutional access has been online access.

(4) So astronomy is effectively a 100% OA field.

(5) Hence the only room left for a directly measurable OA advantage in astronomy is (5a) to self-archive the paper earlier (at the preprint stage) [EA] or (5b) to self-archive it in Arxiv (which has evolved into a common central port of call, so it generates more downloads and citations -- mostly at the preprint stage, in astronomy).

(6) What Kurtz found was that under these conditions, higher quality (higher citation-count) papers were more likely to be self-archived.

(7) This might be a quality self-selection effect (QB) (or it might not), but it is clearly occurring under very special conditions, in a 100% OA field.

(8) Kurtz did make another, surprising finding, which has bearing on the question of how much of a citation advantage remains once a field has reached 100% OA.

(9) By counting citations for comparable articles before and after the transition to 100% OA, Kurtz found that the citations per article had actually gone down (slightly) rather than up, with 100% OA.

(10) But a little reflection suggests a likely explanation: This slight drop is probably a shift in balance with a level playing field:

(11) With 100% OA (i.e., equal access to everything), authors don't cite more articles, they cite more selectively, able now to focus on the best, most relevant work, and not just on the work their institutions can afford to access.

(12) Higher quality articles get more citations, but lower quality articles, of which there are far more (some perhaps previously cited by default, because of accessibility constraints), are cited less.

(13) On balance, total citations are slightly down, on this level playing field, in this special, small, closed-circle field (astronomy), once it reaches 100% OA.

(14) It remains to be seen whether total and average citations go up or down when other fields reach 100% OA.

(15) What Kurtz does report even in astronomy is that although total citations are slightly down, downloads are doubled.

(16) Downloads are correlated with later citations, but perhaps at 100% OA this is either no longer true, or true only for higher quality articles.

Similarly, more carefully conceived work on the impact of both OA journals and self-archiving on the quality of research communications, especially on the peer review system, will be required.

OA journals are peer-reviewed journals: What sort of impact are they feared to have on peer review?

And why on earth would the self-archiving of peer-reviewed, published postprints have any impact on the peer review system? The peers review for free. (Could this be just a veiled repetition of the question about the impact of self-archiving on journal revenues, yet again?)

Recently, the results of a study undertaken by Ware for ALPSP, which were published in March 2006 (Source 1.16, in Area 1), have provided at least some initial data on the question of the possible linkage between the availability of self-archived articles in an OA repository and journal subscription cancellations by libraries...: availability of articles in repositories was cited as either a "very important" or an "important" possible factor in journal cancellation by 54 per cent of respondents, even though ranking fourth after (i) decline of faculty need, (ii) reduced usage, and (iii) price. When respondents were invited to think forward five years, availability in a repository was still fourth-ranking factor, but the relevant percentage had risen to 81. Whilst this is not evidence of actual or even intended cancellation as a consequence of the growth of OA self-archiving repositories, it strongly suggests that such repositories are an important new factor in the decision process, and growing in significance.

Summary: No evidence of cancellations, but speculations by librarians to the effect that their currently fourth-ranking factor in cancellations might possibly become more important in the next five years...

Sounds like sound grounds for fighting self-archiving mandates and trying to deny research the benefit of maximized impact for yet another five years -- if one's primary concern is the possible impact of mandated self-archiving on publishers' revenue streams. But if one's primary concern is with the probable impact of mandated self-archiving on research impact, this sort of far-fetched reasoning has surely earned the right to be ignored by the research community as the self-serving interference in research policy that it surely is.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 00:30 | Comments (2) | Trackbacks (0)

Monday, March 13. 2006

The Immediate-Deposit/Optional-Access (ID/OA) Mandate: Rationale and Model

EXECUTIVE SUMMARY: Universities and research funders are both invited to use this document to help encourage the adoption of an Open Access Self-Archiving Mandate at their institution. Note that this recommended "Immediate-Deposit & Optional-Access" (IDOA) policy model (also called the "Dual Deposit/Release Strategy") has been specifically formulated to be immune from any delays or embargoes (based on publisher policy or copyright restrictions): The deposit -- of the author's final, peer-reviewed draft of all journal articles, in the author's own Institutional Repository (IR) -- is required immediately upon acceptance for publication, with no delays or exceptions. But whether access to that deposit is immediately set to Open Access or provisionally set to Closed Access (with only the metadata, but not the full-text, accessible webwide) is left up to the author, with only a strong recommendation to set access as Open Access as soon as possible (immediately wherever possible, and otherwise preferably with a maximal embargo cap at 6 months).

This IDOA policy is greatly preferable to, and far more effective than a policy that allows delayed deposit (embargo) or opt-out as determined by publisher policy or copyright restrictions. The restrictions apply only to the access-setting, not to the deposit, which must be immediate. Closed Access deposit is purely an institution-internal book-keeping matter, with the institution's own assets, and no publisher policy or copyright restriction applies to it.

[In the meanwhile, if there needs to be an embargo period, the IR software has a semi-automated EMAIL EPRINT REQUEST button that allows any would-be user to request (by entering their email address and clicking) and then allows any author to provide (by simply clicking on a URL that appears in the eprint request received by email) a single copy of the deposited draft, by email, on an individual basis (a practice that falls fully under Fair Use). This provides almost-immediate, almost-Open Access to tide over research usage needs during any Closed Access period.]

1. Research Accessibility

1.1 There exist 24,000 peer-reviewed journals (and conference proceedings) publishing 2.5 million articles per year, across all disciplines, languages and nations.

1.2 No university anywhere, not even the richest, can afford to subscribe to all or most of the journals that its researchers may need to use.

1.3 Hence no article is accessible to all of its potential users, and hence all articles are losing some of their research impact (usage and citations).

2. Research Impact: Usage and Citations

2.1 This is confirmed by recent findings, independently replicated by many investigators, showing that articles for which their authors have supplemented subscription-based access to the publisher’s version by self-archiving their own final drafts free for all on the web are downloaded and cited twice as much across all 12 scientific, biological, social science and humanities disciplines analysed so far. (Note: there are no discipline differences in benefits of self-archiving, only in awareness.)

2.2 The total citation counts for articles submitted to the UK Research Assessment Exercise (RAE) are also very closely correlated with departmental RAE rankings (despite the fact that citations are not directly counted by RAE). More citations mean higher RAE ranking.

2.3 Hence citation counts are (i) robust indicators of research performance, (ii) they are not currently maximised for those articles that are not self-archived and (iii) those articles that are being self-archived have a substantial competitive advantage over those that are not.
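The comparison behind 2.1 can be sketched as a ratio of mean citation counts for self-archived versus non-self-archived articles. This is only an illustrative sketch: the function name `oa_advantage_ratio` and the sample counts are invented here and are not data from any of the studies mentioned.

```python
from statistics import mean


def oa_advantage_ratio(articles):
    """Mean citations of self-archived articles divided by the mean
    citations of non-self-archived articles.

    `articles` is an iterable of (citation_count, is_self_archived) pairs.
    """
    articles = list(articles)
    oa = [count for count, archived in articles if archived]
    non_oa = [count for count, archived in articles if not archived]
    if not oa or not non_oa:
        raise ValueError("need at least one article in each group")
    non_oa_mean = mean(non_oa)
    if non_oa_mean == 0:
        raise ValueError("non-OA mean is zero; ratio undefined")
    return mean(oa) / non_oa_mean


# Hypothetical counts, made up purely to exercise the function.
sample = [(12, True), (8, True), (4, False), (6, False)]
ratio = oa_advantage_ratio(sample)
```

A ratio above 1 only expresses the correlation; on its own it cannot separate a causal advantage from author self-selection.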

3. University Self-Archiving Mandates Maximise Research Impact

3.1 Only 15% of the 2.5 million articles published annually are being spontaneously self-archived worldwide today.

3.2 Creating an Institutional Repository (IR) and encouraging staff to self-archive their articles therein is a good first step, but it is not sufficient to raise the self-archiving rate appreciably above the 15% baseline for spontaneous self-archiving.

3.3 Adding library help to encourage and assist staff to self-archive raises the self-archiving rate somewhat, but insufficiently.

3.4 The correct measure of institutional success in self-archiving is the ratio of annual self-archived articles in an institution’s IR relative to that institution’s total annual article output.

3.5 The only institutions that are reliably approaching a 100% annual self-archiving rate today are those that not only create an IR (3.2) and provide library help (3.3) for depositing, but also adopt a self-archiving policy requirement (or "mandate").

3.6 A self-archiving mandate is a simple and natural extension of universities’ already existing mandate to publish research findings (“publish or perish”); it is already linked to incentives by the fact that staff are promoted and funded on the basis of research performance indicators, of which citation impact is a prominent correlate, as in the RAE (2.2).

3.7 Two international, cross-disciplinary JISC surveys have found that 95% of authors will comply with a self-archiving mandate (81% willingly, 14% reluctantly).

3.8 The four institutions worldwide that have adopted a self-archiving mandate to date (CERN in Switzerland, Queensland University of Technology in Australia, Minho University in Portugal, and the ECS Department at University of Southampton) have each confirmed the outcome of the JISC author surveys (3.7), with their institutional self-archiving rates reliably climbing toward 100%, whereas institutions without mandates remain at the 15% spontaneous self-archiving baseline rate.
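The success measure in 3.4 is a simple ratio, which can be sketched as follows. The function name and the example figures are hypothetical; only the 15% baseline comes from 3.1.

```python
# Baseline for spontaneous (unmandated) self-archiving, taken from 3.1.
SPONTANEOUS_BASELINE = 0.15


def self_archiving_rate(deposited_this_year, published_this_year):
    """Share of an institution's annual article output that was
    deposited in its own IR during the same year (point 3.4)."""
    if published_this_year <= 0:
        raise ValueError("annual output must be positive")
    if not 0 <= deposited_this_year <= published_this_year:
        raise ValueError("deposits must lie between 0 and annual output")
    return deposited_this_year / published_this_year


# Hypothetical institution: 1200 articles published, 180 deposited.
rate = self_archiving_rate(180, 1200)
above_baseline = rate > SPONTANEOUS_BASELINE
```

The denominator is the institution's own annual output, not the size of the repository, so a large IR filled with older material does not inflate the rate.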

4. Action: This university should now mandate self-archiving university-wide

4.1 This university should now maximise its own research impact and set an example for the rest of the world by adopting a self-archiving mandate university-wide.

4.2 As indicated by the JISC survey and the empirical experience of the other 3 mandating institutions (3.8): there is no need for any penalties for non-compliance with the mandate; the mandate (and its own rewards: enhanced research access and impact) will take care of itself.

4.31 What needs to be mandated:
Immediate Deposit and Optional Access (IDOA):

- immediately upon acceptance for publication
- deposit in the university’s Institutional Repository
- the author’s final accepted draft (not the publisher’s proprietary PDF)
- both its full-text and its bibliographic metadata (author, date, title, journal, etc.)
(Note that only the depositing itself needs to be mandated. Setting the access privileges to the full-text can be left up to the author, with Open Access strongly encouraged, but not mandated. This makes the university’s self-archiving mandate completely independent of publishers’ self-archiving policies.)

4.32 The Eprints software allows authors to choose to set access as Open Access (OA) or Restricted Access (RA):
OA: both metadata and full-text are made visible and accessible to all would-be users web-wide
RA: metadata are visible and accessible web-wide but full-text is not
4.4 The decision as to whether to set full-text access as OA or RA can be left up to the author; 93% of authors will set full-text access as OA (4.2); for the remaining 7%, the Eprints software still makes it possible for any would-be user web-wide to request an eprint of the full-text automatically by email -- by just cut-pasting their own email address into a box and clicking; the author immediately receives the request and can instantly email the eprint with one click. The result will be 100% access to all Southampton research output, 93% immediately and directly, with one keystroke, 7% indirectly after a short delay, with a few extra keystrokes by user and author.
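The OA/RA distinction and the email eprint request described in 4.32 and 4.4 can be modelled roughly as below. This is a toy model for illustration only: the class, method names and return strings are assumptions made here, and none of it is the actual Eprints software interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    OPEN = "OA"        # metadata and full text visible web-wide
    RESTRICTED = "RA"  # metadata visible web-wide, full text not


@dataclass
class Deposit:
    """Hypothetical record of one deposited final draft."""
    title: str
    author_email: str
    access: Access
    pending_requests: list = field(default_factory=list)

    def metadata_visible(self):
        # Metadata are visible under both settings.
        return True

    def full_text_visible(self):
        return self.access is Access.OPEN

    def request_eprint(self, requester_email):
        """A would-be user asks for the full text."""
        if self.full_text_visible():
            return "full text available directly"
        # Under RA the request is queued for the author to act on.
        self.pending_requests.append(requester_email)
        return "request forwarded to author"

    def approve_request(self, requester_email):
        """The author agrees to send a single copy by email."""
        if requester_email not in self.pending_requests:
            raise ValueError("no pending request from that address")
        self.pending_requests.remove(requester_email)
        return f"eprint emailed to {requester_email}"


draft = Deposit("Example title", "author@example.org", Access.RESTRICTED)
status = draft.request_eprint("reader@example.org")
sent = draft.approve_request("reader@example.org")
```

In this model the deposit exists from the start in both cases; only the access setting differs, which is the point of mandating the deposit rather than the access setting.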

5. The Importance of Prompt Action

5.1 Self-archiving is effortless, taking only a few minutes and a few keystrokes; library help is available too (but hardly necessary).

5.2 The university should not delay in adopting a self-archiving mandate: 100% OA is both optimal and inevitable -- for research, researchers, their universities, their funders, and the tax-paying public that supports both the research and the universities. It will also give this university a strong competitive impact advantage over later adopters.

5.3 An early adopter not only provides a model for the world with its university-wide self-archiving policy but at the same stroke it maximizes its own research impact and research impact ranking.

5.4 The mandate need have no penalties or sanctions in order to be successful; it need only be formally adopted, with the support of Heads of Schools, the library, and computing services. The rest will take care of itself naturally of its own accord, as the experience of Southampton ECS, Minho, QUT and CERN has already demonstrated.

APPENDIX:
Southampton University Resources for Supporting Open Access Worldwide

A1 U. Southampton ECS department was the first department or institution in the world to adopt a self-archiving mandate (2001).

A2 ECS hosts Psycprints (1991), BBSPrints (1994), Open Journals (1995), OpCit (1996), CogPrints (1997); the American Scientist Open Access Forum (1998).

A3 ECS designed the first and most widely used software for creating institutional archives (GNU Eprints, 2000), now already used by about 200 institutions worldwide; ECS also created Citebase (2002), the citation-based OA search engine (well before Google Scholar).

A4 ECS conducted many of the seminal studies empirically demonstrating the citation impact advantage of self-archiving across all disciplines; ECS also maintains the growing and widely used bibliography of the accumulating findings on the OA Impact Advantage.

A5 ECS/Eprints maintains ROAR, the Registry of Open Access Repositories, tracking the number, size and growth of IRs and their contents worldwide.

A6 ECS/Eprints maintains ROARMAP, the Registry of Open Access Repository Material Archiving Policies, tracking the institutions worldwide that have adopted self-archiving policies, from recommendations to full mandates.

A7 ECS/Eprints maintains the ROMEO Directory of Journal Policies on Author Self-Archiving: 93% of the nearly 9000 journals registered to date (including all the principal publishers and the core ISI journals) have already formally endorsed author self-archiving; only 7% of journals have not.

A8 ECS/Southampton successfully lobbied the UK Parliamentary Select Committee in 2004 to mandate self-archiving; this led directly to the RCUK self-archiving mandate proposal, the Berlin 3 Policy Recommendation (formulated at Southampton) and the development of RAE submission mechanisms for the world’s two principal IR softwares (GNU Eprints, and MIT’s Dspace, both written by Southampton’s Rob Tansley).

Posted by Stevan Harnad in Self-Archiving Mandates at 01:09 | Comments (0) | Trackback (1)

Sunday, February 5. 2006

Open Access vs. Back Access

On 4-Feb-06, at 5:41 PM, Sally Morris (ALPSP) wrote in the AmSci Forum:

"In addition to self-archived papers and those in full OA journals, don't forget (a) those in hybrid/optional OA journals (which seem to average around 40 articles p.a) and (b) those in 'Delayed OA Journals'. I and others are currently trying to estimate the latter - over 1m articles from HighWire Press publishers alone (and 0.25m from the first 32 ALPSP members to respond to my enquiry...)"

Lower tolls are preferable to higher tolls, shorter embargoes are preferable to longer embargoes, longer temporary access is preferable to shorter temporary access, wider access is preferable to narrower access, but Open Access is still Open Access, which means free, immediate, permanent online access to any would-be user webwide, and not just to those whose institutions can afford the access-tolls of the journal it happens to be published in.

The measure of the percentage of OA is the percentage of current annual article output that is freely accessible online. The rest is merely measuring Back Access (BA). BA is welcome, but it is not OA; and not what the research community wants and needs most today. Research uptake, usage, impact and progress do not derive any benefit whatsoever from embargoes, delaying full access and usage. That is not what research is about, or for.
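To make the tally concrete, here is a small sketch that separates OA from BA for one year's output. It assumes, for illustration, that "immediate" means free on or before the publication date; the function names and the four sample records are invented.

```python
from datetime import date


def classify_access(published, free_from):
    """Label one article as 'OA', 'BA' or 'toll'.

    `free_from` is the date the article became freely accessible
    online, or None if it never did.
    """
    if free_from is None:
        return "toll"
    # Assumption: free on or before publication counts as immediate.
    return "OA" if free_from <= published else "BA"


def oa_percentage(articles):
    """Percentage of the given output that is OA, ignoring BA."""
    articles = list(articles)
    if not articles:
        raise ValueError("no articles to tally")
    oa_count = sum(
        1 for published, free_from in articles
        if classify_access(published, free_from) == "OA"
    )
    return 100.0 * oa_count / len(articles)


# Hypothetical output for one year: (publication date, free-from date).
sample_year = [
    (date(2005, 3, 1), date(2005, 3, 1)),  # free on publication: OA
    (date(2005, 3, 1), date(2005, 9, 1)),  # free after an embargo: BA
    (date(2005, 6, 15), None),             # never free: toll access only
    (date(2005, 8, 20), None),             # never free: toll access only
]
share = oa_percentage(sample_year)
```

Counting the embargoed article as OA would double the figure in this toy sample, which is exactly the inflation the OA/BA distinction is meant to prevent.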

But this is not the publishing community's problem, at all. As long as a journal is green on immediate self-archiving, it has done all it needs to do for OA at this time (i.e., it has not tried to get in OA's way, and in the way of its benefits to research and researchers). The rest is up to the research community now, and they will take care of it -- and not through spontaneous self-archiving alone (just as they do not publish through spontaneous publishing alone). Systematic Self-Archiving Policy is needed, in the form of self-archiving mandates by researchers' institutions and funders, the other two stake-holders in their joint research output and its impact. Both publishing itself and its citation impact are already linked to professional rewards, in the form of salary, promotion and research funding. A self-archiving mandate need merely be based on that existing contingency, and the existing publish-or-perish mandate, and designed simply to maximize it.

http://www.eprints.org/signup/fulllist.php

Gold OA publishing is a welcome bonus; so is hybrid "open choice" optional gold. BA is welcome too; but it cannot and should not be reckoned as OA, any more than re-runs should be reckoned as fresh movies, hand-me-downs as fresh fashion, or left-overs as fresh fare.

One of the biggest and most important components of the OA impact advantage, especially in fields that have already reached 100% OA, such as astrophysics, is EA (Early Access). One would think that earlier access merely brings earlier impact, not more impact. But Michael Kurtz's data shows that EA not only adds a permanent increment to citation counts, but to their continuing growth rate too. It is as if earlier usage branches early, and the branches keep branching and generating more usage and citations. Of course, this will vary with the uptake-latencies, time-constants and turn-around times of each field, but I doubt that progress in any field benefits from, or is even unaffected by, access delays, any more than it is likely to be immune to publication delays.

If a work is worth publishing today, it is worth accessing today, not just in 6 months, 12 months, or still longer. That is what needs to be counted and tallied if we are tracking the growth of OA today. If we want to maintain a separate tally for BA too, that's fine, but beside the point, because after the fact, insofar as OA and immediate research progress -- research's immediate priority today -- are concerned. BA may be useful to students, teachers and historians, but it is OA that is needed by researchers, today. Researchers are both the providers and the primary users of research: They (and their institutions and funders) are also the ones in the position to provide -- and benefit from -- immediate OA.

Pertinent Prior AmSci Topic Threads:

Nature 10 September on Public Archiving (1998)
E-Biomed: Very important NIH Proposal (1999)
Floyd Bloom's SCIENCE Editorial about NIH/E-biomed
Evolving APS Copyright Policy (American Physical Society)
Nature's vs. Science's Embargo Policy (2000)
AAAS's Response: Too Little, Too Late (2001)
APS copyright policy (2002)
Open Letter to Philip Campbell, Editor, Nature (2003)
Is there any need for a universal Open Access label?
Shulenburger on open access: so NEAR and yet so far
Nature Web Debate on Open Access (2004)
Elsevier Gives Authors Green Light for Open Access Self-Archiving
URGENT support for NIH public access policy
Critique of Stanford/HighWire Press Critique of NIH Proposal
Nature Back-Slides on Self-Archiving [Corrected] (2005)
Please Don't Copy-Cat Clone NIH-12 Non-OA Policy!
Open Access vs. NIH Back Access and Nature's Back-Sliding
Proposed update of BOAI definition of OA: Immediate and Permanent

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 10:39 | Comments (0) | Trackbacks (0)

Friday, October 14. 2005

ALPSP's "Facts About Open Access" Report

On Thu, 13 Oct 2005, Sally Morris (ALPSP) wrote in the SPARC Open Access Forum:

SM: "As far as the 'self-archiving' route to OA is concerned, I must have explained our concern a hundred times; let me spell it out yet again:

"Let us assume that self-archiving mandates become widespread, and that tools such as Google Scholar make self-archived articles as easy to discover as the published versions.

"Then if free substitute versions are available for all or most of the content of a given journal, and if these are used by library patrons in preference to the published version, the rational librarian will not purchase the published version."

But if, as all the studies to date show, library patrons use the licensed library published version for those articles that their libraries can afford, and use the author's self-archived OA version for those they cannot, what is Sally's and ALPSP's rationale for keeping them deprived of the articles their libraries cannot afford, and for keeping the authors of those articles deprived of that usage and impact? Is the rationale that the need to protect publishers from any possibility of risk of a decline in subscription revenues (for which there does not yet exist even a single shred of evidence today) takes precedence over all these author and user needs -- i.e., over all of these research/researcher needs?

Nor do subscriptions and cancellations depend primarily on the "rational librarian": they depend on their user/author communities, who are not calling for cancellations, but for access to what their libraries cannot afford, and for the impact that their own articles lose, from users at other institutions whose libraries cannot afford the journal they were published in.

SM: "If subscriptions fall dramatically, journals will no longer be viable and will cease publication."

Repeating this "a hundred times" and a hundred times more does not make it one whit more a statement of actual fact, rather than just the counterfactual "if/then" conjecture that it is, and continues to be, with not a shred of evidence to support its "if"-premise.

I advise Sally that before the Frankfurt Scientific Publishing Meeting she should attend the STM session at the Frankfurt Book Fair in which Michael Kurtz (astrophysics, Harvard) will be presenting the data of Edwin Henneken on the usage of the ADS system by astrophysicists (article forthcoming in D-Lib), showing how they switch from using the preprint to using the publisher's published version as soon as it is available -- except those who cannot afford access, who continue to use the self-archived postprint.

SM: "If journals are no longer there to carry out their current functions (not just the management of peer review, but also selection/refinement/collection of content of particular relevance to a given community of interest) that will be a great loss to scholarship."

So would every other negative if/then counterfactual that I or Sally or Pascal or anyone else could dream up, but that doesn't make their if-premises any truer either, not even after being repeated thousands of times. And the more you keep raising the hypothetical ante, the more ominous it sounds -- without becoming one bit truer.

"Pascal's Wager and Open Access"

So let me say it straight out: All evidence is that what is in the best interests of the research community and what is in the best interests of the publisher community can co-exist peacefully with self-archiving. But if there ever were a conflict of interest, there is no doubt whatsoever about the direction in which it would have to be resolved: the dog (research production), not the tail (research publishing).

Why do not Sally Morris and the ALPSP embrace the many potential new ways to collaborate with and benefit from researcher self-archiving and institutional repositories, instead of fixating so single-mindedly on trying to fend off the optimal and inevitable for as long as possible?

SM: "I do not argue that society or indeed other publishers have any right to continue to perform their current function. I'm just pointing out that they may be unable to do so if self-archiving sweeps the board as some would like it to do. That is why we are urging caution to those who would mandate immediate self-archiving."

Self-archiving mandates are not for "sweeping the board," they are for providing access to those researchers who actually can't afford it today, and thereby providing their lost impact to the research and researchers that are actually losing it today. The sweepingly overboard statements about counterfactual disaster scenarios, in contrast, are coming from those who are trying to protect actual, unchanged publisher revenue streams from counterfactual, hypothetical risk, at the cost of certain and sizeable benefits to research, researchers, their institutions, their funders, and the public that funds their research -- i.e., the canid rather than its queue.

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 01:05 | Comments (0) | Trackbacks (0)
