Open Access Archivangelism

Monday, November 20. 2006

The Self-Archiving Impact Advantage: Quality Advantage or Quality Bias?

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) "Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research"
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: In astrophysics, Kurtz found that articles that were self-archived by their authors in Arxiv were downloaded and cited twice as much as those that were not. He traced this enhanced citation impact to two factors: (1) Early Access (EA): The self-archived preprint was accessible earlier than the publisher's version (which is accessible to all research-active astrophysicists as soon as it is published, thanks to Kurtz's ADS system). (Hajjem, however, found that in other fields, which self-archive only published postprints and do have accessibility/affordability problems with the publisher's version, self-archived articles still have enhanced citation impact.) Kurtz's second factor was: (2) Quality Bias (QB), a selective tendency for higher quality articles to be preferentially self-archived by their authors, as inferred from the fact that the proportion of self-archived articles turns out to be higher among the more highly cited articles. (The very same finding is of course equally interpretable as (3) Quality Advantage (QA), a tendency for higher quality articles to benefit more than lower quality articles from being self-archived.) In condensed-matter physics, Moed has confirmed that the impact advantage occurs early (within 1-3 years of publication). After article-age is adjusted to reflect the date of deposit rather than the date of publication, the enhanced impact of self-archived articles is again interpretable as QB, with articles by more highly cited authors (based only on their non-archived articles) tending to be self-archived more. (But since the citation counts for authors and for their articles are correlated, one would expect much the same outcome from QA too.) The only way to test QA vs. QB is to compare the impact of self-selected self-archiving with mandated self-archiving (and no self-archiving). (The outcome is likely to be that both QA and QB contribute, along with EA, to the impact advantage.)

Michael Kurtz's papers have confirmed that in astronomy/astrophysics (astro), articles that have been self-archived -- let's call this "Arxived" to mark it as the special case of depositing in the central Physics Arxiv -- are cited (and downloaded) twice as much as non-Arxived articles. Let's call this the "Arxiv Advantage" (AA).

Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Thompson, D., and Murray, S. S. (2006) Effect of E-printing on Citation Rates in Astronomy and Physics. Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006

Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn, G., Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and Murray, S. S. (2006) E-prints and Journal Articles in Astronomy: a Productive Co-existence (submitted to Learned Publishing)

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C. S., Demleitner, M., Murray, S. S. (2005) The Effect of Use and Access on Citations. Information Processing and Management, 41 (6): 1395-1402, December 2005

Kurtz analyzed AA and found that it consisted of at least 2 components:

(1) EARLY ACCESS (EA): There is no detectable AA for old articles in astro: AA occurs while an article is young (1-3 years). Hence astro articles that were made accessible as preprints before publication show more AA: This is the Early Access effect (EA). But EA alone does not explain why AA effects (i.e., enhanced citation counts) persist cumulatively and even keep growing, rather than simply being a phase-advancing of otherwise unenhanced citation counts, in which case simply re-calculating an article's age so as to begin at preprint deposit time instead of publication time should eliminate all AA effects -- which it does not.

(2) QUALITY BIAS (QB): (Kurtz called the second component "Self-Selection Bias" for quality, but I call it self-selection Quality Bias, QB): If we compare articles within roughly the same citation/quality bracket (i.e., articles having the same number of citations), the proportion of Arxived articles becomes higher in the higher citation brackets, especially the top 200 papers. Kurtz interprets this as resulting from authors preferentially Arxiving their higher-quality preprints (Quality Bias).

Of course the very same outcome is just as readily interpretable as resulting from Quality Advantage (QA) (rather than Quality Bias (QB)): i.e., that the Arxiving benefits better papers more. (Making a low-quality paper more accessible by Arxiving it does not guarantee more citations, whereas making a high-quality paper more accessible is more likely to do so, perhaps roughly in proportion to its higher quality, allowing it to be used and cited more according to its merit, unconstrained by its accessibility/affordability.)
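The bracket comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Kurtz's actual procedure: the function name, the (citations, is_arxived) data layout, the bracket boundaries and the toy numbers are all assumptions invented for the example.

```python
from collections import defaultdict


def arxived_share_by_bracket(articles, bracket_edges):
    """Return {bracket lower bound: share of Arxived articles in that bracket}.

    articles: iterable of (citations, is_arxived) pairs (hypothetical layout).
    bracket_edges: ascending lower bounds; the lowest must not exceed any
    citation count in the data (e.g. start at 0).
    """
    totals = defaultdict(int)
    arxived = defaultdict(int)
    for citations, is_arxived in articles:
        # Assign the article to the highest lower bound it reaches.
        lower = max(edge for edge in bracket_edges if edge <= citations)
        totals[lower] += 1
        if is_arxived:
            arxived[lower] += 1
    return {lower: arxived[lower] / totals[lower] for lower in sorted(totals)}


# Invented toy data, for illustration only.
sample = [(2, False), (3, False), (5, True), (12, True),
          (15, False), (60, True), (80, True)]
shares = arxived_share_by_bracket(sample, [0, 10, 50])
```

A share that rises with the citation bracket is exactly the pattern at issue, and the calculation by itself is silent on cause: it looks the same whether better papers are Arxived more (QB) or Arxived papers are cited more (QA).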

There is no way, on the basis of existing data, to decide between QA and QB. The only way to measure their relative contributions would be to control the self-selection factor: randomly imposing Arxiving on half of an equivalent sample of articles of the same age (from preprinting age to 2-3 years postpublication, reckoning age from deposit date, to control also for age/EA effects), and comparing also with self-selected Arxiving.

We are trying an approximation to this method, using articles deposited in Institutional Repositories of institutions that mandate self-archiving (and comparing their citation counts with those of articles from the same journal/issue that have not been self-archived), but the sample is still small and possibly unrepresentative, with many gaps and other potential liabilities. So a reliable estimate of the relative size of QA and QB still awaits future research, when self-archiving mandates will have become more widely adopted.
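To make the shape of that comparison concrete, here is a minimal Python sketch of the within-issue comparison. It is not the analysis we are actually running: the record layout, the group labels ("mandated", "self_selected", "not_archived"), the issue identifiers and the toy citation counts are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def advantage_ratios(records):
    """Mean within-issue citation ratio of each archived group to non-archived.

    records: iterable of (issue, group, citations) tuples, where group is
    'mandated', 'self_selected' or 'not_archived' (hypothetical labels).
    """
    by_issue = defaultdict(lambda: defaultdict(list))
    for issue, group, citations in records:
        by_issue[issue][group].append(citations)

    ratios = defaultdict(list)
    for groups in by_issue.values():
        baseline = groups.get("not_archived")
        if not baseline or mean(baseline) == 0:
            continue  # no usable comparison articles in this issue
        for group in ("mandated", "self_selected"):
            if groups.get(group):
                ratios[group].append(mean(groups[group]) / mean(baseline))
    return {group: mean(values) for group, values in ratios.items()}


# Invented toy data, for illustration only.
records = [
    ("J1-2005-3", "not_archived", 4), ("J1-2005-3", "not_archived", 6),
    ("J1-2005-3", "mandated", 10), ("J1-2005-3", "self_selected", 15),
    ("J2-2005-1", "not_archived", 2), ("J2-2005-1", "mandated", 2),
    ("J2-2005-1", "self_selected", 6),
]
result = advantage_ratios(records)
```

Read with caution: if the mandated ratio came out close to the self-selected ratio, that would point toward QA; if the mandated ratio were near 1 while the self-selected ratio stayed high, that would point toward QB. With small and gappy samples neither reading would be reliable.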

Henk Moed's data on Arxiving in Condensed Matter physics (cond-mat) replicates Kurtz's findings in astro (and Davis/Fromerth's, in math):

Moed, H. F. (2006, preprint) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics, accepted for publication. See critiques: 1, 2.

Moed too has shown that in cond-mat the AA effect (which he calls CID "Citation Impact Differential") occurs early (1-3 years) rather than late (4-6 years), and that there is more Arxiving by authors of higher-quality (based on higher citation counts for their non-Arxived articles) than by lower-quality authors. But this too is just as readily interpretable as the result of QB or QA (or both): We would of course expect a high correlation between an author's individual articles' citation counts and the author's average citation count, whether the author's citation count is based on Arxived or non-Arxived articles. These are not independent variables.

(Less easily interpretable -- but compatible with either QA or QB interpretations -- is Moed's finding of a smaller AA for the "more productive" authors. Moed's explanations in terms of co-authorships between more productive and less productive authors, senior and junior, seem a little complicated.)

The basic question is this: Once the AA has been adjusted for the "head-start" component of the EA (by comparing articles of equal age -- the age of Arxived articles being based on the date of deposit of the preprint rather than the date of publication of the postprint), how big is that adjusted AA, at each article age? For that is the AA without any head-start. Kurtz never thought the EA component was merely a head start, however, for the AA persists and keeps growing, and is present in cumulative citation counts for articles at every age since Arxiving began. This non-EA AA is either QB or QA or both. (It also has an element of Competitive Advantage, CA, which would disappear once everything was self-archived, but let's ignore that for now.)

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA. Preprint.
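The head-start adjustment described above (reckoning an Arxived article's age from its deposit date rather than its publication date) can be sketched as follows. This is a minimal illustration under stated assumptions, not Kurtz's or Moed's method: the dictionary keys ("deposited", "published", "citation_dates"), the 365-day year and the toy dates are all invented for the example.

```python
from datetime import date


def effective_start(article):
    """Start the age clock at preprint deposit if Arxived, else at publication."""
    return article.get("deposited") or article["published"]


def citations_at_age(article, age_years):
    """Count citations received within age_years of the effective start date."""
    start = effective_start(article)
    cutoff_days = 365 * age_years  # approximation that ignores leap days
    return sum(1 for cited_on in article["citation_dates"]
               if 0 <= (cited_on - start).days < cutoff_days)


def adjusted_advantage(articles, age_years):
    """Ratio of mean citations at equal age: Arxived over non-Arxived.

    Assumes both groups are non-empty and the non-Arxived mean is not zero.
    """
    arxived = [citations_at_age(a, age_years) for a in articles if a.get("deposited")]
    others = [citations_at_age(a, age_years) for a in articles if not a.get("deposited")]
    return (sum(arxived) / len(arxived)) / (sum(others) / len(others))


# Invented toy data, for illustration only.
toy = [
    {"deposited": date(2004, 1, 1), "published": date(2004, 7, 1),
     "citation_dates": [date(2004, 3, 1), date(2004, 9, 1), date(2005, 6, 1)]},
    {"deposited": None, "published": date(2004, 7, 1),
     "citation_dates": [date(2004, 10, 1), date(2005, 9, 1)]},
]
ratio = adjusted_advantage(toy, 1)
```

Whatever ratio above 1 survives this adjustment, at each article age, is the part of the AA that is not a mere head start, and hence the part that remains to be apportioned between QB and QA.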

Moed's analysis, like Kurtz's, cannot decide between QB and QA. The fact that most of the AA comes in an article's first 3 years rather than its second 3 years simply shows that both astro and cond-mat are fast-developing fields. The fact that highly-cited articles (Kurtz) and articles by highly-cited authors (Moed) are more likely to be Arxived certainly does not settle the question of cause and effect: It is just as likely that better articles benefit more from Arxiving (QA) as that better authors/articles tend to Arxive/be-Arxived more (QB).

Nor is Arxiv the only test of the self-archiving Open Access Advantage. (Let's call this OAA, generalizing from the mere Arxiving Advantage, AA): We have found an OAA with much the same profile as the AA in 10 further fields, for articles of all ages (from 1 year old to 10 years old), and as far as we know, with the exception of Economics, these are not fields with a preprinting culture (i.e., they don't self-archive prepublication preprints but only postpublication postprints). Hence the consistent pattern of OAA across all fields and across articles of all ages is very unlikely to have been just a head-start (EA) effect.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

Is the OAA, then, QB or QA (or both)? There is no way to determine this unless the causality is controlled by randomly imposing the self-archiving on a subset of a sufficiently large and representative random sample of articles of all ages (but especially newborn ones) and comparing the effect across time.

In the meantime, here are some factors worth taking into account:

(1) Both astro and cond-mat are fields where it has been repeatedly claimed that the accessibility/affordability problem for published postprints is either nonexistent (astro) or less pronounced than in other fields. Hence the only scope for an OAA in astro and cond-mat is at the prepublication preprint stage.

(2) In many other fields, however, not only is there no prepublication preprint self-archiving at all, but there is a much larger accessibility/affordability barrier for potential users of the published article. Hence there is far more scope for OAA and especially QA (and CA): Access is a necessary (though not a sufficient) causal precondition for impact (usage and citation).

It is hence a mistake to overgeneralize the phys/math AA findings to OAA in general. We need to wait till we have actual data before we can draw confident conclusions about the degree to which the AA or the OAA are a result of QB or QA or both (and/or other factors, such as CA).

For the time being, I find the hypothesis of a causal QA (plus CA) effect, successfully sought by authors because they are desirous of reaching more users, far more plausible and likely than the hypothesis of an a-causal QB effect in which the best authors are self-archiving merely out of superstition or vanity! (And I suspect the truth is a combination of both QA/CA and QB.)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 17:47 | Comments (0) | Trackbacks (0)

Two Happy Accidents Demonstrate Power of "Eprint Request" Button

Here are two rather remarkable anecdotes about the recently created "EMAIL EPRINT" button that allows any would-be user webwide to email a semi-automatic "eprint request" to the author of any eprint in an IR that has been deposited as "Closed Access" rather than "Open Access" to request an individual copy for personal use. (The author need merely click on an "approval" URL in his email message in order to fulfil the request.)

Two recent "accidents," occurring independently at two different institutions, provide dramatic evidence of the potential power of this feature: The button is intended to tide over researcher usage needs during any embargo interval. As such, it is expected to apply only to a minority of deposits (as the majority of journals already endorse immediate Open Access-setting).

The two accident-anecdotes come from University of Southampton and Université du Québec à Montréal:

Southampton has many IRs: A departmental IR (Department of Electronics and Computer Science) already has an immediate full-text deposit mandate, but the university-wide IR does not yet have a mandate, so it has many deposits for which only the metadata are accessible, many of them deposited via library mediation rather than by the authors themselves. This will soon change to direct author deposit, but meanwhile, "The Button" was implemented, and the result was such a huge flood of eprint requests that the proxy depositors were overwhelmed and the feature quickly had to be turned off!

The Button will of course be restored -- using LDAP to redirect the eprint requests to the authors rather than the library mediators -- but the accident was instructive in revealing the nuclear power of the button! Authors, we expect, will be gratified by the countable measures of interest in their work, and we will make a countable metric out of the number of eprint requests. Authors will be able to opt out of receiving eprint requests -- but we confidently expect that few will choose to do so! (Our confidence is based on many factors, take your pick: (1) Authors' known habit of looking first at the bibliography of any article or book in their field, to see "Do they cite me?" (2) Authors' known habit of googling themselves as well as looking up their own citation-counts in Web of Science and now in Google Scholar. (3) Employers' and funders' growing use of research performance metrics to supplement publication counts in employment, promotion and funding decisions...)

Much the same thing happened at UQaM but this time it was while a new IR was still under construction, and its designers were still just testing out its features with dummy demo papers (some of them real!). "The Button" again unleashed an immediate torrent of eprint requests for the bona fide papers, so the feature had to be (tremulously, but temporarily) disabled!

Caveat Emptor!

Increasing Institutional Repository Content with "email eprint" Button
New Request Copy feature in DSpace

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 13:59 | Comments (0) | Trackbacks (0)

Please Register Your Institutional Repository in ROAR

In order to give everyone a clear update on progress in the growth of Institutional Repositories (IRs) and in order to encourage others to create IRs, could you please register your IRs in the Registry of Open Access Repositories (ROAR)?

And if your institution has a self-archiving policy, please register it in ROARMAP.

Before registering your IR in ROAR, please check whether it is already registered! This is also a good time to try some of ROAR's powerful new features for monitoring IR growth.

Posted by Stevan Harnad in Institutional Repositories at 13:47 | Comment (1) | Trackbacks (0)

