The self-archiving initiative

Freeing the refereed research literature online

Stevan Harnad
Intelligence/Agents/Multimedia Group
Department of Electronics and Computer Science
University of Southampton UK
http://cogsci.soton.ac.uk/~harnad

Unlike the authors of books and magazine articles, who write for royalty or fees, the authors of refereed journal articles write only for 'research impact'. To be cited and built on in the research of others, their findings have to be accessible to their potential users. From the authors' viewpoint, toll-gating access to their findings is as counterproductive as toll-gating access to commercial advertisements.

With the online age, it has at last become possible to free the literature from this unwelcome impediment. Authors need only deposit their refereed articles in 'eprint' archives at their own institutions; these interoperable archives can then all be harvested into a global virtual archive, its full contents freely searchable and accessible online by everyone (see The transition scenario).

Unlike the royalty/fee-based literature, which constitutes the vast majority of the printed word, the special, tiny literature of refereed journal articles is, and always has been, an 'author giveaway'. Researchers never benefited from the fact that people had to pay access tolls to read their papers (in the form of subscriptions or, for the online version, site licences or pay-per-view). On the contrary, those access barriers represent impact barriers for researchers, whose careers and standing depend largely on the visibility and uptake of their research.

There are currently at least 20,000 refereed journals across all fields of scholarship, publishing more than 2 million refereed articles each year. The world's institutions that can afford the access tolls collectively pay an average of $2,000 for each one of those refereed papers. In exchange for that fee, the paper is accessible to readers at those, and only those, paying institutions.
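The two figures above together indicate the scale of the annual outlay. The back-of-envelope sketch below simply multiplies them; because the article count is a lower bound ('more than 2 million') and $2,000 is an average, the result is an order-of-magnitude estimate, not a measured total. The constant and function names are illustrative only.

```python
# Figures quoted in the text (approximate).
ARTICLES_PER_YEAR = 2_000_000   # "more than 2 million refereed articles each year"
TOLL_PER_ARTICLE_USD = 2_000    # average collective toll paid per article


def annual_toll_total(articles: int, toll_per_article: int) -> int:
    """Collective tolls paid worldwide per year, in US dollars."""
    return articles * toll_per_article


print(f"${annual_toll_total(ARTICLES_PER_YEAR, TOLL_PER_ARTICLE_USD):,} per year")
```

This prints $4,000,000,000 per year, i.e. on the order of at least $4 billion a year spent on access tolls.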

The research libraries of the world can be divided into the (minority) Harvards and the (majority) Have-nots, the latter by no means limited to the developing world. It is obvious how the Have-nots would benefit from free access to the entire refereed literature, for without it their meagre serials budgets can afford only a pitifully small portion. But not even Harvard can afford access to anywhere near all of the literature (see Association of Research Libraries Statistics). Hence, most refereed articles are inaccessible to most researchers. For the authors, this means that much of their potential impact is lost. And it is solely this curtailed research impact and access that is being purchased by the collective $2,000 outlay per article mentioned above.

This is the way things had to be in the past, when print-on-paper was the only publishing medium, and the sizeable costs of printing and distribution had to be recovered somehow. The new online era may be threatening the majority, royalty/fee-based literature (books, magazine articles) in the form of digital piracy; but for the 'giveaway' research literature, it has at last made it possible to eliminate all those counterproductive access/impact barriers.

Not all costs have vanished, of course. Although the costs of printing and distribution (and their online successors, such as publishers' PDF page-images) are no longer essential ones, the cost of the quality control and certification that distinguish the refereed literature from an unfiltered, anarchic vanity press still needs to be paid. Paper and PDF files have become mere options, purchasable by those who want and can afford them. Refereeing, however, is essential.

Essential costs of refereeing
Refereeing (peer review) is the system of evaluation and feedback by which expert researchers assure the quality of each other's research findings. Referees' services are donated free to virtually all scientific journals, but there is a real cost to implementing the refereeing procedures, which include archiving submitted papers on a website; selecting appropriate referees; tracking submissions through rounds of review and author revision; making editorial judgments, and so on.

The minimum cost of refereeing has been estimated as $500 per accepted article (see slideshow), but that figure almost certainly has inessential costs wrapped into it (for example, the creation of the publisher's PDF). I think that the true figure for peer-review implementation alone, averaged across all refereed journals, is probably closer to $200 per article, or even lower. If so, quality-control costs account for only about 10% ($200 of $2,000) of the collective tolls actually being paid per article.
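The 10% figure follows from dividing the estimated refereeing cost by the average $2,000 toll per article. The sketch below makes that division explicit for both estimates mentioned in the paragraph above; the $2,000 denominator is the average quoted earlier, and the names are illustrative only.

```python
TOLL_PER_ARTICLE_USD = 2_000  # average collective toll per article, as quoted in the text


def share_of_tolls(cost_per_article: float, toll: float = TOLL_PER_ARTICLE_USD) -> float:
    """Fraction of the average per-article toll accounted for by a given cost."""
    return cost_per_article / toll


# $500: the published minimum estimate; $200: the author's own estimate.
for label, cost in [("published estimate", 500), ("author's estimate", 200)]:
    share = share_of_tolls(cost)
    print(f"{label}: ${cost} is {share:.0%} of the toll; {1 - share:.0%} goes to other costs")
```

With these inputs the loop reports 25% for the $500 estimate and 10% for the $200 estimate.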

Can this situation, in which the authors' and referees' giveaways are needlessly being held hostage to obsolete printing costs and cost-recovery methods, be remedied? Note that it is not simply a matter of lowering the financial access barriers: even if those were slashed by 90%, most researchers would still be unable to access most research papers. There is an optimal solution, and it is inevitable: the refereed research literature must be freed online for everyone, everywhere, for ever. The irreducible 10% or so quality-control cost need no longer be paid for by readers' institutions; it can be paid in the form of quality-control service costs, per paper published, by authors' institutions, out of their savings on subscription costs.

Journal publishers certainly will not scale down to becoming only quality-control providers of their own accord. Nor can libraries effect such a transition on their own. And authors cannot and should not be expected to stop submitting their research to established high-quality, high-impact journals in preference to new, alternative journals just because those are prepared to provide stand-alone quality control right now. Journal niches are largely filled already, and immediate careers and standing are far more important to researchers than the potential long-term benefits of risky sacrifices.

But researchers can hasten the optimal and inevitable outcome without any sacrifice or risk. The entire refereed journal literature can be freed, virtually overnight, without authors having to give up their established refereed journals, by a method that a portion of the physics community has already shown to work. These physicists have, since 1991, been publicly self-archiving their research papers online, both before and after refereeing (preprints and postprints), in the physics 'eprint archive'. This archive currently holds 150,000 articles. The number of new articles being self-archived there is currently about 30,000 annually, and increasing by some 3,500 papers each year. The archive, with its 14 mirror-sites world-wide, gets about 160,000 user hits each weekday at its US site alone. So there is no doubt that self-archiving is feasible, and that when papers are thus made freely accessible online, they are heavily used.

But although these physicists have shown the way to free the refereed research literature, authors in other disciplines have been slow to realize that the system can work for them too. They have assumed that there must be something unique about physics that makes self-archiving work. This misapprehension has been encouraged by the incorrect impression that the physics archive contains only unrefereed preprints, and that self-archiving somehow compromises the quality control of journals.

Yet absolutely nothing has changed in peer review in physics. The same authors who self-archive continue to submit all their papers to their journals of choice, just as they always did, and virtually all the papers in the archive appear in refereed journals about 12 months after journal submission. The only thing that has changed is that a growing portion of the refereed literature in physics is accessible, free for all, online. Yet even in physics, self-archiving is still growing far too slowly: at the present linear growth rate it will be another decade before the entire physics literature is online and free.
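To make 'linear growth' concrete, the sketch below projects the physics archive's size from the figures quoted earlier (150,000 articles, about 30,000 new deposits a year, rising by about 3,500 a year). It assumes, hypothetically, that the first projected year's deposits equal the current 30,000 and that the yearly increase stays fixed; it is an illustration of the arithmetic, not a forecast, and it says nothing about the size of the whole physics literature, which the text does not give. The function name is illustrative.

```python
def project_archive(start_total: int, first_year_deposits: int,
                    yearly_increase: int, years: int) -> list[int]:
    """Cumulative archive size at the end of each projected year, assuming
    annual deposits grow by a fixed amount each year (linear growth in
    deposits, so the cumulative total grows quadratically)."""
    totals = []
    total, deposits = start_total, first_year_deposits
    for _ in range(years):
        total += deposits
        totals.append(total)
        deposits += yearly_increase
    return totals


sizes = project_archive(150_000, 30_000, 3_500, years=10)
print(sizes[-1])  # 607500
```

Under these assumptions the archive would hold roughly 600,000 papers after ten years, matching the closed form 150,000 + 30,000n + 3,500n(n-1)/2 for n = 10.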

Institution-based self-archiving
There is now a way both to accelerate the rate of self-archiving in physics and to extend the practice to the other disciplines (see The transition scenario). My original 'subversive proposal' to free the refereed literature through author self-archiving fell largely on deaf ears because papers self-archived in anonymous FTP archives or on personal web home pages would be unsearchable, unnavigable, irretrievable and hence unusable. Nor has centralized archiving, even when made available to other disciplines, caught on fast enough: it has taken three years for the number of articles in CogPrints to reach 1,000.

The new breakthrough is agreement on metadata tagging standards that make the contents of distributed archives interoperable, and hence harvestable into one global virtual archive, with all papers searchable and retrievable by everyone for free. The Open Archives Initiative (OAI) has now provided the metadata tagging standards and a registry for all OAI-compliant eprint archives. The self-archiving initiative is providing free software for institutions to create OAI-compliant archives, interoperable with all other open archives, ready to be registered and for their contents to be harvested into searchable global archives, interlinked to one another by citations (see http://citebase.eprints.org/cgi-bin/search).
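The interoperability described above can be sketched concretely. OAI-compliant archives expose Dublin Core metadata that a harvester collects (in the OAI Protocol for Metadata Harvesting, by requesting `verb=ListRecords&metadataPrefix=oai_dc` from each archive's base URL). The sketch below is a minimal illustration, not a real harvester: it parses such responses and merges them into one searchable index. It assumes the XML namespaces of OAI-PMH version 2.0, a later revision of the protocol than the one available when this article was written; the archive identifiers, titles and the `sample_response`, `parse_list_records` and `search_titles` helpers are hypothetical, and network fetching and resumption tokens (used to page through large result sets) are omitted.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sample_response(identifier: str, title: str, creator: str) -> str:
    """Hypothetical one-record ListRecords response from an institutional archive."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
          <dc:creator>{creator}</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Extract Dublin Core fields from a ListRecords response, keyed by OAI identifier."""
    root = ET.fromstring(xml_text)
    records = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if identifier is None or dc is None:
            continue  # e.g. deleted records carry a header but no metadata
        fields: dict[str, list[str]] = {}
        for element in dc:
            # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
        records[identifier] = fields
    return records


def search_titles(index: dict[str, dict[str, list[str]]], keyword: str) -> list[str]:
    """Identifiers of all harvested records whose title contains the keyword."""
    keyword = keyword.lower()
    return sorted(ident for ident, fields in index.items()
                  if any(keyword in t.lower() for t in fields.get("title", [])))


# Harvest two (hypothetical) institutional archives into one global index.
global_index: dict[str, dict[str, list[str]]] = {}
for response in (
    sample_response("oai:eprints.univ-a.example:101", "Self-archiving in physics", "Doe, J."),
    sample_response("oai:eprints.univ-b.example:7", "Peer review costs", "Roe, R."),
):
    global_index.update(parse_list_records(response))

print(search_titles(global_index, "physics"))  # ['oai:eprints.univ-a.example:101']
```

Keying the merged index on OAI identifiers, which are meant to be globally unique, is what lets records from many separate archives coexist in one virtual archive without collisions.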

Distributed, institution-based self-archiving benefits research institutions in three ways. First, it maximizes the visibility and impact of their own refereed research output. Second, by symmetry, it maximizes their researchers' access to the full refereed research output of all other institutions. Third, institutions themselves can hasten the transition to self-archiving and so more quickly reduce their library's annual serials expenditures to about 10% of their current level (the portion paid to journal publishers for refereeing their researchers' submissions).

The institutional library can help researchers to self-archive and can maintain the institution's own eprint archives as an outgoing collection of refereed research for external use, in place of the old incoming collection, bought in through subscriptions for internal use. Institutional library consortial power can also be used to provide leveraged support for journal publishers who commit themselves to a timetable of downsizing on the way to becoming pure quality-control service providers (see SPARC).

References

1. Odlyzko, A. M. "The economics of electronic journals" in Technology and Scholarly Communication (eds Ekman, R. & Quandt, R.) 380-393 (Univ. Calif. Press, Berkeley, 1998). http://www.press.umich.edu/jep/04-01/odlyzko.html

2. Harnad, S. "Universal FTP archives for esoteric science and scholarship: a subversive proposal" in Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing (eds Okerson, A. & O'Donnell, J.) 1 (Association of Research Libraries, Washington DC, 1995). http://www.arl.org/scomm/subversive/toc.html

The transition scenario

As soon as all refereed journal articles are self-archived by their authors in their institution's eprint archive, the literature is freed from all access barriers and impact barriers. Self-archiving could be done virtually overnight. The day after, all refereed research becomes freely accessible online to researchers the world over.

One possible outcome is that that will be the end of it. The refereed literature will be free online for those who want it and cannot get it any other way, but those who can afford to get it the old way via paying journals will continue to do so. In this event, the access/impact problem will be solved, but the library's budget crisis will not: it will simply become less important.

An alternative outcome is that when the refereed literature is accessible online for free, users will prefer the free version (as so many physicists already do). Journal revenues will then shrink and institutional savings grow, until journals eventually have to scale down to providing only the essentials (the quality-control service), with the rest (paper version, online PDF version, other 'added values') sold as options.

In neither of these outcomes is peer review itself compromised or put at risk; nor do authors have to give up, even temporarily, submitting to their established journals of choice. All they have to do is self-archive their preprints and postprints in their institutional eprint archives.

Nor are copyright restrictions an obstacle to self-archiving: preprints can be self-archived without any restriction at the time the paper is submitted to a journal. When the final draft is accepted, authors can ask the journal to let them retain the right to give that draft away online by self-archiving it. In practice, many publishers will agree to this if the author asks, although most do not publicly state it as policy. For these papers, the author can self-archive the refereed postprint alongside the pre-refereeing preprint(s). For publishers who insist that all rights be transferred, authors can sign the agreement and self-archive a linked 'corrigenda' file, listing for the user the changes that must be made to the preprint to make it equivalent to the postprint. (See copyright details)
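A corrigenda file of the kind described above could be produced in many ways. One minimal sketch, assuming both drafts are available as plain text, uses Python's standard `difflib` module to list line by line what must change in the preprint to make it equivalent to the postprint. The function name `make_corrigenda` and the sample sentences are hypothetical, and a machine-generated diff is only one possible format: a hand-written list of corrections would serve the same purpose.

```python
import difflib


def make_corrigenda(preprint: str, postprint: str) -> str:
    """List, as a unified diff, the edits that turn the preprint text into the
    refereed postprint text. Returns an empty string if the drafts are identical."""
    diff = difflib.unified_diff(
        preprint.splitlines(),
        postprint.splitlines(),
        fromfile="preprint",
        tofile="postprint (accepted version)",
        lineterm="",
    )
    return "\n".join(diff)


preprint = "We tested 40 subjects.\nThe effect was significant.\n"
postprint = "We tested 48 subjects.\nThe effect was significant (p < 0.01).\n"
print(make_corrigenda(preprint, postprint))
```

Lines prefixed with '-' are preprint text to be replaced; lines prefixed with '+' are their refereed replacements.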

31 May 2001 erratum: The self-archiving initiative

In Stevan Harnad's contribution on freeing the scientific literature, the estimate of the minimum cost of peer review was not from the American Institute of Physics but from a summary of a group discussion by Mark Doyle of the American Physical Society. The $500 estimate used in that discussion included only peer-review costs, not post-acceptance costs. The URL for the estimate is: http://documents.cern.ch/archive/electronic/other/agenda/a01193/a01193s5t11/transparencies/.