SUMMARY:
(i) An Institutional Repository (IR) is not the same thing as a Central (uni-disciplinary or multidisciplinary) Repository (CR) like arXiv or PubMed Central.
(ii) A pre-refereeing preprint is not the same as a refereed postprint.
(iii) The first and most fundamental goal of the Open Access movement is to provide Open Access to the published, peer-reviewed research literature.
(iv) Open Access to pre-refereeing preprints is and must remain an optional bonus that the author may or may not provide, temporarily or permanently, over and above access to the refereed postprint.
(v) Open Access to the refereed postprint is a necessity, across all disciplines, to supplement Toll Access (via journal subscription/license/pay-per-view).
(vi) Open Access to the unrefereed preprint is not a necessity, not necessarily discipline-universal, and should not be portrayed as such.
(vii) Central Repositories (CRs) evolved on the basis of spontaneous, voluntary self-archiving, of both preprints and postprints.
(viii) Institutional self-archiving is a matter of systematic institutional policy, and pertains specifically to refereed, published postprints.
(ix) Institutional self-archiving is (largely) restricted to the institution's own authors self-archiving their own work: preprints and postprints.
(x) Institutions can and should control the content of their own IRs (mainly by restricting it to their own researchers' output and by ensuring that it includes all of the institution's published postprint output).
(xi) The fact that institutional employees are the self-archivers gives IRs a level of control and answerability that superordinate CRs like arXiv -- in which anyone in the world can deposit -- do not and cannot have (although research-funder CRs are a partial exception).
(xii) But for neither IRs nor CRs should access-provision (self-archiving) be conflated with publication, nor preprints (provisional) with postprints (peer-reviewed, published, and permanent).
On Fri, 11 Aug 2006, Stevan Harnad [SH] wrote (in the American Scientist Open Access Forum):
SH: "UNREFEREED PREPRINTS: If you want authors to be willing to deposit their unrefereed preprints at all, you must allow them to remove them at will, instantaneously.
"(It is a good and useful author practice to self-archive preprints: it establishes priority, it elicits corrective peer feedback, it creates a historic record of stages of development of a work, it accelerates and increases research impact and progress. But if the institution imposes a foolishly oppressive removal policy, authors will simply be discouraged from taking the useful step of depositing their unrefereed preprints in the first place)."
Simeon Warner [SW], arXiv, replied (in JISC-REPOSITORIES):
SW: "I disagree with Stevan here, and this is not the policy we follow at arXiv. If you expect a preprint to allow authors to establish priority then you are saying that the preprint has become part of the scholarly record and it should thus not be removed. In arXiv we allow authors to post a withdrawal notice but old versions remain publicly available (in a very small number of cases of copyright infringement and personal insult we have removed articles)."
First, the (many) points of agreement with Simeon:
(1) Yes, all things being equal, it is greatly preferable not to remove deposited documents, whether preprint or postprint, hence removal should not be encouraged.
(2) Yes, depositing a pre-refereeing preprint is a good way to establish priority, even before formal publication.
(3) Yes, depositing pre-refereeing preprints is in any case a good practice, beneficial to research progress, especially in fast-moving, early-uptake fields, and is to be encouraged.
(4) Yes, a scholarly record of pre-publication stages of research reports is of interest and value in and of itself.
But now the disagreements:
(i) An Institutional Repository (IR) is not the same thing as a Central (uni-disciplinary or multidisciplinary) Repository (CR) like arXiv or PubMed Central.
(ii) A pre-refereeing preprint is not the same as a refereed postprint.
(iii) The first and most fundamental goal of the Open Access movement is to provide Open Access to the published, peer-reviewed research literature.
(iv) Open Access to pre-refereeing preprints is and must remain an optional bonus that the author may or may not provide, temporarily or permanently, over and above access to the refereed postprint.
(v) Open Access to the peer-reviewed postprint is a necessity, across all disciplines, to supplement Toll Access (via journal subscription/license/pay-per-view).
(vi) Open Access to the unrefereed preprint is not a necessity, not necessarily discipline-universal, and should not be portrayed as such.
(vii) Central Repositories (CRs) evolved on the basis of spontaneous, voluntary self-archiving, of both preprints and postprints.
(viii) Institutional self-archiving is a matter of systematic institutional policy, and pertains specifically to refereed, published postprints.
(ix) Institutional self-archiving is (largely) restricted to the institution's own authors self-archiving their own work: preprints and postprints.
(x) Institutions can and should control the content of their own IRs (mainly by restricting it to their own researchers' output and by ensuring that it includes all of the institution's published postprint output).
(xi) The fact that institutional employees are the self-archivers gives IRs a level of control and answerability that superordinate CRs like arXiv -- in which anyone in the world can deposit -- do not and cannot have (although research-funder CRs are a partial exception).
(xii) But for neither IRs nor CRs should access-provision (self-archiving) be conflated with publication, nor preprints (provisional) with postprints (peer-reviewed, published, and permanent).
SW: "For a thought experiment to help with this, imagine [depositing] multiple solutions to some problem to an archive and then removing all but the correct one at some later date. Is that a reasonable way to establish priority?"
No. The reasonable way to establish priority is to deposit the unrefereed preprint in your IR (or CR) to establish priority and then to get it refereed and published in a refereed journal. If the author of the published version is no longer interested in asserting or preserving pre-publication priority (for some unfathomable reason), he can remove the unrefereed preprint (although downloaded, cached and harvested residues may still perdure). The canonical version is, always was, and will continue for the foreseeable future to be the published, peer-reviewed, "certified" version: the postprint.
Corrections are another matter. In principle, any version could turn out to contain an error, detected later: the unrefereed preprint or the refereed postprint. The difference is that the unrefereed preprint can (in principle) be deleted (not necessarily in practice, as ghostly remnants, downloaded or cached elsewhere, can return to haunt the author). The published version can only be formally "retracted," but it cannot be "unpublished." It cannot be withdrawn from the bookshelves and the hard-disks of the world, nor from the annals of the journal in which it was published. Corrected post-publication updates, however, can be disseminated too.
So please don't conflate preprint self-archiving, which is a (possibly temporary and ephemeral) way of providing early (risky) access to unrefereed research, with postprint self-archiving, which is a way of supplementing access to refereed, published research.
SW: "I think the option to allow authors to remove e-prints is simply an unpleasant compromise that may be necessary to help populate repositories."
Again, unrefereed, unpublished preprints and refereed, published postprints are being conflated here, as are preservation-archiving and access-archiving:
Self-archived drafts can be disinterred from the archive: "un-archived." I agree that this should be discouraged, wherever it is unnecessary, but I don't find it at all unpleasant to allow authors the permanent option of withdrawing unpublished work from public view if they so wish (and not merely as a sop for enticing reluctant self-archivers to go ahead and self-archive!). That's the difference between publishing something and merely providing access to it. Publishing is archival, permanent and irreversible. Access-provision is not.
SW: "One could hope that the option might later be removed in a bait-and-switch move. This was how it played out in arXiv though it was not thought of in that way. Versions have been stored since 1997 but before that a revision overwrote the previous version."
With all due respect, I think arXiv was an important milestone in the evolution of self-archiving, Open Access, and Institutional Repositories, but it is neither the optimal model nor (I believe) the wave of the future for research self-archiving. The wave of the future (thanks to OAI-interoperability) is (I believe) distributed local-institutional self-archiving of each institution's own research output in its own IR, not central, arXiv-style self-archiving. Central harvesting -- OAIster-, Citebase-, CiteSeer-, Scirus- and Google-Scholar-style -- will take care of the rest, harvesting the distributed OA IR contents seamlessly into searchable central "virtual archives" ("VRs").
Research institutions (universities, mostly) have an interest in two things: (1) maximising the usage and impact of their research output, by maximising access to it, and (2) preserving a permanent record of their research output.
Self-archiving institutional research output (preprint and postprint) serves the purposes of both (1) and (2), but only (1) requires that the output be made Open Access; (2) would be equally well-served by Closed Access self-archiving. And the only thing an institution can insist upon being deposited in its permanent archive is the author's final, refereed, published drafts; authors are well within their rights and reason to reserve the prerogative to decide for themselves what pre-refereeing drafts they wish to grant access to, and which of them, if any, they wish to retain in the permanent record.
This being the online, networked age, however, the following unprecedented sequence can happen (and no doubt has and will): An unrefereed preprint is posted publicly only, read, used and cited, and then withdrawn without being published, orphaning links and citations (unless the users/citers preserved a draft). This is not good for scholarly progress, and a solution will evolve. The most likely solution is that institutions will make their authors answerable for what they post publicly in their IR, at least to the extent of requiring them to leave a Closed Access version of it in the archive permanently -- with a URL or DOI that permanently identifies it, but does not necessarily provide public access to the full text itself. Under special circumstances, referees, official auditors, etc., should be able to apply to the institutions for access to the full text, in cases of scholarly dispute about what it had contained.
Why leave the option to allow the publicly posted preprint to revert to Closed Access status? Because if it did contain an error, leaving it publicly accessible -- even if there are links and pointers to corrected versions and updates -- leaves open the possibility that an unwitting user will access the erroneous version. The probability is low; and even withdrawal does not reduce that probability to zero (because of likely downloaded and harvested residues persisting here and there); but the sensible, scholarly policy for an IR is to support the withdrawal of unrefereed, unpublished work, while formally discouraging its withdrawal.
That is the long and short of it. It has nothing whatsoever to do with "unpublishing" published work. And, yes, the difference between peer-reviewed publications and unrefereed self-postings is a profound and important one, even in the OA age. The official scholarly record is the published record, not the "posted" record.
Stevan Harnad
American Scientist Open Access Forum