Re: Central vs. Distributed Archives

From: Stevan Harnad <harnad_at_coglit.ecs.soton.ac.uk>
Date: Tue, 7 Nov 2000 15:15:36 +0000

On Mon, 6 Nov 2000, Greg Kuperberg wrote:

> After all, Stevan, suppose that we told you that CogPrints would be better
> off as part of the arXiv and you should surrender your collection and
> your responsibilities. Would you immediately agree, or would you want
> some time to think about it?

I've already thought about it: CogPrints was originally designed with
subsumption under arXiv (then XXX) in mind. The goal was not to win
fame and fortune as an archivist, but to free the refereed literature,
in all disciplines.

ArXiv had demonstrated the viability of centralized self-archiving in
Physics, and CogPrints was intended to generalize this viability to
other disciplines. Once generality was demonstrated, I could see no
reason why all the disciplinary archives should not just be subsumed by
arXiv: After all (to repeat), the goal was not to promote archives or
archivists, but to free the refereed literature through
self-archiving.

But I had been hedging my bets all along. Apart from advocating
arXiv-style centralized self-archiving, I had also been advocating
distributed self-archiving. In fact, that was the gist of my 1994
"subversive proposal."
http://www.arl.org/sc/subversive/

Now it seems to me that CogPrints, with under 1000 papers after three
years, is still lagging behind arXiv, with 130,000 after 11 years. And
even arXiv is still only growing linearly.

So perhaps the centralized approach could use some help, to get the
growth into the exponential range, across disciplines. Enter the
Eprints software, an OAI-compliant adaptation of the CogPrints
software, free for adoption by all universities, so they can
immediately establish interoperable Eprint Archives for all their
researchers, in all disciplines, to self-archive all their refereed
papers in, now.

With interoperability, it is no longer necessary to worry about which
archive the paper is in, or where; nor about whether the archive is a
centralized disciplinary one or a distributed institutional one. It is
no longer a matter of one archive subsuming another: They are all
seamlessly harvested into a global "virtual" archive, on every
researcher's desktop, and "containing" the entire refereed literature
-- just as, say, the ISI's searchable database contains all the titles
and abstracts across all disciplines, except that the full text will be
there too (and free).
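
To make the harvesting idea concrete, here is a minimal sketch in Python.
It is an illustration only, not the Eprints software or the OAI protocol:
the Record type, the archive names, and the identifiers are all
hypothetical, and a real harvester would fetch metadata over the network
rather than from in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str  # hypothetical globally unique paper id
    title: str
    archive: str     # which archive exposed this record


def harvest(archives):
    """Merge the records of every archive into one index keyed by identifier."""
    index = {}
    for records in archives.values():
        for rec in records:
            # Keep the first copy seen; the same paper may sit in two archives.
            index.setdefault(rec.identifier, rec)
    return index


def search(index, term):
    """Search titles across the merged index, regardless of source archive."""
    term = term.lower()
    return sorted(r.title for r in index.values() if term in r.title.lower())


# One centralized disciplinary archive and one institutional archive.
archives = {
    "central-physics": [Record("id:1", "Quantum widgets", "central-physics")],
    "university-u": [
        Record("id:2", "Widget cognition", "university-u"),
        Record("id:1", "Quantum widgets", "university-u"),
    ],
}
virtual = harvest(archives)
```

The searcher never needs to know which archive holds a paper: search()
runs over the merged index alone.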

So the answer is: Sure I'd have been happy to have CogPrints subsumed
by arXiv if that had proved to be the way to get the entire refereed
corpus online and free. But now it looks as if OAI-compliant
distributed Eprint Archiving (including arXiv) will instead be
"subsumed" into the global virtual Eprint Archive.

For that: immediate agreement, with no need for afterthoughts!

> Some might ask, what is there to decide about how to run an archive?
> For example, the arXiv's policy is that DVI is unreliable as an input
> format, although it does offer it as output. The arXiv requires TeX
> source for new submissions if they are written in TeX. There are other
> subject-based archives out there that accept *only* DVI as a submission
> format. The maintainers of these archives feel that TeX source is an
> unreliable input format, and moreover that TeX source is confidential
> for some authors. It is very difficult to defuse this seemingly minor
> issue, and it is only one of several such issues.

This is a paradigmatic example of Zeno's Paralysis: We sit here fussing
over whether it should all be DVI or TeX source, and most of the
literature is still sitting, waiting, on-paper, and on-disk, unarchived.

The Eprints solution is to accept all formats, as long as at least one
of them is immediately screen-readable: http://www.eprints.org
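
As a rough sketch of that acceptance rule (not the actual Eprints code):
the set of formats counted as "immediately screen-readable" below is an
assumption made for illustration, and the function name is hypothetical.

```python
# Assumption for illustration: which formats count as screen-readable.
SCREEN_READABLE = {"html", "pdf", "txt"}


def accept_submission(formats):
    """Accept any mix of formats if at least one is screen-readable."""
    return any(fmt.lower() in SCREEN_READABLE for fmt in formats)
```

Under this rule a deposit of TeX source plus PDF is accepted, while TeX
source alone is not; no format is ever rejected in itself.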

Get the stuff up there, demonstrate the power of self-archiving to free
the refereed literature today, irreversibly addict everyone to it, and
THEN worry about optimizing formats thereafter.

> For institutional preprint series the issues are a little different,
> but they are equally obstructive. Usually an institutional maintainer
> is less interested in retaining credit, but more concerned, sometimes
> correctly, about following his mandate. If we suggest to university
> U that they contribute their papers to the arXiv, the maintainer at U
> may say "our faculty gave permission for me to list their papers in our
> preprint series, but not to contribute them to your arXiv." That can
> lead to yet another bureaucratic thicket.

Moot all of this by just having all universities self-archive their own
stuff in their own interoperable Eprint Archives. Interoperability and
harvesting will take care of the rest.

> Right behind these superficial issues are more significant ones like
> permanence. The fact is that many institutional and subject-based
> archives do not want the responsibility of permanence. Some of them
> explicitly repudiate it. A standards-based virtual archive approach,
> such as OAI, aspires to please every side and sweep all such issues under
> the rug. I wonder if this is rushing in where angels fear to tread.

I've already replied to this second instance of Zeno's Paralysis
below:

>sh> There is no (not-readily-solvable) "permanence question." At this
>sh> point, getting the literature on-line and free is the most important
>sh> thing to do, now. The collective interests that this will generate in
>sh> KEEPING it all on-line and free will ensure that all proper steps are
>sh> taken to ensure permanence.
>
> Again, experience tells me otherwise. Thousands of math preprints have
> come and gone on the web. Let me also give you a quote from a help page
> of a non-arXiv math archive:
>
> When your paper is ultimately published we would greatly appreciate
> being informed. At that time we will remove the preprint and leave
> a pointer to the journal in which it was published.

Of course papers will vanish if authors are INSTRUCTED to remove them!
But don't blame archiving (centralized or distributed, or the OAI) for
that!

The Eprints software was created specifically to free the refereed
literature, forever, through self-archiving. Authors are instructed
that they can self-archive their pre-refereeing preprints as well as
their refereed postprints therein, permanently.

They are strongly encouraged not to remove archived papers, but instead
to archive more recent versions or corrections on top of them. The
version controller makes sure that the top version is always the one
the user sees first, and all other versions point to it.
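
The behaviour just described can be sketched as follows. This is an
assumed, simplified model, not the Eprints version controller itself;
the class and method names are hypothetical.

```python
class VersionedPaper:
    """Keeps every deposited version; nothing is overwritten."""

    def __init__(self, first_version):
        self.versions = [first_version]

    def deposit(self, new_version):
        """Archive a newer version on top of the existing ones."""
        self.versions.append(new_version)

    def top(self):
        """The version a user sees first: always the most recent."""
        return self.versions[-1]

    def view(self, number):
        """Return (text, pointer) for a 1-based version number.

        The pointer is the number of the top version, or None when the
        requested version already is the top one.
        """
        text = self.versions[number - 1]
        latest = len(self.versions)
        pointer = None if number == latest else latest
        return text, pointer


paper = VersionedPaper("preprint")
paper.deposit("refereed postprint")
```

Here paper.top() gives the postprint, while viewing version 1 still
returns the preprint together with a pointer to version 2.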

At this point (again, in the interests of avoiding Zeno's Paralysis),
authors are not prevented from removing a paper if they wish. The
reason is simple: authors who feared that they could never withdraw a
paper if they changed their minds might refrain from self-archiving
at all, and that would deter the freeing of the literature far more
than letting papers be removed when the author insists.

We don't expect many authors to insist, especially for the refereed
postprint. (That's already irremovably in the published literature;
removing it from the free archive merely limits it, again, arbitrarily,
to the paying S/L/P (subscription, site-license, pay-per-view)
audience.)

In the case of preprints, the "permanence" issue is new (as it was not
previously possible to "publish" preprints on this scale prior to
eprint self-archiving); the practices are still evolving (and should be
allowed to do so); and, frankly, the preprint outcome matters little,
compared to the all-important freeing of the lapidary postprint.

> This flatly contradicts your vision of "freeing the literature". But OAI
> itself does not pass judgement on such policies.

No contradiction: orthogonality. And OAI is right to stay out of this.

Preprint retraction policy should not be dictated; and postprints
cannot be retracted anyway, so why retract the free version from the
archive (if there was any point in freeing it through self-archiving in
the first place)?

In short, another red herring.

> > The OAI-compliant archive-creating/maintaining Eprints software has the
> > same notification service as CogPrints -- indeed, it is a generic
> > adaptation of the CogPrints software!
>
> Yes, but it *only* notifies the subscribers of that one little archive.
> The OAI standard leaves OAS agents with no clear notification mechanism,
> because there is no guarantee that the agent will be notified in a
> timely manner by the foundational archives.

An eminently solvable problem, through Open Archive Services. Of course
the notification service only makes sense for (the pertinent subject
sectors of) the whole virtual archive, and not for individual
institutional holdings. So shall we refrain from self-archiving (whether centralized
or institutional) until the distributed notification OAS has been
designed, tested, and optimized (on what corpus?), or should we just go
ahead and self-archive and worry about that later?

--------------------------------------------------------------------
Stevan Harnad harnad_at_cogsci.soton.ac.uk
Professor of Cognitive Science harnad_at_princeton.edu
Department of Electronics and phone: +44 23-80 592-582
             Computer Science fax: +44 23-80 592-865
University of Southampton http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00):

    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html

You may join the list at the site above.

Discussion can be posted to:

    american-scientist-open-access-forum_at_amsci.org
Received on Mon Jan 24 2000 - 19:17:43 GMT
