Re: EPrints, DSpace or ESpace?

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Tue, 13 Apr 2004 02:21:59 +0100

    Prior Topic Thread:
    "EPrints, DSpace or ESpace?"
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2670.html

> 2 Open Access News Postings by Garrett Eastman:
>
> http://www.earlham.edu/~peters/fos/2004_04_11_fosblogarchive.html#a108179305530712543
>
> Skeptical eye on Google repository searching
>
> Henk Ellermann, Google Searches Repositories: So What Does Google
> Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html -=(In
> Between)=-:, April 12, 2004. Ellermann puts the brakes on enthusiasm
> for Google's proposed federated repository searching, reported in the
> Chronicle of Higher Education on Friday, April 9 (see earlier OAN posting:
> http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637
> .) His questions relate to the actual number of documents concerned;
> press accounts have said the 17 repositories hold an average of 1000
> documents, but Ellermann's calculations show a number considerably
> smaller. He maintains that the repository movement has a long way to go
> to attract and index content and provide reliable access, so that there is
> something for Google users to search and find.
>
> Google partners with universities to mine invisible academic literature
> http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637
>
> Jeffrey R. Young, Google Teams Up with 17 Colleges to Test Searches of Scholarly
> Materials, Chronicle of Higher Education Daily Update, April 9, 2004.
> http://chronicle.com/free/2004/04/2004040901n.htm
> MIT and 16 other institutions are collaborating with Google, who, pending
> the success of the test project, will activate a feature that enables
> searching of online repositories such as DSpace. MacKenzie Smith of MIT
> is quoted. "A lot of times the richest scholarly literature is buried"
> in search-engine results, said Ms. Smith. "As more and more content
> is on the Web, it's harder and harder to find the high-quality stuff
> that you need." The universities' extensive use of metadata and OCLC's
> involvement in developing a search configuration for the test promise
> a highly useful search tool across multiple collections.

---------------------------

> Google searches repositories: so what does Google search for?
> http://eepi.ubib.eur.nl/iliit/archives/000479.html
>
> Henk Ellermann
>
> The Chronicle of Higher Education reports that Google has 'teamed up' with a
> number of DSpace-using universities to develop an add-on to Google's advanced
> search option. The add-on will consist of a search through the contents of
> Institutional repositories.
>
> Although it is not stated in the article, rumor has it that the search will be on
> the full text as well as on the metadata. Within a few months Google therefore
> will offer their users an option to restrict searches to an "intellectual zone".
> That is the official message and it sounds good.
>
> The only problem is that the official message is based on a -how to put it
> nicely?- distorted view of reality. It is stated, for instance, that the
> participants in this pilot have repositories containing on average 1000
> documents. Is that so? Let's count.
>
> The following list shows how many documents there are (currently) in the
> repositories of the participating institutions.
>
> MIT 3565 (but not all are available to all)
> Australian National University 34050 (but 0 texts)
> Cornell University 41
> Cranfield University 49
> European University Institute - internal error-
> Hong Kong University of Science and Technology 986
> Indiana University-Purdue University at Indianapolis 27
> Minho University 311
> Ohio State University -cannot be reached-
> Parma University 29
> University of Arizona 1
> University of Calgary 135
> University of Oregon 106
> University of Rochester 138
> University of Toronto 819
> University of Washington 1772 (of which at least 962 pictures and most
> documents not accessible outside UW)
> University of Wisconsin 21
>
> Now.... 1000 documents on the average? Don't think so.
>
> But it is not only the quantity. Even when documents are available, it does not
> mean that they are available to everyone. And even if a document is available to
> everyone, you still can't be sure that the system is running...
>
> What we badly need is a continuous and authoritative review of existing
> Institutional Repositories. The criteria to "judge" the repositories would have to
> include:
>
> * number of documents, (with breakdown per document type)
> * percentage of freely accessible documents
> * up-time
>
> It is great that Google is becoming part of the Institutional Repositories effort, but
> we should learn to be fair and honest about what we have to offer. It is
> actually not that much at the moment. We can only hope that what Google will
> expose is more than just the message "amateurs at work".
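
As a rough check on the counts Ellermann lists above, the tally can be
sketched in Python. The numbers are copied from his posting; dropping the
two repositories he could not reach, and setting aside ANU because he
reports "0 texts" for it, are assumptions made here for illustration, and
the variable names are hypothetical.

```python
from statistics import mean, median

# Document counts as listed in Ellermann's posting (April 2004).
# None marks a repository he could not count at the time.
counts = {
    "MIT": 3565,
    "Australian National University": 34050,  # listed as "0 texts"
    "Cornell University": 41,
    "Cranfield University": 49,
    "European University Institute": None,  # internal error
    "Hong Kong University of Science and Technology": 986,
    "Indiana University-Purdue University at Indianapolis": 27,
    "Minho University": 311,
    "Ohio State University": None,  # cannot be reached
    "Parma University": 29,
    "University of Arizona": 1,
    "University of Calgary": 135,
    "University of Oregon": 106,
    "University of Rochester": 138,
    "University of Toronto": 819,
    "University of Washington": 1772,
    "University of Wisconsin": 21,
}

# Keep only the repositories that returned a count.
reported = {name: n for name, n in counts.items() if n is not None}

# Assumption: ANU's records carry no full texts, so set them aside.
with_texts = {
    name: n
    for name, n in reported.items()
    if name != "Australian National University"
}

print("repositories counted:", len(reported))
print("mean, all counted:   ", round(mean(list(reported.values())), 1))
print("mean, without ANU:   ", round(mean(list(with_texts.values())), 1))
print("median, all counted: ", median(list(reported.values())))
```

On these figures the 15 countable repositories hold 42050 records in all,
but 34050 of them are ANU's text-less entries; the remaining 14 hold 8000
documents, roughly 571 each, and the median across all 15 is 135.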

-------------------------------------------------------------------

    DSpace is but one part of eprint-space

   Stevan Harnad

One of the reasons why Henk Ellermann finds so few archives and so few
papers in them is that it was rather arbitrary for Google and OCLC to
cover only DSpace archives!

Eprints, for example, has over 120 archives worldwide of exactly the same kind,
with over 40,000 papers in them:

http://archives.eprints.org/eprints.php?action=analysis

Eprints, however, is focusing all its energy on getting more archives
created and filled. At this time there would be no particular interest
even in covering 40,000 more papers when the total number of papers
published every year is at least 2.5 million!
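
To put those two figures in proportion, here is a minimal sketch. The
constant names are hypothetical, both numbers are the rough estimates
quoted above, and 2.5 million is stated only as a lower bound.

```python
# Figures as quoted in this posting; both are rough estimates.
EPRINTS_PAPERS = 40_000      # papers held in Eprints archives
ANNUAL_OUTPUT = 2_500_000    # lower bound on papers published per year

# Share of one year's published papers that the Eprints archives would cover.
share = EPRINTS_PAPERS / ANNUAL_OUTPUT
print(f"{share:.1%}")
```

So even full coverage of the Eprints archives would reach at most about
1.6% of a single year's output.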

For an idea of the kinds of things Eprints is doing to promote self-archiving of
those 2.5 million papers, see

    (1) the OSI-sponsored handbook http://software.eprints.org/handbook/,

    (2) the self-archiving FAQ http://www.eprints.org/self-faq/ ,

    (3) Tim Brody's "scientometric google," based on citation-links rather than just
        links: http://citebase.eprints.org/ as well as his

    (4) early-days predictor of citation impact from download impact
        http://citebase.eprints.org/analysis/correlation.php and his

    (5) demonstrations of the dramatically enhanced impact of self-archived articles:
        http://www.ecs.soton.ac.uk/~harnad/Temp/OATAnew.pdf See also

    (6) the soon-to-be-launched Declaration of Institutional Commitment to
        Implementing an Open Access Provision Policy:
        http://www.eprints.org/signup/sign.php

These are the kinds of efforts that may soon give those archives contents worth
Google's indexing. That will not come from just increasing the profile of the
pittance they contain to date!

By the way, the real OAI google is OAIster, and it contains over 3 million pearls
from nearly 300 institutions http://oaister.umdl.umich.edu/o/oaister/ but many are
not journal articles (and even if they all were, that still wouldn't be nearly
enough yet!):
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online (1998-2004)
is available at the American Scientist Open Access Forum:
        To join the Forum:
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
        Post discussion to:
    american-scientist-open-access-forum_at_amsci.org
        Hypermail Archive:
    http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Unified Dual Open-Access-Provision Policy:
    BOAI-2 ("gold"): Publish your article in a suitable open-access
            journal whenever one exists.
            http://www.earlham.edu/~peters/fos/boaifaq.htm#journals
    BOAI-1 ("green"): Otherwise, publish your article in a suitable
            toll-access journal and also self-archive it.
            http://www.eprints.org/self-faq/
    http://www.soros.org/openaccess/read.shtml
    http://www.eprints.org/signup/sign.php
Received on Tue Apr 13 2004 - 02:21:59 BST