Re: Interoperability - subject classification/terminology

From: Hussein Suleman <hussein_at_cs.uct.ac.za>
Date: Thu, 27 Mar 2003 13:00:59 +0200

hi

well, sure, i agree in principle ... if arXiv and similar projects agree
to bunch all of physics into a single category and use google for
searching, with no browsing capabilities, it wouldn't be a problem at all.

similarly, if we grouped together computer science, electrical
engineering and information systems, that would be ok for gross-level
interoperability ... once again, assuming searching is the only service
required. frankly, i think this is a little simplistic and assumes
digital libraries are no more than submission+search systems.
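
[aside: a minimal sketch of what that gross-level grouping could look
like, assuming archives expose colon-separated set names as OAI sets do.
the set names, the mapping table and both function names are hypothetical
and only for illustration, not taken from arXiv or any real archive.]

```python
# Hypothetical table: top-level set name -> gross discipline.
# None of these names come from a real archive's set structure.
GROSS_DISCIPLINE = {
    "cs": "computing",
    "ee": "computing",
    "is": "computing",
    "physics": "physics",
}


def gross_level(set_spec):
    """Collapse a colon-separated set name (e.g. 'physics:hep-th')
    to a gross discipline, ignoring all finer sub-categories."""
    top = set_spec.split(":", 1)[0]
    return GROSS_DISCIPLINE.get(top, "unclassified")


def group_by_discipline(records):
    """Group (identifier, set_spec) pairs by gross discipline."""
    groups = {}
    for identifier, set_spec in records:
        groups.setdefault(gross_level(set_spec), []).append(identifier)
    return groups
```

note that everything below the top level is thrown away here, which is
fine for search but leaves nothing for a service provider to browse by.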

[aside: why does eprints support browsing by categories ?]

besides, who decides what constitutes a discipline anyway ? has anyone
ever been able to decide if computer science is engineering or science ?

i think we have more questions than answers here and it isn't as simple
as you point out or we wouldn't even be discussing this :)

ttfn,
----hussein

Stevan Harnad wrote:
> On Thu, 27 Mar 2003, Hussein Suleman wrote:
>
>>...why not use sets for the separate
>>disciplines, aimed at particular service providers?...
>>some disciplines are not well-defined (namely, computer science)
>>so such archives may want to play ball with multiple service providers
>>and hence may need different sets.
>
> The question of taxonomic classification sets and version-control for
> Open Archives is a technical one, so I will not presume to comment on it
> except from the point of view of the potential *users* of one particular
> kind of Archive Content, namely, unrefereed preprints and refereed
> postprints of research papers from one or many or all disciplines: This
> -- in the google-age of boolean inverted full-text searchability --
> does not require a detailed a-priori taxonomy, as book metadata or the
> metadata for other kinds of material might. A fairly general sorting by
> discipline should suffice.
> http://www.eprints.org/self-faq/#26.Classification
> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2385.html
>
>>...the service provider can provide an
>>interface for potential data providers to self-register.
>
> I hope that once the number and contents of Open-Access Eprint Archives
> for research preprints and postprints have scaled up toward something
> closer to universality, the simple metadata descriptors "pre-refereeing
> preprint" and "refereed journal article" plus perhaps "discipline name"
> will be enough to guide relevant service-providers in automatically
> harvesting their relevant metadata. Multiple self-registration seems a
> tedious and unnecessary constraint. (Possibly a master-registry of valid
> institutions and disciplinary archives will also help, but may not be
> necessary unless commercial spamming invades this sector too.)
>
>>what remains a difficult problem, however, is how to recreate the
>>metadata used by the service provider as its native format. so, for a
>>typical example, if arXiv classifies items using a specific set
>>structure, this is certainly not going to be the default for an
>>institutional archive. does the service provider automatically or
>>manually reclassify? or does it not allow browsing by categories?
>
> Worrying about "recreating the categories" in this boolean full-text age
> is, I believe, a waste of time (for research preprints/postprints). Just
> harness google's harvested full-text to your engine's search capability,
> if it is incapable of contending with boolean full-text search on its
> own. (Manual reclassification! Heaven forfend! Don't bother classifying
> this material in the first place, beyond the simplest of first-cuts,
> such as discipline. Any further classification should be algorithmic and
> text-data-driven, not manual.)
>
>>in either event, the quality of the metadata from the perspective of the
>>service provider may be an impetus for potential users to want to
>>replicate their effort rather than rely on the automated submission from
>>their own institutions ... this needs more thought ...
>
> Again, I speak only for research preprints/postprints, but please let's
> not inject any further credibility into the notion that self-archiving
> author/institutions will also have to self-advertise by multiple
> self-archiving of the same paper. Surely that is one headache that
> OAI-interoperability should eradicate from the planet! Self-archiving
> itself is self-advertising (and effort) enough. Please let us not
> now -- when the momentum is still not big enough -- saddle would-be
> self-archivers with needless extra worries, and tasks!
> http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm
>
> Stevan Harnad
Received on Thu Mar 27 2003 - 11:00:59 GMT
