Re: Cliff Lynch on Institutional Archives

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sun, 16 Mar 2003 15:09:24 +0000

On Sun, 16 Mar 2003, Lee Miller wrote:

> The simplest way to aggregate papers within disciplines would be to include a
> discipline field in the metadata.

I agree. And this confirms that "aggregation" is merely (1) a
metadata-based form of re-packaging and (2) need not re-package the
full texts, but merely the pointers to them. Hence it is not the case that the
(full-text) *data* from distributed Institutional OAI Archives need to be
"fed" (harvested) into central Disciplinary OAI Archives. "Aggregation"
is merely a special case (or rather a special name) of ordinary OAI
Service-Provision -- which is precisely what the OAI Metadata Harvesting
Protocol was designed for! The old paper-based idea of journal-content
"aggregators" is simply misleading us here. Online "aggregators" are
really just search engines, pulling out and ranging over
discipline-specific subsets of OAI full-text content space.
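
To make the point concrete, here is a minimal Python sketch of "aggregation"
as ordinary service provision. It assumes the discipline descriptor travels in
the Dublin Core dc:subject element (one possible home for the proposed field)
and that dc:identifier holds the URL of the full text. The two ListRecords
responses, the archive URLs, and the aggregate_by_discipline() helper are all
hypothetical; a real harvester would fetch each archive's
?verb=ListRecords&metadataPrefix=oai_dc response and follow resumptionTokens.
Only metadata and pointers are gathered; the full texts stay in their
institutional archives.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# OAI-PMH 2.0 and unqualified Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Template for one hypothetical record.
# ASSUMPTION: the discipline descriptor is carried in dc:subject.
RECORD = """<record>
  <header><identifier>{oai_id}</identifier><datestamp>2003-03-16</datestamp></header>
  <metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:subject>{subject}</dc:subject>
    <dc:identifier>{url}</dc:identifier>
  </oai_dc:dc></metadata>
</record>"""


def list_records(*records):
    """Wrap records in a minimal (hypothetical) ListRecords response."""
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            + "".join(records) + "</ListRecords></OAI-PMH>")


# One response per (hypothetical) institutional archive.
RESPONSES = [
    list_records(
        RECORD.format(oai_id="oai:uni-a.example:101", title="Seed dispersal in fragmented forests",
                      subject="Ecology", url="http://eprints.uni-a.example/101/"),
        RECORD.format(oai_id="oai:uni-a.example:102", title="Spin waves in thin films",
                      subject="Physics", url="http://eprints.uni-a.example/102/"),
    ),
    list_records(
        RECORD.format(oai_id="oai:uni-b.example:7", title="Nitrogen cycling in alpine lakes",
                      subject="ecology", url="http://archive.uni-b.example/7/"),
    ),
]


def aggregate_by_discipline(responses):
    """Group pointers to full texts by discipline; the full texts are never copied."""
    index = defaultdict(list)
    for xml_text in responses:
        root = ET.fromstring(xml_text)
        for dc in root.iterfind("oai:ListRecords/oai:record/oai:metadata/oai_dc:dc", NS):
            url = dc.findtext("dc:identifier", default="", namespaces=NS)
            for subject in dc.iterfind("dc:subject", NS):
                key = (subject.text or "").strip().lower()
                if key and url:
                    index[key].append(url)
    return dict(index)
```

Here aggregate_by_discipline(RESPONSES)["ecology"] lists the two ecology
pointers, one from each archive, while the physics paper is filed separately.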

> This gets back to the problems of subject
> classification, but at the discipline level a short list of defined
> discipline descriptors should be sufficient.

A *very* short list. Because once I have narrowed it to "Ecology," the
rest is best done with boolean full-text search and algorithms rather than
prefabricated human classification schemes.
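
A minimal Python sketch of that division of labour, assuming a service has
indexed each paper's text alongside its coarse discipline descriptor: the
descriptor makes the first cut, and a plain boolean (AND / OR / NOT) keyword
match does the rest. The Paper record, the search() helper, and the sample
papers and URLs are hypothetical; a real engine would add stemming, phrase
queries and ranking.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    url: str         # pointer back to the paper in its institutional archive
    discipline: str  # coarse descriptor taken from the metadata
    text: str        # searchable text (full text, or an abstract as a stand-in)


def words(text):
    """Lower-cased set of alphabetic tokens in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(papers, discipline, all_of=(), any_of=(), none_of=()):
    """First cut by discipline, then a boolean keyword filter (terms in lower case)."""
    hits = []
    for paper in papers:
        if paper.discipline != discipline:
            continue
        w = words(paper.text)
        if not all(term in w for term in all_of):             # AND
            continue
        if any_of and not any(term in w for term in any_of):  # OR
            continue
        if any(term in w for term in none_of):                # NOT
            continue
        hits.append(paper.url)
    return hits


PAPERS = [
    Paper("http://eprints.uni-a.example/1/", "ecology", "Nitrogen cycling in alpine lakes and streams"),
    Paper("http://eprints.uni-a.example/2/", "ecology", "Seed dispersal by birds in fragmented forests"),
    Paper("http://archive.uni-b.example/3/", "ecology", "Nitrogen uptake by forest soil microorganisms"),
    Paper("http://archive.uni-b.example/4/", "chemistry", "Nitrogen fixation catalysts for industrial use"),
]
```

For example, search(PAPERS, "ecology", all_of=["nitrogen"]) returns papers 1
and 3 (the chemistry paper falls at the discipline cut), and adding
none_of=["alpine"] leaves only paper 3.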

> For example, the discipline of ecology includes plants, animals,
> microorganisms, terrestrial and aquatic ecosystems, physical environments,
> physiology, applied mathematics, and many other sub-fields. Nevertheless,
> ecologists of all stripes recognize and enjoy common bonds in the general
> discipline. A small number of general journals that publish papers from
> many of the sub-disciplines are followed by many researchers and academics,
> regardless of their specialty fields. Thus inclusion of the discipline
> descriptor "ecology" would allow aggregation of papers at a level that has
> already proved useful to ecologists for over a century.

No problem. But how many such high-level (useful) partitions do you think
there really are, within, say, "Biology"? I suspect we are talking about
a very small number; the rest is boolean content-based search. (Besides,
it is not just *journals* we are classifying, as in the old aggregator
days, but *papers*.)

> A similar level of aggregation in other fields would surely be useful as a
> tool for harvesting papers of particular interest from institutional archives.

I suspect that these high-level, a priori categories will be similarly
sparse in all disciplines: little pre-classification will be needed
beyond the level of the discipline name itself. We are not sorting
journals any more; we are searching open-access contents. There will be
powerful content-based algorithms for narrowing it to the kinds of
material we want, but little of it will resemble how we used to
aggregate journals.

Stevan Harnad