The Open Citation Project

Full text services within the Open Archives initiative - a proposal by the Open Citation project

It was generally agreed at the second meeting of the Open Archives initiative (OAi) that the initiative will not make a significant impact until some working services are delivered. The OAi brings together data providers (those who maintain archives designed for author self-posting of scholarly research papers) and independent service providers.

With its reference linking demonstrator, built on an OAi-compliant archive, the Open Citation project claims to be the first OAi service provider, albeit an unofficial one.

Why unofficial? To comply with the OAi, data providers and service providers have to conform to a set of rules, outlined in the OAi Protocol. Data providers agree to expose their data to compliant service providers by two methods, based on a common protocol and common metadata subset. Service providers, meanwhile, agree to abide by the rules on access imposed by data providers, so that each data provider can maintain some control over access to, and the integrity of, its data. Thus the physics archives on which the reference linking demonstrator is based allow service providers to capture metadata about all the papers held within their system.

The simple metadata mandated in the OAi protocol is not sufficient to build a reference linking service, which requires access to the full texts of the papers. In building its demonstrator and downloading full-text papers, the project has worked closely with arXiv and with its full cooperation, but in principle it does not conform to the rules of the OAi protocol.

The project believes that many of the innovative services that OAi aims to encourage will need access to full texts, yet recognises that there are at least two good reasons why OAi data providers might want to progress cautiously:

It is in this context that the Open Citation project submitted the following position statement to the Open Archives workshop, San Antonio, June 2000.
Topic: Core technical issues 

A Storage Architecture for Full-Text Access to Open Archives

One goal of the Open Citation (OpCit) Project is to integrate and develop software for reference linking in large open archives. To create cross-archive services that add value (e.g. linking and indexing), OAi data providers need to support a harvesting interface that allows OAi service providers to poll the archives periodically and access the full-text data relevant to their end-user services. The need for this capability has become apparent both from our own experience in developing a reference linking service and from the work of others (e.g. the UPS Prototype Project).

The current OAi framework does not define such an interface. Eprint archives compliant with the Santa Fe Convention provide only a means of collecting limited metadata, which are not rich enough to support services such as reference linking.

To solve this problem we propose extending the current OAi framework in the following ways: 

  1. data providers provide a machine interface for service providers to access the full-text content of the archive data; a copy of the archive data could be stored separately from the end-user interface for this purpose; 
  2. authorised (i.e. Santa Fe-compliant) service providers are allowed to retrieve the full content from this machine interface; 
  3. the Open Archives metadata set (OAMS) is extended to include information (e.g. a URL) for retrieving the full text of a document (the present Display ID metadata do not serve well for automatic full-text retrieval).
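
As a rough illustration of the three proposals above, the following Python sketch models a data provider with a machine interface, an authorisation check, and a metadata record that carries a full-text location. Everything in it is an assumption made for illustration: the `Record` fields (including `fulltext_url`), the `DataProvider` class, the `harvest` function and the `archive.example` URLs are hypothetical names, and none of them is defined by the OAi protocol, the Santa Fe Convention or OAMS. The archive is simulated in memory, so no network access is involved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    """Metadata record extended with a full-text location (proposal 3).

    `fulltext_url` is a hypothetical field, not part of the existing OAMS.
    """
    identifier: str
    title: str
    modified: date
    fulltext_url: Optional[str] = None


class DataProvider:
    """In-memory stand-in for an archive's machine interface (proposal 1)."""

    def __init__(self, authorised: Set[str]) -> None:
        self._authorised = authorised          # service providers allowed in
        self._records: Dict[str, Record] = {}  # identifier -> metadata
        self._fulltext: Dict[str, str] = {}    # full-text URL -> content

    def deposit(self, record: Record, text: str) -> None:
        """Store a record and, if it has a full-text URL, its content."""
        self._records[record.identifier] = record
        if record.fulltext_url is not None:
            self._fulltext[record.fulltext_url] = text

    def list_records(self, since: date) -> List[Record]:
        """Return metadata for records modified on or after `since`."""
        changed = [r for r in self._records.values() if r.modified >= since]
        return sorted(changed, key=lambda r: r.identifier)

    def get_fulltext(self, provider_id: str, url: str) -> str:
        """Serve full text to authorised service providers only (proposal 2)."""
        if provider_id not in self._authorised:
            raise PermissionError(f"{provider_id} is not an authorised service provider")
        return self._fulltext[url]


def harvest(archive: DataProvider, provider_id: str, since: date) -> Dict[str, str]:
    """One polling pass: read the metadata, then fetch each full text by URL."""
    harvested: Dict[str, str] = {}
    for record in archive.list_records(since):
        if record.fulltext_url is None:
            continue  # nothing to retrieve automatically for this record
        harvested[record.identifier] = archive.get_fulltext(provider_id, record.fulltext_url)
    return harvested
```

A service provider would call `harvest` periodically, passing the date of its previous poll as `since`, so that only new or changed papers are fetched. Keeping the full-text store separate from the metadata store mirrors the suggestion that a copy of the archive data could be held apart from the end-user interface.
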
A related issue: how are services from service providers to be integrated into the data providers' end-user environment? When users visit a data provider site, how do they know that third-party value-added services are available?

You can follow the continuing debate on this topic and others to emerge from the San Antonio meeting in the OAI-general mail archives for June 2000 by following the thread "Post workshop thoughts...".

Postscript. See this short paper progressing two OpCit proposals to OAi, presented at the Experimental OAI-based Digital Library Systems Workshop, held in conjunction with the 5th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), September 2001.

This page is produced and maintained by the Open Citation project.