The Open Citation Project - Reference Linking for Open Archives

OpCit's SMARTer work plan

(SMART = Simple, Measurable, Appropriate, Realistic, Timely)

Linking demonstrators: principal releases and evaluation schedule

Version | Archives | Added features | Release period | Evaluation
1.0 | arXiv physics archives | Backward (in time) linking | 2Q1 | 4Q1
2.0 | arXiv physics archives | Forward linking, links to online journal services and other digital library services | 2Q2 | 3Q2
2.5 | NCSTRL, CoRR, WWW conferences (ACM DL) - distributed archives | Inter-link distributed archives | 4Q2 | 1Q3
3.0 | All arXiv (inc. maths) and v2.5 | Knowledge linking | 3Q3 | 4Q3
Subject to change and revision
Main objectives

Year 1
  • Build pilot reference linked version of arXiv physics archives
  • Create citation database for physics archives
  • Define generic abstractions for inter-archive reference linking
  • Model a generalised "linking service" and investigate how it might interoperate with other digital library services
  • Build demo of inter-linked archives (modelled on selected collections) using linkable objects
  • Investigate patterns of usage of physics archives by authors and readers; determine the impact of citations



Year 2
  • Extend reference linking and interoperability models to other Open Archives
  • Model linkable objects in a repository architecture
  • Build a gateway to Open Archives and eprint archives
  • Integrate other reference linking services in model applications
  • Collaborate on the development of user interfaces for archives, with special emphasis on support for author submission and reference-checking
  • Investigate the impact of linking on usage of the archives and on citation patterns
  • Support maintainers of Open Archives with information, tools and services for reference linking
  • Knowledge-based citation services using ontologies, information contexts; also visualisation and clustering methodologies. Specifically:
    • Analyse citations to find out where they are, what they link to 
    • Cluster papers by authors, keywords, etc., semi-classifying papers in a large repository



Year 3
  • Knowledge-based citation services (continued):
    • Use agent, ontology technologies, etc., to determine what a paper is about
    • Which references are worth reading?
    • Analyse whether papers are cross-disciplinary
    • Are there different ways of viewing
  • Explore concepts for personalized linkage spaces
Archives and reference linking
(zj, lac, shi)
Analysis (users, citations, etc.)
(lac, tdb, ijh, shi, sha)
JISC/NSF milestones and deliverables
(as per May 1999)
YEAR 1 (Oct 1999 - Sep 2000)        
1Q1.  3 months (Oct - Dec 1999)

Initial linking experiments
- First prototype;
- Links based on explicit arXiv IDs;
- SPIRES experiment;
- First linked pdf document;
- Summary report on early linking work
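
Linking on explicit arXiv IDs can be sketched roughly as below. This is a minimal illustration, not the project's prototype: it assumes old-style identifiers of the form archive/7 digits, and the ARCHIVES tuple is a deliberately short, illustrative subset of archive names. The helper names find_arxiv_ids and linkable_fraction are hypothetical.

```python
import re

# Illustrative subset of archive names (assumption); the real list is longer.
ARCHIVES = ("astro-ph", "cond-mat", "gr-qc", "hep-ex", "hep-lat",
            "hep-ph", "hep-th", "nucl-ex", "nucl-th", "quant-ph")

# Old-style identifier: archive name, a slash, then exactly seven digits.
ARXIV_ID = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ARCHIVES) + r")/(\d{7})\b"
)


def find_arxiv_ids(reference_text):
    """Return the explicit arXiv identifiers found in one reference string."""
    return [f"{archive}/{number}"
            for archive, number in ARXIV_ID.findall(reference_text)]


def linkable_fraction(references):
    """Fraction of references that contain at least one explicit arXiv ID."""
    if not references:
        return 0.0
    linkable = sum(1 for ref in references if find_arxiv_ids(ref))
    return linkable / len(references)
```

Restricting matches to a known list of archive names trades recall for fewer false positives on strings that merely look like word/number pairs.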

Identify suitable document format for reference linking demonstrator
PDF chosen

Evaluate TeX/LaTeX -> PDF conversion tools (e.g. pdflatex)
- TeX/LaTeX --(tex/latex)--> DVI --(dvips)--> PS
PS --(acrobat distiller)--> PDF
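
The conversion chain above could be scripted along these lines. This is a sketch under stated assumptions: Ghostscript's ps2pdf stands in for Acrobat Distiller (it is a substitute, not the tool named above), the function names are hypothetical, and latex, dvips and ps2pdf are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def conversion_commands(tex_name):
    """Build the TeX -> DVI -> PS -> PDF command chain for one source file."""
    stem = Path(tex_name).stem
    return [
        # nonstopmode keeps latex from waiting for input on errors
        ["latex", "-interaction=nonstopmode", f"{stem}.tex"],
        ["dvips", "-o", f"{stem}.ps", f"{stem}.dvi"],
        # ps2pdf (Ghostscript) used here in place of Acrobat Distiller
        ["ps2pdf", f"{stem}.ps", f"{stem}.pdf"],
    ]


def convert(tex_path):
    """Run the chain in the file's own directory.

    Returns (True, None) on success, or (False, tool_name) for the first
    step that exits with a non-zero status.
    """
    tex_path = Path(tex_path)
    for cmd in conversion_commands(tex_path.name):
        result = subprocess.run(cmd, cwd=tex_path.parent,
                                capture_output=True, text=True)
        if result.returncode != 0:
            return False, cmd[0]
    return True, None
```

Recording which step failed makes it straightforward to compute a success rate per stage over a whole archive.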

Convert TeX/LaTeX documents in the arXiv physics archive to PDF files;
Conversion success rate over 91%;

Analyse TeX->PS conversion log files to discover reasons for conversion failures
missing style files; 
errors contained in the original documents;
conversion software developed at performs better, but is not available to us (see below)
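
A rough way to sort conversion logs into the failure categories listed above is sketched below. It assumes standard LaTeX log conventions (error lines begin with "! ", and a missing file is reported as "! LaTeX Error: File `name' not found"); the category labels and function names are hypothetical, and plain TeX messages are not handled.

```python
import re
from collections import Counter

# LaTeX reports a missing input file in this form.
MISSING_FILE = re.compile(r"^! LaTeX Error: File `([^']+)' not found",
                          re.MULTILINE)
# Any TeX error line starts with "! ".
ANY_ERROR = re.compile(r"^! ", re.MULTILINE)


def missing_files(log_text):
    """Names of files that the log reports as not found."""
    return MISSING_FILE.findall(log_text)


def classify_log(log_text):
    """Assign one conversion log to a coarse failure category."""
    if MISSING_FILE.search(log_text):
        return "missing style file"
    if ANY_ERROR.search(log_text):
        return "error in source document"
    return "no error recorded"


def summarise(logs):
    """Count how many logs fall into each category."""
    return Counter(classify_log(text) for text in logs)
```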

3 months (to end Dec 1999)

Build available tools - CiteSeer, SLinkS, various PDF tools, hyperref, translation between/among formats (PreScript), DLS tools if possible, and SFX.

Construct an overview of current projects in reference linking.
December 1 talk to Cornell Digital Library Group

Build collections at Cornell: ACM DL, selected NCSTRL collections, some of LANL; D-Lib is already available online.

Set up a project working page

Early research: user - citation analysis of arXiv
Results (in correspondence)

Preliminary analysis of references in recently-used (cached) subset of arXiv papers
Reveals probable proportion of linkable references; informs design of link database for pilot demo (see 2Q1)

2Q1.  3 months (Jan - Mar 2000)

Build pilot linked implementation of arXiv physics archive (v1.0):
 - add reference links dynamically to the PDF version of physics archive documents;
 - create simple user interface to access the reference linked archive;
Working demo

Study existing linking systems (e.g. SFX, CiteSeer, LinkBaton) and their possible use in the OpCit project
- need to determine user requirements; identify data sources for beyond-arXiv linking; need suitable tool interfaces

6 months (to end June 2000)

1. What is a linkable object?
API for Linkable References

Sample static intra link of collections, using available tools

Example dynamic interlinking of collections (D-Lib, JEP), using available tools
Paper: Link Accessibility in Electronic Journal Articles (Postscript)

    Release of pilot linking implementation based on chosen subset of archives (6 months)

Report on metadata and architectural interoperability requirements (6 months)
Presentation: An Architecture for Reference Linking

3Q1. 3 months (Apr - Jun 2000)

Maintain and enhance pilot reference linking demo 
From April 2000, PDF files were not converted from the source, but retrieved daily from Soton mirror site

Provide feedback for proposed API for reference linking (see right)
Main concerns:
- methods should be simpler
- differentiate document and library methods
- practical issues

Install and evaluate CiteSeer (ResearchIndex) software
Some code needs to be amended to run in non-NEC environment; standalone machine recommended; excellent for citation indexing, but does not work well for physics-style references; need to identify suitable resources before rebuilding local implementation

Develop tools to process reference data from the arXiv physics archive documents

Convert copy from collections to suitable formats for processing
XML/XHTML preferred

Begin to define how reference linking tools can interoperate; evaluate tools; define what is needed for flexible, parameterizable, citation agents and linking services.

Build Cornell linking service; figure out how to incorporate it into the Dienst model.

    Report on evaluation of pilot (9 months)
delayed to 4Q1

First year report to NSF (June 30)

4Q1. 3 months (Jul - Sep 2000)

Test links for reliability, correctness in v1.0 demo

Design schema for citation database
- install MySQL to manage database

Build citation database for the physics archives with following features:
forward/backward reference linking; find most cited papers; find papers published in journals, etc.
- extract data and references from all arXiv papers
- parse data
- store in database
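
The core of such a citation database can be sketched with a single table of (citing, cited) pairs. The example below is illustrative only: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs without a server, and the table, column and function names are assumptions rather than the project's schema.

```python
import sqlite3


def build_db(citation_pairs):
    """Create an in-memory database holding (citing, cited) identifier pairs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE citation ("
        " citing TEXT NOT NULL,"
        " cited TEXT NOT NULL,"
        " PRIMARY KEY (citing, cited))"
    )
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    db.executemany("INSERT OR IGNORE INTO citation VALUES (?, ?)",
                   citation_pairs)
    return db


def backward_links(db, paper):
    """Papers that the given paper cites (its reference list)."""
    rows = db.execute(
        "SELECT cited FROM citation WHERE citing = ? ORDER BY cited", (paper,))
    return [row[0] for row in rows]


def forward_links(db, paper):
    """Papers that cite the given paper."""
    rows = db.execute(
        "SELECT citing FROM citation WHERE cited = ? ORDER BY citing", (paper,))
    return [row[0] for row in rows]


def most_cited(db, limit=10):
    """The most cited papers as (identifier, citation count) tuples."""
    rows = db.execute(
        "SELECT cited, COUNT(*) AS n FROM citation"
        " GROUP BY cited ORDER BY n DESC, cited LIMIT ?", (limit,))
    return rows.fetchall()
```

Storing each link once and querying it in both directions means forward and backward linking share the same data; only the column used in the WHERE clause changes.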


3 months (to end Sept. 2000)

Incorporate DLS code (Deciter part) into Cornell reference linking implementation (requires agreement)

Apply latest XML tools: HTML to XHTML conversion; examine XSLT spec.

Finish interlinking D-Lib collection; same for JEP. Report applicability of Cornell reference linking software to different collections

Convert references in ACM literature into standard metadata for further processing

Propose set of useful reference linking tools.  Investigate if such a set could be put together into a freely distributable Java package and/or Perl module

Write paper on how reference linking tools can interoperate and work across distributed collections

Monitor changes in spec. for Open Archives metadata; update API as necessary

Add surrogates to Dienst

Write up Year 1 results

Data analysis of usage of arXiv by authors and readers
Ongoing analysis: Mining the social life of an eprint archive
- Usage patterns
- Authors, citations and publication
Short questionnaire based evaluation of pilot demo by immediate partners and collaborators

Short questionnaire based evaluation of pilot demo by small focus group of authors of well-linked papers in physics archives

First year report to JISC (11 months, end Aug.)
including report on evaluation of pilot
YEAR 2 (Oct 2000 - Sep 2001)        
1Q2.  3 months (Oct - Dec 2000)

Review interface to the linked physics arXiv
- determine optimum interface for link presentation
- include revision/update linking?

Distribute evaluation version of link service components to partners
- formulate evaluation agreement
- determine financial requirements
- draft licence for commercial use as necessary
- check conditions on use of Adobe libraries
- aim is to debug code, improve user base and visibility

Complete v2.0 demo (forward/backward links)

Build linked Open Archive of WWW conference series papers

3 months (Oct - Dec 2000)

Finish Java implementation of a reference linking API

Incorporate API into experimental Dienst to support reference linking across Open Archives; add reference information to NCSTRL retrieval results (a la CiteSeer)
- Implement four new views for Dienst "disseminate" verb corresponding to the four API methods
- Test creation of surrogate subdirectories, starting with D-Lib. The presence of a surrogate subdirectory means that the corresponding item can be disseminated according to the four views.
- Devise method for surrogates to be re-constructed from data stored in surrogate subdirectories.
- Devise a way for Dienst to call the Java-based API code, OR create a Perl version of the API code.
- Add JEP and DigiNews "repositories", interlinked and retrievable.

Produce reports and papers on results of user/citation analysis    
2Q2.  3 months (Jan - Mar 2001)

v2.0 Extend the reference linking service interface;
 - integration with other reference linking systems (e.g. LinkBaton, OpenURL, CrossRef/DOI?, etc.)
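
As one illustration of integrating with an external link resolver, a reference's metadata can be encoded as a query string on a resolver address. The sketch below is an assumption-laden example: the key names (genre, issn, volume, spage, date) are taken to follow the early OpenURL draft and should be checked against the specification, and the resolver address and function name are hypothetical.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base, **metadata):
    """Append citation metadata to a resolver address as a query string.

    Keys are sorted so that the same metadata always yields the same URL.
    """
    query = urlencode(sorted(metadata.items()))
    return f"{resolver_base}?{query}"


# Placeholder metadata and resolver address, for illustration only.
example = make_openurl(
    "http://resolver.example.org/menu",
    genre="article", issn="1234-5679", volume="12", spage="105", date="1998",
)
```

Because the link carries metadata rather than a fixed target, the resolver at the reader's institution can decide which copy or service to offer.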

3 months (Jan - Mar 2001)

Determine to what extent reference linking information should reside in persistent storage; do a test implementation
- serialize surrogate objects;
- store XML information and reconstruct surrogates;
- wrap surrogates into FEDORA objects and use FEDORA repository;
- store information in MySQL databases and reconstruct
Build a reference implementation of selected approach.
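
The "store XML information and reconstruct surrogates" option might look roughly like the following. This is a sketch, not the Cornell implementation: the Surrogate fields, the element names and the function names are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    """Hypothetical surrogate: an item identifier plus its link data."""
    identifier: str
    references: list = field(default_factory=list)  # items this one cites
    citations: list = field(default_factory=list)   # items citing this one


def to_xml(surrogate):
    """Serialise a surrogate to an XML string for persistent storage."""
    root = ET.Element("surrogate", {"id": surrogate.identifier})
    for tag, items in (("reference", surrogate.references),
                       ("citation", surrogate.citations)):
        for item in items:
            ET.SubElement(root, tag).text = item
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Reconstruct a surrogate from its stored XML form."""
    root = ET.fromstring(text)
    return Surrogate(
        identifier=root.get("id"),
        references=[e.text for e in root.findall("reference")],
        citations=[e.text for e in root.findall("citation")],
    )
```

A round trip through to_xml and from_xml should return an equal object, which gives a simple check when comparing this option against serialization or a database.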

Author deposit interface: test integration of Eprints - Cornell API - and reference checking tools

Release of v2.0 linking implementation across arXiv physics archives (18 months)
3Q2.  3 months (Apr - Jun 2001)

Integrate reference linking API with Soton tools; Dienst version for Open Archives; Java version for non-open

v2.5 Explore inter-linking multiple intra-linked (Open) Archives, e.g. NCSTRL - CoRR - WWW conferences (and ACM DL?)

3 months (Apr - Jun 2001)

Reference linking Web demo for NCSTRL
- Extend the NCSTRL top page to include buttons that retrieve linked text, etc.
- Enhance/replace NCSTRL top page with EPrints user interface.

Evaluation of linked archive by the broad user community - physicists
Report on preliminary evaluation (21 months)
4Q2.  3 months (Jul - Sep 2001)

Explore how reference linked physics archives can be supplemented with knowledge-based links: initially produce links for keywords, authors, glossaries, indexes

3 months (Jul - Sep 2001)

Update report on useful reference linking toolset: what is available, pitfalls, etc., based on current review of tools and the use of them in the API implementation

Determine the impact of links on user/citation analysis: update results of earlier studies from 4Q1
- requires integration of link archives with main arXiv sites; broad visibility, promotion 
  Extension of linking to distributed NCSTRL archives (24 months)

Second year report (23 months)

YEAR 3 (Oct 2001 - Sep 2002)        
1Q3.  3 months (Oct - Dec 2001)

Citation analysis (e.g. related papers; related researchers, ...);
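
One simple way to find related papers from citation data is bibliographic coupling: two papers are treated as related when their reference lists overlap. The sketch below normalises the overlap with a Jaccard ratio. This is one possible measure chosen for illustration, not a method stated in the plan, and the function names are hypothetical.

```python
def coupling_strength(refs_a, refs_b):
    """Jaccard overlap of two reference lists (0.0 when either is empty)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_papers(target, reference_lists, top=5):
    """Rank other papers by shared references with the target paper.

    reference_lists maps a paper identifier to the list of identifiers
    it cites. Returns (paper, strength) tuples, strongest first; papers
    with no shared references are left out.
    """
    target_refs = reference_lists[target]
    scored = [
        (paper, coupling_strength(target_refs, refs))
        for paper, refs in reference_lists.items()
        if paper != target
    ]
    scored = [(paper, s) for paper, s in scored if s > 0]
    # Sort by descending strength, then by identifier for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

The same function applied to author lists instead of reference lists would give a first approximation of related researchers.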

Knowledge-based content analysis, building on the results of the EPSRC-funded COHSE project

3 months (Oct - Dec 2001)

Develop personalized linkage spaces (in conjunction with Cornell wireless and personal library projects)

Add reference linking services to the National Scientific Digital Library (new NSF project, Fall 2000)

Evaluation of inter-linked archives centred on NCSTRL - computer scientists
Specification of further enhancements (27 months)
2Q3.  3 months (Jan - Mar 2002)


3Q3.  3 months (Apr - Jun 2002)

v3.0 Integrated demonstrators (v2.0 + v2.5), with knowledge linking services

      Optimised and enhanced implementation (33 months)
4Q3.  3 months (Jul - Sep 2002)

Evaluation of linked archives across all relevant user communities
Report of extended evaluation (36 months)

Final report (36 months)

This page produced and maintained by the Open Citation project.