Archives and reference linking
SOUTHAMPTON
(zj, lac, shi) |
Interoperability
CORNELL
(db)
|
Analysis (users, citations, etc.)
SOUTHAMPTON
(lac, tdb, ijh, shi, sha) |
Evaluation
(shi) |
JISC/NSF milestones and deliverables
(as per May 1999) |
YEAR 1 (Oct 1999 - Sep 2000) |
|
|
|
|
1Q1. 3 months (Oct - Dec 1999)
Initial linking experiments
- First prototype;
- Links based on explicit arXiv IDs;
- SPIRES experiment;
- First linked PDF document;
- Summary report on early linking work
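As a rough illustration of linking on explicit arXiv IDs, the sketch below scans reference text for old-style identifiers (archive name, optional subject class, slash, seven digits) and pairs each with a target URL. It is not the project's prototype: the function name, the sample references and the URL pattern are assumptions made for the example.

```python
import re

# Old-style arXiv identifiers, e.g. "hep-th/9901001" or "math.AG/9902002":
# archive name, optional two-letter subject class, "/", seven digits.
ARXIV_ID = re.compile(r"\b([a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?/\d{7})\b")

# Assumed URL pattern, for illustration only.
ABS_URL = "https://arxiv.org/abs/{id}"


def find_arxiv_links(reference_text):
    """Return (identifier, url) pairs for each explicit arXiv ID found."""
    return [(m, ABS_URL.format(id=m)) for m in ARXIV_ID.findall(reference_text)]


# Made-up reference strings, used only to exercise the pattern.
refs = ("[1] A. Author, hep-th/9901001. "
        "[2] B. Author, J. Phys. A 32 (1999) 1; math.AG/9902002.")
links = find_arxiv_links(refs)
for identifier, url in links:
    print(identifier, "->", url)
```

References that carry no explicit identifier are simply not matched here; they would need a separate lookup (for example against a citation database).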
Identify suitable document format for reference linking
demonstrator
PDF chosen
Evaluate TeX/LaTeX -> PDF conversion tools (e.g. pdflatex)
- TeX/LaTeX --(tex/latex)--> DVI --(dvips)--> PS
PS --(acrobat distiller)--> PDF
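The conversion chain above can be sketched as a list of command lines, one per stage. This is an assumption-laden outline rather than the scripts actually used: `conversion_commands` is a hypothetical helper, and Ghostscript's `ps2pdf` stands in for Acrobat Distiller in the final step.

```python
from pathlib import Path


def conversion_commands(tex_file):
    """Return the command lines for TeX/LaTeX -> DVI -> PS -> PDF, in order."""
    src = Path(tex_file)
    dvi = src.with_suffix(".dvi")
    ps = src.with_suffix(".ps")
    pdf = src.with_suffix(".pdf")
    return [
        ["latex", str(src)],                 # TeX/LaTeX -> DVI
        ["dvips", "-o", str(ps), str(dvi)],  # DVI -> PS
        ["ps2pdf", str(ps), str(pdf)],       # PS -> PDF (stand-in for Distiller)
    ]


commands = conversion_commands("paper.tex")
for cmd in commands:
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run`, stopping at the first non-zero exit status so that the failing stage is recorded for later analysis.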
Convert TeX/LaTeX documents in the arXiv physics archive
to PDF files;
Conversion success rate over 91%;
Analyse TeX->PS conversion log files to discover reasons
for conversion failures
missing style files;
errors contained in the original documents;
conversion software developed at arXiv.org performs
better, but is not available to us (see below) |
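A minimal sketch of the log analysis, assuming each conversion attempt leaves a plain-text LaTeX log. It sorts logs into the two failure causes named above using the standard LaTeX "File ... not found" message and the leading `!` that marks TeX errors; the category labels and function names are invented for the example.

```python
import re
from collections import Counter

# LaTeX reports a missing style or class file as:
#   ! LaTeX Error: File `foo.sty' not found.
MISSING_STYLE = re.compile(r"^! LaTeX Error: File `([^']+\.(?:sty|cls))' not found")


def classify_log(log_text):
    """Label one log: 'missing style file', 'document error' or 'ok'."""
    lines = log_text.splitlines()
    if any(MISSING_STYLE.match(line) for line in lines):
        return "missing style file"
    if any(line.startswith("!") for line in lines):
        # Any other TeX error line is attributed to the source document.
        return "document error"
    return "ok"


def summarise(logs):
    """Count how many logs fall into each category."""
    return Counter(classify_log(text) for text in logs)


# Made-up log excerpts for illustration.
sample_logs = [
    "! LaTeX Error: File `revtex.sty' not found.",
    "! Undefined control sequence.\nl.42 \\foo",
    "Output written on paper.dvi (12 pages, 34567 bytes).",
]
summary = summarise(sample_logs)
print(summary)
```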
3 months (to end Dec 1999)
Build available tools - CiteSeer, SLinkS, various PDF
tools, hyperref, translation between/among formats (PreScript), DLS tools
if possible, and SFX.
Construct an overview of current projects in reference
linking.
December 1 talk to Cornell Digital Library Group
Build collections at Cornell: ACM DL, selected NCSTRL
collections, some of LANL; D-Lib is available online.
Set up a project working page |
Early research: user - citation analysis of arXiv
Results (in correspondence)
Preliminary analysis of references in recently-used (cached)
subset of arXiv papers
- reveals probable proportion of linkable references
- informs design of link database for pilot demo (see
2Q1) |
|
|
2Q1. 3 months (Jan - Mar 2000)
Build pilot linked implementation of arXiv physics archive
(v1.0):
- add reference links dynamically to the PDF version
of physics archive documents;
- create simple user interface to access the reference linked archive;
Working demo
Study existing linking systems (e.g. SFX, CiteSeer, LinkBaton)
and their possible use in Opcit project
- need to determine user requirements; identify data
sources for beyond-arXiv linking; need suitable tool interfaces
|
6 months (to end June 2000)
1. What is a linkable object?
API for Linkable References
Sample static intra-linking of collections, using available
tools
Example dynamic interlinking of collections (D-Lib, JEP),
using available tools
Paper: Link Accessibility in Electronic Journal Articles
(Postscript)
|
|
|
Release of pilot linking implementation based on chosen
subset of archives (6 months)
Report on metadata and architectural interoperability
requirements (6 months)
Presentation: An Architecture for Reference Linking
|
3Q1. 3 months (Apr - Jun 2000)
Maintain and enhance pilot reference linking demo
From April 2000, PDF files were not converted from
the source, but retrieved daily from Soton mirror site
Provide feedback for proposed API for reference linking
(see right)
Main concerns:
- methods should be simpler
- differentiate document and library methods
- practical issues
Install and evaluate CiteSeer (ResearchIndex) software
- some code needs to be amended to run in a non-NEC environment;
- standalone machine recommended;
- excellent for citation indexing, but does not work well for
physics-style references;
- need to identify suitable resources before rebuilding local
implementation
Develop tools to process reference data from the arXiv
physics archive documents
|
Convert copy from collections to suitable formats for
processing
XML/XHTML preferred
Begin to define how reference linking tools can interoperate;
evaluate tools; define what is needed for flexible, parameterizable, citation
agents and linking services.
Build Cornell linking service; figure out how to incorporate
it into the Dienst model.
|
|
|
Report on evaluation of pilot (9 months)
delayed to 4Q1
First year report to NSF (June 30)
|
4Q1. 3 months (Jul - Sep 2000)
Test links for reliability, correctness in v1.0 demo
Design schema for citation database
- install MySQL to manage database
Build citation database for the physics archives with
following features:
forward/backward reference linking; find most cited papers;
find papers published in journals, etc.
- extract data and references from all arXiv papers
- parse data
- store in database
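The listed features can be sketched with a two-table schema. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` in place of MySQL so that it runs without a server, the table and column names are invented, and the identifiers are placeholders. "Backward" links are taken to mean the references a paper makes, "forward" links the papers that cite it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper (
    id      TEXT PRIMARY KEY,  -- e.g. an arXiv identifier
    journal TEXT               -- NULL if no journal publication is known
);
CREATE TABLE citation (
    citing TEXT NOT NULL REFERENCES paper(id),
    cited  TEXT NOT NULL REFERENCES paper(id),
    PRIMARY KEY (citing, cited)
);
""")

# Placeholder records: C cites A and B, B cites A.
A, B, C = "hep-th/0000001", "hep-th/0000002", "hep-th/0000003"
conn.executemany("INSERT INTO paper VALUES (?, ?)",
                 [(A, "Example Journal"), (B, None), (C, None)])
conn.executemany("INSERT INTO citation VALUES (?, ?)",
                 [(B, A), (C, A), (C, B)])


def backward_links(paper_id):
    """Papers referenced by the given paper."""
    rows = conn.execute(
        "SELECT cited FROM citation WHERE citing = ? ORDER BY cited", (paper_id,))
    return [r[0] for r in rows]


def forward_links(paper_id):
    """Papers that cite the given paper."""
    rows = conn.execute(
        "SELECT citing FROM citation WHERE cited = ? ORDER BY citing", (paper_id,))
    return [r[0] for r in rows]


def most_cited(limit=10):
    """(paper, citation count) pairs, most cited first."""
    return conn.execute(
        "SELECT cited, COUNT(*) AS n FROM citation "
        "GROUP BY cited ORDER BY n DESC, cited LIMIT ?", (limit,)).fetchall()


def published_in_journals():
    """Papers with a recorded journal."""
    rows = conn.execute(
        "SELECT id FROM paper WHERE journal IS NOT NULL ORDER BY id")
    return [r[0] for r in rows]
```

Keeping citations as (citing, cited) pairs means forward and backward links are the same table read in opposite directions, so no second index of "cited by" data has to be maintained.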
|
3 months (to end Sept. 2000)
Incorporate DLS code (Deciter part) into Cornell reference
linking implementation (requires agreement)
Apply latest XML tools: HTML to XHTML conversion; examine
XSLT spec.
Finish interlinking D-Lib collection; same for JEP.
Report applicability of Cornell reference linking software to different
collections
Convert references in ACM literature into standard metadata
for further processing
Propose set of useful reference linking tools. Investigate
if such a set could be put together into a freely distributable Java package
and/or Perl module
Write paper on how reference linking tools can interoperate
and work across distributed collections
Monitor changes in spec. for Open Archives metadata; update API as necessary
Add surrogates to Dienst
Write up Year 1 results
|
Data analysis of usage of arXiv by authors and readers
Ongoing analysis: Mining the social life of an eprint
archive
- Usage patterns
- Authors, citations and publication |
Short questionnaire based evaluation of pilot demo by
immediate partners and collaborators
Short questionnaire based evaluation of pilot demo by
small focus group of authors of well-linked papers in physics archives |
First year report to JISC (11 months, end Aug.)
including report on evaluation of pilot |
YEAR 2 (Oct 2000 - Sep 2001) |
|
|
|
|
1Q2. 3 months (Oct - Dec 2000)
Review interface to the linked physics arXiv
- determine optimum interface for link presentation
- include revision/update linking?
Distribute evaluation version of link service components
to partners
- formulate evaluation agreement
- determine financial requirements
- draft licence for commercial use as necessary
- check conditions on use of Adobe libraries
- aim is to debug code, improve user base and visibility
Complete v2.0 demo (forward/backward links)
Build linked Open Archive of WWW conference series papers
|
3 months (Oct - Dec 2000)
Finish Java implementation of a reference linking API
Incorporate API into experimental Dienst to support reference
linking across Open Archives; add reference information to NCSTRL retrieval
results (a la CiteSeer)
- Implement four new views for Dienst "disseminate" verb
corresponding to the four API methods
- Test creation of surrogate subdirectories, starting
with D-Lib. The presence of a surrogate subdirectory means that
the corresponding item can be disseminated according to the four views.
- Devise method for surrogates to be re-constructed from
data stored in surrogate subdirectories.
- Devise a way for Dienst to call the Java-based API
code, OR create a Perl version of the API code.
- Add JEP and DigiNews "repositories", interlinked
and retrievable.
|
Produce reports and papers on results of user/citation
analysis |
|
|
2Q2. 3 months (Jan - Mar 2001)
v2.0 Extend the reference linking service interface;
- integration with other reference linking systems
(e.g. LinkBaton, OpenURL, CrossRef/DOI?, etc.)
|
3 months (Jan - Mar 2001)
Determine to what extent reference linking information
should reside in persistent storage; do a test implementation
Choices:
- serialize surrogate objects;
- store XML information and reconstruct surrogates;
- wrap surrogates into FEDORA objects and use FEDORA
repository;
- store information in MySQL databases and reconstruct
surrogates.
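To make the first two choices concrete, the sketch below round-trips a toy surrogate through a serialized form and through XML. Everything here is an assumption for illustration: the `Surrogate` class and its two fields are hypothetical, JSON stands in for native object serialization, and the XML element names are invented. The FEDORA and MySQL options are not shown.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class Surrogate:
    """Hypothetical surrogate: an item identifier plus its reference list."""
    item_id: str
    references: list = field(default_factory=list)


# Choice 1: serialize the object itself (JSON used as a stand-in).
def to_json(s):
    return json.dumps(asdict(s))


def from_json(text):
    return Surrogate(**json.loads(text))


# Choice 2: store XML and reconstruct the surrogate from it.
def to_xml(s):
    root = ET.Element("surrogate", {"id": s.item_id})
    for ref in s.references:
        ET.SubElement(root, "reference").text = ref
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    return Surrogate(root.get("id"),
                     [r.text for r in root.findall("reference")])


original = Surrogate("item-1", ["item-2", "item-3"])
via_json = from_json(to_json(original))
via_xml = from_xml(to_xml(original))
print(via_json == original, via_xml == original)
```

The trade-off the choices point at is visible even at this scale: the serialized form is tied to the class definition, while the XML form can be read by other tools but needs explicit reconstruction code.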
Build a reference implementation of selected approach. |
Author deposit interface: test integration of Eprints
- Cornell API - and reference checking tools |
|
Release of v2.0 linking implementation across arXiv physics
archives (18 months) |
3Q2. 3 months (Apr - Jun 2001)
Integrate reference linking API with Soton tools; Dienst
version for Open Archives; Java version for non-open
v2.5 Explore inter-linking multiple intra-linked (Open)
Archives, e.g. NCSTRL - CoRR - WWW conferences (and ACM DL?)
|
3 months (Apr - Jun 2001)
Reference linking Web demo for NCSTRL
- Extend the NCSTRL top page to include buttons that
retrieve linked text, etc.
- Enhance/replace NCSTRL top page with EPrints user interface. |
|
Evaluation of linked archive by the broad user community
- physicists |
Report on preliminary evaluation (21 months) |
4Q2. 3 months (Jul - Sep 2001)
Explore how reference linked physics archives can be supplemented
with knowledge-based links: initially produce links for keywords, authors,
glossaries, indexes
|
3 months (Jul - Sep 2001)
Update report on useful reference linking toolset: what
is available, pitfalls, etc., based on current review of tools and the
use of them in the API implementation
|
Determine the impact of links on user/citation analysis:
update results of earlier studies from 4Q1
- requires integration of link archives with main arXiv
sites; broad visibility, promotion |
|
Extension of linking to distributed NCSTRL archives (24
months)
Second year report (23 months) |
YEAR 3 (Oct 2001 - Sep 2002) |
|
|
|
|
1Q3. 3 months (Oct - Dec 2001)
Citation analysis (e.g. related papers; related researchers,
...);
Knowledge-based content analysis, building on the results
of the EPSRC-funded COHSE project |
3 months (Oct - Dec 2001)
Develop personalized linkage spaces (in conjunction
with Cornell wireless and personal library projects)
Add reference linking services to the National Scientific
Digital Library (new NSF project, Fall 2000)
|
|
Evaluation of inter-linked archives centred on NCSTRL
- computer scientists |
Specification of further enhancements (27 months) |
2Q3. 3 months (Jan - Mar 2002)
|
|
|
|
|
3Q3. 3 months (Apr - Jun 2002)
v3.0 Integrated demonstrators (v2.0 + v2.5), with knowledge
linking services
|
|
|
|
Optimised and enhanced implementation (33 months) |
4Q3. 3 months (Jul - Sep 2002) |
|
|
Evaluation of linked archives across all relevant user
communities |
Report of extended evaluation (36 months)
Final report (36 months) |