This site has been permanently archived. This is a static copy provided by the University of Southampton.
Early Opcit: The Les Carr archives

First link demos on arXiv

Date: Thu, 22 Jul 1999 15:42:15 +0100
From: Leslie Carr <>
Subject: Stealing a March on the EPrintLinks Project

I have spent a week fiddling around with XXX (arXiv physics) and have now come up with an initial set of within-archive citation links. You can see the results on the following web page:
The page shows the references in particular articles that have been recognised as corresponding to another article within the archive. The 892 links that you see are (a) not conclusively tested :-) and (b) based on a Heath Robinson process which is applied only to the 3% of articles which are "most-recently-requested-and-already-in-the-cache". These links have been derived by "reading" the contents of the references sections; the accuracy of the Heath Robinson process could be greatly improved.
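
One way such "reading" of a references section might work is sketched below. This is only an illustration under stated assumptions, not the scripts described above: it assumes that each archive record carries a journal reference string, and that a citation counts as a match when journal, volume and first page agree. The names `ref_key`, `build_index` and `link_references`, and the sample records, are all hypothetical.

```python
import re

# Assumed citation shape: "<journal> <series letter?><volume> (<year>) <page>".
JOURNAL_REF = re.compile(
    r"([A-Za-z][A-Za-z.\s]*?)\s*([A-Z]?)(\d+)\s*\((\d{4})\)\s*(\d+)"
)

def ref_key(text):
    """Reduce a journal citation to a (journal, volume, page) key, or None."""
    m = JOURNAL_REF.search(text)
    if m is None:
        return None
    # Fold the series letter into the journal name so that
    # "Phys. Rev. D 59" and "Phys.Rev. D59" produce the same key.
    journal = re.sub(r"[^a-z]", "", (m.group(1) + m.group(2)).lower())
    return (journal, m.group(3), m.group(5))

def build_index(archive):
    """Map citation keys to archive identifiers (archive: id -> journal ref)."""
    index = {}
    for paper_id, journal_ref in archive.items():
        key = ref_key(journal_ref)
        if key is not None:
            index[key] = paper_id
    return index

def link_references(references, index):
    """Return (reference text, archive id) pairs for recognised references."""
    links = []
    for ref in references:
        key = ref_key(ref)
        if key in index:
            links.append((ref, index[key]))
    return links

# Hypothetical records, invented purely to exercise the functions.
archive = {"hep-th/9901001": "Phys.Rev. D59 (1999) 123"}
references = [
    "A. Author and B. Writer, Phys. Rev. D 59 (1999) 123.",
    "C. Other, private communication.",
]
links = link_references(references, build_index(archive))
print(links)
```

Matching on a normalised key rather than on the raw string is what lets differently punctuated citations of the same article meet; a real references section would need far more tolerant parsing than this.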

As well as the above, there is also a huge set of explicit XXX (arXiv) citations in which the citation contains an XXX (arXiv) reference number. These can be seen in
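
Explicit citations of this kind are the easy case, since the reference number can be picked out with a pattern match. A minimal sketch follows, assuming identifiers of the form archive/YYMMNNN (for example hep-th/9901001), optionally with a two-letter subject class. The pattern, the function name `explicit_arxiv_ids` and the sample string are assumptions for illustration, not taken from the scripts discussed here.

```python
import re

# Assumed identifier shape: archive name, optional ".XX" subject class,
# a slash, then a seven-digit number (YYMMNNN).
ARXIV_ID = re.compile(r"\b([a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?)/(\d{7})\b")

def explicit_arxiv_ids(reference_text):
    """Return every archive/number identifier found in a reference string."""
    return [f"{m.group(1)}/{m.group(2)}" for m in ARXIV_ID.finditer(reference_text)]

# Hypothetical reference text, invented for the example.
sample = "A. Author, hep-th/9901001; B. Writer, Phys. Rev. D 59 (1999) 1, astro-ph/9812345."
print(explicit_arxiv_ids(sample))
```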

No work has yet been done on feeding the links back into the source, or overlaying them on the ps/pdf view that the user sees.

Note that (SLAC) SPIRES provides a similar service, which is based on the references being typed in.
Leslie Carr

Date: Fri, 23 Jul 1999 17:10:23 +0100
From: Leslie Carr <>
To: Steve Hitchcock <>
Subject: Re: Stealing a March on the EPrintLinks Project

I have reworked some of the scripts (to make them work on more "problematic" articles) and the number of "software deduced" links has expanded from 892 to 6243. This looks a much more worthwhile number :-)

Since there are so many explicit citations, it is reasonable to ask whether my software is just putting in a lot of hard work to find out a link that the user has effectively added by hand anyway. But it turns out that there is actually very little overlap at all. Most references to XXX (arXiv) eprints give just the authors and reference number. Hardly any references to journal articles also give the XXX (arXiv) reference.

Now some do! Perhaps this is part of the eprint->refereed article transition process. Or perhaps it is part of a cultural change in physics. We can check this out later. It would be informative to work out

(Just done some quick calculations. Of the 6243 software deduced links, only 302 appear to correspond to citations where the XXX (arXiv) reference was explicitly given as well.)
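
For the record, the arithmetic behind that overlap can be checked in a few lines. The two counts are the ones quoted above; the variable names are just for illustration.

```python
deduced = 6243        # software deduced links
also_explicit = 302   # of those, citations that also gave the arXiv reference

overlap = also_explicit / deduced
only_deduced = deduced - also_explicit

print(f"overlap: {overlap:.1%}")                        # overlap: 4.8%
print(f"found only by the software: {only_deduced}")    # 5941
```
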
Leslie Carr

From: "Leslie Carr" <>
To: "Stevan Harnad" <>,
        "Steve Hitchcock" <>
Cc: <>
Subject: Re: Stealing a March on the EPrintLinks Project
Date: Mon, 26 Jul 1999 07:40:43 +0100

Peculiarly enough, there seems to be little difference between a preprint and a reprint as far as citation practice is concerned.

In particular, if you divide the archive into pre- and re- prints according to whether they themselves claim journal status in their 'meta-data' or not, then there is almost no difference in the tendency to give XXX (arXiv) citations as opposed to Journal references.
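
A comparison of this kind could be sketched as below. It is an illustration only: the record layout (`journal_ref`, `references`, `has_arxiv_id`) is hypothetical, and the two sample papers are made up purely to exercise the function, so the numbers they produce say nothing about the archive itself.

```python
from collections import Counter

def explicit_share_by_status(papers):
    """Share of references giving an arXiv number, split by pre/reprint status."""
    totals = Counter()
    explicit = Counter()
    for paper in papers:
        # A paper counts as a reprint if its own meta-data claims a journal.
        status = "reprint" if paper["journal_ref"] else "preprint"
        for ref in paper["references"]:
            totals[status] += 1
            if ref["has_arxiv_id"]:
                explicit[status] += 1
    return {status: explicit[status] / totals[status] for status in totals}

# Made-up records, not real data.
papers = [
    {"journal_ref": "",
     "references": [{"has_arxiv_id": True}, {"has_arxiv_id": False}]},
    {"journal_ref": "J. Hypothetical 1 (1999) 1",
     "references": [{"has_arxiv_id": True}, {"has_arxiv_id": False}]},
]
shares = explicit_share_by_status(papers)
print(shares)
```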

I am going to need to check these results out in more detail to see what is actually going on, but it seems counter-intuitive to me.

Date: Tue, 27 Jul 1999 17:22:06 +0100
From: Leslie Carr <>
To: Stevan Harnad <>
Subject: Re: Stealing a March on the EPrintLinks Project

Outstanding issue (A)
> > No work has yet been done on feeding the links back into the source, or overlaying them on the ps/pdf view that the user sees.

Outstanding issue (B)
> > Note that SPIRES provides a similar service, which is based on the references being typed in.

I've just made a demo of these two features in action:
(Interestingly enough, now that I check this document out I see that my reference reading process produced all the links that SPIRES did manually. Hurrah!)

This of course shows the links only in an ASCII document, but we can easily transfer this into a PDF context as and when.
....[1 hour pause]
...OK. See the references section on p11 of
The links appear as black boxes around the page numbers. Not very pretty yet, but functional.

To recap
The PDF file was produced from the XXX (arXiv) archive and then modified by a combination of the citation link database that I have already built and the DLS/PDF linking software. The short html references file was produced from the SPIRES citation data and the XXX (arXiv) 'detexxing' procedure that I have been using.

I think that covers all the angles we discussed. I guess I ought to write up a report and email it to the list.
Leslie Carr
Tel: +44 1703 594479            Fax: +44 1703 592865
ACM Member: 5135934             IEEE Member: 40323275
Dept of Electronics and Computer Science, University of Southampton SO17