
Written by: Ian Hickman

Citation Analysis

arXiv stores not only the abstract of each paper but its complete text, and this additional information allows many of the citations in a paper to be extracted. The citations of particular interest are those that point to other papers in the archive (internal citations); about fifty percent of all the citations picked up were of this kind. Analysing these internal citations shows that, naturally, some papers are cited more often than others.

From arXiv we managed to extract 595698 internal citations from 132218 papers, an average of 4.51 citations per paper. The papers were then split up so that roughly 1/3 of the citations were to high impact papers, 1/3 to medium impact and 1/3 to low impact. Papers with no citations to them are called unknown papers. The number of papers in each set is shown below.

Impact    No. of Papers   No. of Citations per paper
High       2698           40+
Medium    10122           13 - 39
Low       61518           1 - 12
Unknown   57881           0

Impact Split
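
The split into impact bands described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for arXiv: the function name impact_bands and the input format (a mapping from paper id to internal citation count) are hypothetical, and papers with equal citation counts are kept in the same band so that each band corresponds to a range of counts, as in the table.

```python
from collections import Counter


def impact_bands(citation_counts):
    """Assign each paper to 'high', 'medium', 'low' or 'unknown'.

    citation_counts maps a paper id to the number of internal citations
    pointing at that paper (hypothetical input format).
    """
    total = sum(citation_counts.values())
    # How many papers share each distinct citation count.
    papers_at = Counter(citation_counts.values())

    band_of_count = {0: "unknown"}  # uncited papers form their own set
    running = 0  # citations already assigned to more highly cited papers
    for count in sorted((c for c in papers_at if c > 0), reverse=True):
        if running < total / 3:
            band_of_count[count] = "high"
        elif running < 2 * total / 3:
            band_of_count[count] = "medium"
        else:
            band_of_count[count] = "low"
        running += count * papers_at[count]

    return {paper: band_of_count[c] for paper, c in citation_counts.items()}


if __name__ == "__main__":
    toy = {"a": 10, "b": 5, "c": 5, "d": 3, "e": 3, "f": 2, "g": 1, "h": 1, "i": 0}
    print(impact_bands(toy))
```

With real data the thirds will rarely be exact, because a whole group of papers with the same citation count is always placed in a single band.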

What is the Correlation Between Downloads and Citations?

Papers are downloaded for a number of reasons:

  • Citations - papers cite each other, and users follow those citations.
  • Alerting email - there is a regular email alerting service that tells users of papers that have been newly deposited (uploaded) to the archive.
  • Searching - the comprehensive search option allows users to search by many different categories, for example by author, by archive section and by year, to name just a few.
  • Browsing - each section of the archive has, among others, "new" and "recent" sections for easy browsing.
  • Serendipity - randomly stumbling across a paper.
These routes affect downloads as shown below.

Authoring and Reading cycle

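As downloads influence citations and citations influence downloads, it is likely that there would be some correlation between the two.
The following table shows that there is, although it is weak overall (r = 0.11): it is strongest for high impact papers (r = 0.27) and effectively absent for medium and low impact papers.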

Paper set               r          n
All Papers              0.11155    63671
High Impact Papers      0.27293     1981
Medium Impact Papers    0.01288     5937
Low Impact Papers      -0.01412    30163
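
For readers who want to reproduce a figure like r, here is a minimal sketch. It assumes that r is the Pearson correlation coefficient and that n is the number of papers in each set; the source states neither explicitly. The function name pearson_r and the paired-list input are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences,
    e.g. downloads per paper (xs) and citations per paper (ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    downloads = [1, 2, 3, 4, 5]   # made-up values, not arXiv data
    citations = [2, 1, 4, 3, 5]
    print(pearson_r(downloads, citations))
```

A value near 0, as for the medium and low impact sets, means that download counts say almost nothing about citation counts within that set.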

Most papers in the archive cite other papers in the archive, but do users actually follow these citations up? If we look at the age of a paper when it is downloaded, we find that a high proportion of downloads occur within the first week that the paper is in the archive (see figure). This "new paper rush" occurs because users are directed to new papers by the alerting email that is sent out.

The New Paper Rush
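
The size of the first-week effect can be measured along the following lines. This is a sketch under assumptions: the deposit dates and the download log are taken to be available as a simple mapping and a list of (paper id, date) pairs, which is a hypothetical format and not that of the arXiv logs, and first_week_share is an illustrative name.

```python
from datetime import date


def first_week_share(deposit_dates, downloads):
    """Fraction of downloads made while the paper was under 7 days old.

    deposit_dates: paper id -> date the paper was deposited
    downloads:     iterable of (paper id, date of download)
    """
    total = 0
    early = 0
    for paper, day in downloads:
        age = (day - deposit_dates[paper]).days
        if age < 0:
            continue  # ignore log entries dated before the deposit
        total += 1
        if age < 7:
            early += 1
    return early / total if total else 0.0


if __name__ == "__main__":
    deposits = {"p1": date(2001, 1, 1), "p2": date(2001, 1, 10)}
    log = [
        ("p1", date(2001, 1, 1)),
        ("p1", date(2001, 1, 3)),
        ("p1", date(2001, 2, 1)),
        ("p2", date(2001, 1, 12)),
    ]
    print(first_week_share(deposits, log))
```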

If we look at how this changes over time, we find that papers in all impact categories experience the "new paper rush", since users have heard nothing about a new paper and are unaware, at this early stage, of its quality. We also find that as papers get older, high and medium impact papers are much more likely to be downloaded than papers in the low and unknown impact categories (see figure). This is probably due to users following up citations: because higher impact papers are cited more, users have a greater probability of following citations to them.

Effect of Impact on downloads
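
A comparison of this kind can be tabulated by counting downloads per impact category and per week of paper age. The sketch below is illustrative only: the function name, the (paper id, age in days) input and the weekly bucketing are assumptions, and papers missing from the category mapping are counted as "unknown".

```python
from collections import defaultdict


def downloads_by_band_and_week(download_ages, bands):
    """Count downloads per impact category and per week of paper age.

    download_ages: iterable of (paper id, age of the paper in days at download)
    bands:         paper id -> 'high', 'medium', 'low' or 'unknown'
    """
    counts = defaultdict(lambda: defaultdict(int))
    for paper, age_days in download_ages:
        band = bands.get(paper, "unknown")
        counts[band][age_days // 7] += 1  # week 0 is the first week in the archive
    return {band: dict(weeks) for band, weeks in counts.items()}


if __name__ == "__main__":
    bands = {"a": "high", "b": "low"}
    events = [("a", 0), ("a", 3), ("a", 30), ("b", 1), ("b", 2), ("c", 100)]
    print(downloads_by_band_and_week(events, bands))
```

Dividing each count by the number of papers in the category would make the categories comparable, since the low and unknown sets contain far more papers than the high set.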
