Written by: Ian Hickman

# Is there a Relationship between Citations, Downloads and Age of a Paper?

• A is the age of a paper (in days) on 9th May 2000.
• C is the number of citations to a paper.
• H is the number of hits on a paper.

The three variables are related as follows:

| Correlation | rac | rah | rch |
|---|---|---|---|
| Value | 0.168 | -0.275 | 0.100 |

Where:
• rac is the correlation between age of paper and number of citations.
• rah is the correlation between age of paper and number of hits.
• rch is the correlation between number of citations and number of hits.

The signs of these values suggest three things, although all three correlations are weak:
• Older papers get more citations.
• Older papers get fewer hits.
• Papers with more citations get more hits.
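
The text does not say which correlation coefficient was used. As a minimal sketch, assuming the values are Pearson correlation coefficients, each pairwise value could be computed as below. The function name `pearson_r` and the toy data are hypothetical and are not the data set analysed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data (one entry per paper), NOT the real measurements.
ages = [100, 400, 700, 1000, 1300]   # A: age in days
citations = [0, 2, 1, 4, 3]          # C: citations to the paper
hits = [50, 40, 45, 20, 25]          # H: hits on the paper

r_ac = pearson_r(ages, citations)
r_ah = pearson_r(ages, hits)
r_ch = pearson_r(citations, hits)
print(r_ac, r_ah, r_ch)
```

A positive result means the two quantities tend to rise together, and a negative result means one tends to fall as the other rises.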

As the three variables are all related, it is possible to correlate two of them and remove the dependence on the third by using a partial correlation technique. If such a technique is used, the following values are produced:

| Correlation | rac-h | rah-c | rch-a |
|---|---|---|---|
| Value | 0.212 | -0.298 | 0.161 |

Where:
• rac-h is the correlation between age of paper and number of citations, excluding the effect of the number of hits.
• rah-c is the correlation between age of paper and number of hits, excluding the effect of the number of citations.
• rch-a is the correlation between number of citations and number of hits, excluding the effect of age.

These three values have the same signs as the values with no excluded variables and are only slightly bigger in magnitude.
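
The text does not name the partial correlation technique that was used. One common choice is the first-order partial correlation formula, which needs only the three pairwise correlations. The sketch below assumes that formula. The function name `partial_corr` is hypothetical, and because the inputs are the rounded values from the first table, the results may differ slightly from the reported partial correlations.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Formula: (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

# Rounded pairwise correlations taken from the first table.
rac, rah, rch = 0.168, -0.275, 0.100

rac_h = partial_corr(rac, rah, rch)  # age vs citations, controlling for hits
rah_c = partial_corr(rah, rac, rch)  # age vs hits, controlling for citations
rch_a = partial_corr(rch, rac, rah)  # citations vs hits, controlling for age
print(round(rac_h, 3), round(rah_c, 3), round(rch_a, 3))
```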

## Do Papers cite Papers of like Quality?

For the graph below, the papers were split based on the total number of citations extracted. Roughly a third of the citations fell in each of the high, medium and low sections, although there were far more papers in the medium and low sections than in the high section (because a few papers have many citations each). Unknown papers have no citations to them.
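
The exact cut-off rule is not given. As a rough sketch, assuming the papers are sorted by citation count and cut where the running total of citations passes one third and two thirds of the overall total, the split could look like this. The function name `classify_by_citation_share` and the paper identifiers are hypothetical.

```python
def classify_by_citation_share(citation_counts):
    """Label each paper 'unknown', 'low', 'medium' or 'high'.

    Papers with no citations are 'unknown'. The rest are sorted by citation
    count and cut so that each class holds roughly a third of all citations.
    """
    labels = {}
    cited = []
    for paper, count in citation_counts.items():
        if count == 0:
            labels[paper] = "unknown"
        else:
            cited.append((count, paper))
    cited.sort()  # least cited papers first
    total = sum(count for count, _ in cited)
    running = 0
    for count, paper in cited:
        running += count
        # Integer comparisons avoid floating point issues at the boundaries.
        if 3 * running <= total:
            labels[paper] = "low"
        elif 3 * running <= 2 * total:
            labels[paper] = "medium"
        else:
            labels[paper] = "high"
    return labels

# Hypothetical toy data: paper id -> number of citations to that paper.
counts = {
    "p00": 0,
    "p01": 1, "p02": 1, "p03": 1, "p04": 1, "p05": 1, "p06": 1,
    "p07": 2, "p08": 2, "p09": 2,
    "p10": 6,
}
labels = classify_by_citation_share(counts)
print(labels)
```

In this toy example each cited class holds 6 of the 18 citations, yet the high class contains a single paper. Papers with equal citation counts can land on either side of a boundary under this rule.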

The graph shows that, yes, papers do cite papers of similar quality. Unknown papers mostly cite low quality papers, low quality papers cite papers of any quality, and medium and high quality papers mostly cite medium quality papers.

It must be remembered that a paper with a low number of citations to it might be a low quality paper, but it could also be a recent paper that has not yet had time to be cited by many other papers.

## Do users only follow citations to high impact papers?

A complete set of the citations that users followed was extracted from the no-duplicate download set by the Perl script getusercites (44000 unique citations in all). These were then analysed to determine the impact of the target paper. The proportion of cites followed to high, medium and low impact papers is shown below.

This graph shows that users mostly follow citations to high impact papers. This is unlikely to be a coincidence, as there is a roughly equal number of citations to each of the low, medium and high impact sets of papers.
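
The proportions described above could be tallied along the following lines. This is only an illustrative sketch and not the getusercites script. The function name `followed_share_by_impact`, the input structures and the toy data are all assumptions.

```python
from collections import Counter

def followed_share_by_impact(followed_targets, impact_of):
    """Fraction of followed citations whose target is in each impact class.

    followed_targets: one target paper id per followed citation.
    impact_of: mapping from paper id to 'high', 'medium' or 'low'.
    Targets missing from impact_of are counted as 'unknown'.
    """
    counts = Counter(impact_of.get(t, "unknown") for t in followed_targets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Hypothetical toy data, NOT the 44000 citations analysed above.
impact_of = {"a": "high", "b": "high", "c": "medium", "d": "low"}
followed = ["a", "a", "b", "c", "d", "a"]

shares = followed_share_by_impact(followed, impact_of)
print(shares)
```

Comparing these shares with the roughly equal share of citations pointing at each class indicates whether users favour one class over the others.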

