Citation Analysis
Last updated August 31 2000 16:28:28.

This data is based on extracting citations to documents in the arXiv archive from documents deposited in the hep-th area, between 1998 and 2000. The citations were extracted using Spotcites by Dr. Les Carr. The output was then formatted into a tuple table containing the document analysed and its citations of arXiv documents (see Table 1).
Using the inferred date of submission in the arXiv reference (______/yymm___), the difference in months between the deposit of a citing article and the deposit of each article it cites can be found. These differences can then be tallied to give, for each month difference, the number of citations that fall into it (see Table 2). Some anomalies can be seen in Table 2, most obviously that authors are citing documents "in the future" (negative month difference).

The citation tuples (see Table 1) can be used, with a Unix shell script, to build a table of the number of citations to each document (the second column), as cited by papers in hep-th. This data can then be cross-referenced with the paper state information that can be obtained from the paper abstract (see Table 4). The citations can then be summed by state to determine the types of citation that were made.
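The month-difference tally described above can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the function names are hypothetical, and the century rule (two-digit years 91 to 99 read as 19xx) is an assumption based on arXiv identifiers starting in 1991. The sample tuples are taken from the hep-th/9807109 example elsewhere on this page.

```python
from collections import Counter


def yymm_to_months(arxiv_id):
    """Map an old-style arXiv id such as 'hep-th/9807109' to a running month count."""
    yymm = arxiv_id.split("/")[1][:4]
    yy, mm = int(yymm[:2]), int(yymm[2:])
    # Assumption: arXiv ids start in 1991, so 91-99 means 19xx and 00-90 means 20xx.
    year = (1900 if yy >= 91 else 2000) + yy
    return year * 12 + (mm - 1)


def month_difference(citing_id, cited_id):
    """Months from deposit of the cited paper to deposit of the citing paper."""
    return yymm_to_months(citing_id) - yymm_to_months(cited_id)


def month_histogram(pairs):
    """Count how many (citing, cited) tuples fall on each month difference."""
    return Counter(month_difference(citing, cited) for citing, cited in pairs)


# A few (citing, cited) tuples in the style of Table 1.
pairs = [
    ("hep-th/9802095", "hep-th/9807109"),
    ("hep-th/9807140", "hep-th/9807109"),
    ("hep-th/9809033", "hep-th/9807109"),
    ("hep-th/9910229", "hep-th/9807109"),
]

for diff, count in sorted(month_histogram(pairs).items()):
    print(diff, count)
```

A negative result, as for the first tuple, corresponds to the "in the future" anomaly noted above.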
This graph shows the citation state distribution against the state distribution of papers first deposited in 1998.

What do different states of article cite the most?
Using the original data set (see Table 1) we can obtain the state of the analysed article and of its citations. We can break the results down by the state of the analysed article, and sum together the states of the cited articles.
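The breakdown by state could be computed along these lines. This is only a sketch under stated assumptions: the paper identifiers and the `state_of` mapping are hypothetical placeholders, and papers without a recognisable state are counted as "other".

```python
from collections import Counter, defaultdict


def state_crosstab(citations, state_of):
    """For each state of citing paper, count the states of the papers it cites."""
    table = defaultdict(Counter)
    for citing, cited in citations:
        citing_state = state_of.get(citing, "other")
        cited_state = state_of.get(cited, "other")
        table[citing_state][cited_state] += 1
    return table


# Hypothetical identifiers and states, for illustration only.
state_of = {
    "paper-A": "journal ref",
    "paper-B": "technical report",
    "paper-C": "journal ref",
}
citations = [
    ("paper-A", "paper-B"),
    ("paper-A", "paper-C"),
    ("paper-B", "paper-C"),
    ("paper-B", "paper-D"),  # paper-D has no known state, so it counts as "other"
]

table = state_crosstab(citations, state_of)
for citing_state, counts in table.items():
    print(citing_state, dict(counts))
```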
How does the age of citations vary with the state of the cited article?
Using the data from Table 1, and adding the state of the cited article, a table similar to Table 2 can be built for each state: papers with a journal ref, technical reports and "other" (papers that have no recognisable state). This can then be plotted to compare how the rate of citation varies with the relative age of articles in each state.

Using the data from Table 1, a distribution of where citations point can be built by summing the citations by the area that each citation reference points to. This only includes citations to other areas in arXiv.

Taking papers deposited in 1998, we find the time difference, in months, between the date each paper was deposited and the date each paper it references was deposited. This is compared to the number of papers that were deposited in the month of the referenced paper (taking 1998/12 as month 0). The ratio is the number of referenced papers divided by the number of papers deposited in that month.

A distribution of where papers cite can be built by taking, for each paper, the percentage of its citations that do not refer to hep-th. See Table 5. Table of papers that have 0% of citations within hep-th:
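The per-paper percentage of citations outside hep-th could be computed roughly as below. The function name and the sample identifiers are hypothetical, chosen only to show the calculation; the area is assumed to be the part of the arXiv identifier before the slash.

```python
from collections import defaultdict


def percent_outside(citations, home_area="hep-th"):
    """For each citing paper, the percentage of its citations outside home_area."""
    cited_by = defaultdict(list)
    for citing, cited in citations:
        cited_by[citing].append(cited)

    result = {}
    for citing, cited_ids in cited_by.items():
        # The area is taken to be the identifier prefix, e.g. 'gr-qc' in 'gr-qc/9710003'.
        outside = sum(1 for c in cited_ids if c.split("/")[0] != home_area)
        result[citing] = 100.0 * outside / len(cited_ids)
    return result


# Hypothetical (citing, cited) tuples, for illustration only.
sample = [
    ("hep-th/9801001", "hep-th/9711002"),
    ("hep-th/9801001", "gr-qc/9710003"),
    ("hep-th/9801002", "hep-ph/9712004"),
    ("hep-th/9801002", "astro-ph/9711005"),
]

percentages = percent_outside(sample)
# Papers with 0% of citations within hep-th are those at 100% outside.
none_within = [paper for paper, pct in percentages.items() if pct == 100.0]
print(percentages)
print(none_within)
```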
Age of Citations
It appears that citations within hep-th are very young: the number of citations peaks at a difference of just 1 month between the deposit of the cited paper and the deposit of the citing paper. To take an example, Theta Dependence In The Large N Limit Of Four-Dimensional Gauge Theories, by Edward Witten (hep-th/9807109).
This has the following citations to it (from hep-th, between 1998-00):

hep-th/9802095 hep-th/9807109 -5
hep-th/9807140 hep-th/9807109 0
hep-th/9807156 hep-th/9807109 0
hep-th/9809033 hep-th/9807109 2
hep-th/9809106 hep-th/9807109 2
hep-th/9809173 hep-th/9807109 2
hep-th/9809184 hep-th/9807109 2
hep-th/9810186 hep-th/9807109 3
hep-th/9908148 hep-th/9807109 13
hep-th/9910229 hep-th/9807109 15

The month difference is shown in the third column.

By searching Web of Science, we can find the citations to this article by all papers. This gives us a total of 22 citations. The publication date of the article is stated as 5th October 1998. We can then read off the dates of citing articles, each followed by the number of months since publication:

Jun 2000 - 20
Jun 2000 - 20
Jun 2000 - 20
Apr 2000 - 18
Dec 1999 - 14
Feb 2000 - 16
Jan 2000 - 15
Nov 1999 - 13
Jan 2000 - 15
Dec 1999 - 14
Sep 1999 - 11
Sep 1999 - 11
Aug 1999 - 10
Aug 1999 - 10
Jul 1999 - 9
Jul 1999 - 9
May 1999 - 7
Apr 1999 - 6
Apr 1999 - 6
Mar 1999 - 5
Jan 1999 - 3

The young age of citations within the archive could be explained by the fact that arXiv stores mostly preprints: once a paper goes into print it will be referenced there rather than in the archive, so there will be few "old" citations. However, the high number of citations that occur soon after a paper is deposited suggests that authors are citing new work soon after it has been released. This cannot happen, or is greatly reduced, in the peer-reviewed print world, where publication can take upwards of six months to a year.

Influence of profile shape on the diocotron instability in a non-neutral plasma column, R.C. Davidson, G.M. Felice. Published in October 1998.
Date of citations to this article (5 total):

May 2000 - 19
Apr 2000 - 18
Mar 2000 - 17
Oct 1999 - 12
Sep 1999 - 11

Open inflation, the four form and the cosmological constant, Turok N, Hawking SW. Published in July 1998.
Date of citations to this article (14 total):

2000 - 24
Jan 2000 - 18
Dec 1999 - 17
Nov 1999 - 16
Apr 1999 - 9
Sep 1999 - 14
Aug 1999 - 13
Jul 1999 - 12
Feb 1999 - 7
Apr 1999 - 9
Mar 1999 - 8
Jan 1999 - 6
Jan 1999 - 6
Dec 1998 - 5

Range of Citations within Impact Factor
This analysis groups papers by author impact factor. Self citations are based on joining together a unique list of papers from that impact level against the citations from papers of that impact level (so there may be more than one citation to an individual paper).
So low impact authors are less likely to cite other low impact authors than to cite high or medium impact authors.

Using All Archive Papers
Total number of articles that received a citation: 71297, of 132219 total papers (53.92%). Graph of the frequency of citations (number of articles/citations per article). Analysing the age of citations (cumulative is shown in pink). This graph breaks down the citation ages by papers deposited in '99, '97 and '95.

Citations to Papers Deposited in 1998
Analysing citations to papers that were deposited in 1998 (i.e. citations from 1998, 1999 and 2000). Therefore this data covers citations with a latency of up to 2 years.

grep '/98' d_cited | sort | awk '{ print $1 }' | uniq | wc
grep '/98' ../q1/d_papers | grep abs | wc

Total number of articles that received a citation: 13450, of 24057 total papers (55.91%). Frequency of citations (number of citations against number of articles). See Table 7. X axis scale has been restricted (max citations is 768). Total citations/Total papers = 89774/24057 = 3.73 (based on Red/Orange links).
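The summary figures quoted above can be reproduced with a few lines of arithmetic. This sketch also shows one way a frequency-of-citations table might be built from a list of cited identifiers. The function names and the short sample list are hypothetical; only the totals come from this page.

```python
from collections import Counter


def percent_cited(cited_articles, total_articles):
    """Percentage of articles that received at least one citation."""
    return round(100.0 * cited_articles / total_articles, 2)


def mean_citations(total_citations, total_articles):
    """Average number of citations per article."""
    return round(total_citations / total_articles, 2)


def citation_frequency(cited_ids):
    """Map 'citations per article' to the number of articles with that count."""
    per_article = Counter(cited_ids)      # citations received by each article
    return Counter(per_article.values())  # how many articles share each count


print(percent_cited(71297, 132219))   # all archive papers
print(percent_cited(13450, 24057))    # papers deposited in 1998
print(mean_citations(89774, 24057))   # citations per 1998 paper

# Hypothetical list of cited identifiers, one entry per citation.
sample_cited = ["a", "a", "b", "c", "c", "c"]
print(dict(citation_frequency(sample_cited)))
```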
Analysing the state of papers referenced in 1998
Looking at the citations from papers deposited in 1998. For all papers: