Analysis of Article Authors
Last updated September 07 2000 15:10:18. (Tables are only available to Soton viewers)

Authors (about 76000 authors)
Authors 2 (Table 1), broken down by area and with no first names (slightly fewer than 76000 authors)

Example of same name, different spelling:

4. nucl-th/9810016
Authors: K. Tsushima (1), D. H. Lu (1), A. W. Thomas (1), K. Saito (2), R. H. Landau (3) ((1) CSSM and The University of Adelaide, Australia (2) Tohoku College of Pharmacy, Sendai, Japan (3) Oregon State University, USA)
Comments: RevTex, 14 pages, 3 Postscript figures, version to appear in Phys. Rev. C, title, abstract, text, references are modified
Journal-ref: Phys.Rev. C59 (1999) 2824-2828

5. hep-lat/9810005
Authors: Derek B. Leinweber, Ding H. Lu, Anthony W. Thomas
Comments: Revised version accepted for publication includes a new section demonstrating extrapolations of lattice QCD results
Journal-ref: Phys.Rev. D60 (1999) 034014

Example distribution of name format:
[authors.txt] These names would be collapsed to S.W.Hawking, Stephen_W.Hawking and Stephen_Hawking.
[authors2.txt] These names would be collapsed to S.W.Hawking and S.Hawking.

This data does not include papers submitted before 10/94, as at that time there were no author meta-tags.

Author Citations
Using the citation data provided by Dr. Les Carr (hep-th, 98-00), we can build a table of the number of citations that individual authors have received (regardless of the importance of the author). See Table 2. Then, using Table 1, a mean number of citations per paper can be built for each author: Table 3 (Author, Citations, Papers, Citations/Papers).

Graph of the number of citations an author has received, against the number of papers that author has written. A trend (Excel: poly 2) is shown in black.

Excluding Self-Citation
Using the same technique as above, the citation "impact" can be found for authors, except excluding any occasions where an author references themselves.
Source Paper - Cited Paper
This results in Table 4.
(Code to generate mean citations/author: awk '{print $1"\t"$2"\t"$3"\t"($2/$3)}' < d_notauthorcitations2 | sort -rn -k4 > d_notauthorcitations3)

Defining Impact
(Tim's patent-pending bear-no-relationship-to-statistics-method)
Using Table 4, where $2 is the sum of citations (y axis) and $3 is the sum of papers (x axis).
Some highly-cited authors may be excluded from "High Impact", because I require a minimum number of papers. It is assumed that an author's lack of articles shows that they either do not use the archive or have not written many papers, in which case their impact may be a "one off".

Splitting By Thirds
The authors are sorted by their citations/papers ratio and then split into three equal groups.
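The split by thirds can be sketched in a few lines of Python. This is only an illustration: the function name `split_by_thirds`, the (author, citations, papers) row layout and the sample figures are assumptions, not data from Table 4. The sketch assumes every author has at least one paper, and when the number of authors is not divisible by three it puts the remainder in the last group, which is an arbitrary choice.

```python
def split_by_thirds(rows):
    """Sort (author, citations, papers) rows by citations/papers, highest
    first, and cut the sorted list into three groups of near-equal size."""
    ranked = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)
    size = len(ranked) // 3
    high = ranked[:size]
    medium = ranked[size:2 * size]
    low = ranked[2 * size:]  # any remainder lands in the last group
    return high, medium, low


# Made-up rows for illustration only.
sample = [
    ("A.Author", 30, 3),  # ratio 10.0
    ("B.Author", 4, 4),   # ratio 1.0
    ("C.Author", 12, 2),  # ratio 6.0
    ("D.Author", 1, 5),   # ratio 0.2
    ("E.Author", 9, 3),   # ratio 3.0
    ("F.Author", 2, 4),   # ratio 0.5
]

high, medium, low = split_by_thirds(sample)
print([r[0] for r in high])  # the two authors with the highest ratios
```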
Splitting Using Quartiles
The authors are ranked by their citations/papers ratio and split by quartiles, taking the top 25%, the bottom 25% and the middle 50%. The number of deposits for the articles that these authors have deposited is then added in, and the mean is taken over the number of authors. Dividing this by the mean number of papers per author generates a deposit "rate" for the sector: the mean number of deposits per paper per author.
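A rough Python sketch of the quartile split and the deposit "rate" follows. The names `split_by_quartiles` and `deposit_rate`, the row layout and all sample figures are assumptions for illustration. The rate is computed here as (mean deposits per author) / (mean papers per author), which is one reading of the description above and may not match the original calculation exactly.

```python
from statistics import mean


def split_by_quartiles(rows):
    """Rank (author, citations, papers) rows by citations/papers and return
    the top 25%, the middle 50% and the bottom 25%."""
    ranked = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)
    q = len(ranked) // 4
    top = ranked[:q]
    middle = ranked[q:len(ranked) - q]
    bottom = ranked[len(ranked) - q:]
    return top, middle, bottom


def deposit_rate(deposits_per_author, papers_per_author):
    """Mean deposits per author divided by mean papers per author."""
    return mean(deposits_per_author) / mean(papers_per_author)


# Made-up rows for illustration only.
sample = [
    ("a", 40, 2), ("b", 30, 3), ("c", 16, 2), ("d", 12, 2),
    ("e", 8, 2), ("f", 6, 2), ("g", 2, 2), ("h", 1, 2),
]

top, middle, bottom = split_by_quartiles(sample)
print([r[0] for r in top], [r[0] for r in bottom])

# Hypothetical deposit and paper counts for two authors in one group.
print(deposit_rate([10, 20], [2, 3]))
```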
Authors Per Paper
The total number of papers is the number of abstracts deposited after 1995 (we can't get authors before that time):
Total number of authors:
Authors per Paper, by Impact
Using the impact level author list, a list of papers by those authors can be compiled. Using that list of papers, a list of the authors named on those papers can be built.
Paper state by Author Impact
The state of papers, by author impact level.

For Whole Archive
Using spotcites data for all papers. What proportion of citations does this cover?
This gives 100*(603460/2957912) = 20.40% (i.e. 1 in 5 citations), as a proportion of all citations in the archive. (603460/132219) = 4.56 citations/paper identified, against 2957912/132219 = 22.37 citations/paper identified from PDF source.

Analysing how many red/orange links have been picked up, by year. Using the raw citation data (paper -> citation), the number of references for a given year can be found by taking the first two digits from the paper reference. The total number of papers deposited in that year can then be found by using a listing of all papers in the archive and using the first two digits of the paper references. When taking the total number of papers, any papers from areas that did not have any references were ignored.

Total citations = 597688. Total papers = 115940 (only includes 2000 up to June).
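The coverage arithmetic and the year extraction can be checked with a short Python sketch. The three totals are the figures quoted above; the helper names `year_prefix` and `count_by_year` are assumptions, as is the reference layout "area/yymmnnn" (taken from the example paper references earlier on this page).

```python
from collections import Counter


def year_prefix(paper_ref):
    """Return the two-digit year from a reference such as 'hep-th/9810016'."""
    number = paper_ref.split("/")[-1]
    return number[:2]


def count_by_year(paper_refs):
    """Count how many references fall in each two-digit year."""
    return Counter(year_prefix(ref) for ref in paper_refs)


# Totals quoted in the text.
spotcites = 603460
pdf_citations = 2957912
papers = 132219

coverage = 100 * spotcites / pdf_citations
print(f"{coverage:.2f}%")                 # 20.40%
print(f"{spotcites / papers:.2f}")        # 4.56
print(f"{pdf_citations / papers:.2f}")    # 22.37

print(count_by_year(["hep-th/9810016", "hep-lat/9810005", "nucl-th/9901001"]))
```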
Using quartiles we come up with the following split for authors:
Mean number of citations/author (ignoring the number of papers those authors have deposited).
Mean number of hits/paper (by author impact).
This graph shows the proportion of authors with a given deposit rate for different impact levels. The number of authors for each deposit rate is shown.

Papers that have authors from different impact levels/% of all unique papers in combined area:
Papers with authors from all three impact levels: 155 (155/93435 = 0.166%)

This diagram shows the approximate authorship of papers (the combined area of the circles is all the papers, and each circle represents the authors of those papers). Where the circles intersect is therefore where papers have authors from more than one impact level.

This graph shows the cumulative number of papers against the number of citations for those papers (divided into high, med, low impact authors).

Authors per paper (awk '{ print $2 }' d_highimpactauthorpapers | sort | uniq -c | awk '{ c++; s += $1 } END { print s/c }'):
These graphs show the frequency of papers broken down by the number of citations they receive and by the impact level of their authors (so these graphs may feature the same paper more than once, as an individual paper may have more than one author).

This graph shows the age of citations (the time difference between a paper being deposited and its referenced papers being deposited), broken down by the impact factor of the paper's authors.

How long have authors been depositing?
Using the authored list (paper ref * author name), the time difference in months can be found between the first paper the author deposited and the last. This includes authors who have only one paper in the archive (defined as having a period of 0 months).
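A minimal Python sketch of the first-to-last deposit period, assuming the authored list is available as (paper ref, author name) pairs and that the yymm part follows the "/" in each reference. The function names and sample pairs are hypothetical. Two-digit years from 91 upwards are treated as 19yy and the rest as 20yy; that cut-off is also an assumption.

```python
def month_index(paper_ref):
    """Convert the yymm part of a reference such as 'hep-th/9810016'
    into a running month count."""
    yymm = paper_ref.split("/")[-1][:4]
    yy, mm = int(yymm[:2]), int(yymm[2:])
    year = 1900 + yy if yy >= 91 else 2000 + yy  # assumed century cut-off
    return year * 12 + (mm - 1)


def deposit_span_months(authored):
    """Map each author to the months between their first and last paper.
    Authors with a single paper get a period of 0 months."""
    first_last = {}
    for paper_ref, author in authored:
        m = month_index(paper_ref)
        lo, hi = first_last.get(author, (m, m))
        first_last[author] = (min(lo, m), max(hi, m))
    return {author: hi - lo for author, (lo, hi) in first_last.items()}


# Made-up authored list for illustration only.
authored = [
    ("hep-th/9501001", "A.Author"),
    ("hep-th/0006002", "A.Author"),
    ("hep-lat/9810005", "B.Author"),
]
print(deposit_span_months(authored))  # {'A.Author': 65, 'B.Author': 0}
```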
Author names cannot be easily extracted pre-1994, so there is a peak at 5 years of usage from all the authors who have continually deposited from before that period, but only appear in 1995.

Looking at the time between every paper deposited by an author:
This graph is based on taking the time difference, in order, between papers deposited by authors (the yymm part of the paper reference), excluding the time difference between two papers deposited in the same month (i.e. 0).

Growth of Authors Over Time
By using the meta-data "author" field, the number of unique authors of papers per year can be found (for most areas the number of authors cannot be easily found at or before 1994).
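Counting unique authors per year might look like the following Python sketch. It assumes (paper ref, author name) pairs with the two-digit year directly after the "/" in the reference; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def unique_authors_per_year(authored):
    """Map each two-digit year to the number of distinct authors
    who deposited at least one paper in that year."""
    authors_by_year = defaultdict(set)
    for paper_ref, author in authored:
        year = paper_ref.split("/")[-1][:2]
        authors_by_year[year].add(author)
    return {year: len(names) for year, names in authors_by_year.items()}


# Made-up authored list for illustration only.
authored = [
    ("hep-th/9810016", "A.Author"),
    ("hep-th/9811002", "A.Author"),  # same author, same year: counted once
    ("hep-lat/9810005", "B.Author"),
    ("hep-th/9901001", "A.Author"),
]
print(unique_authors_per_year(authored))  # {'98': 2, '99': 1}
```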
Analysing number of authors by LANL subfield
Using the authors meta-data field, the author list can be found for each paper. The total number of authors and total number of papers can then be found by summing each occurrence of a unique author for each area and each occurrence of a unique paper for each area. To find the variance, the number of authors per paper was also stored. Because the authors meta-data field did not exist in some areas before 1995, all these years have been ignored.
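The per-area totals and the variance can be sketched in Python as below. The input layout ((paper ref, author name) pairs, with the area taken as the part before the "/"), the function name `authors_by_area` and the sample data are assumptions. The sketch uses the population variance; the source does not say which form of variance was used.

```python
from collections import defaultdict
from statistics import mean, pvariance


def authors_by_area(authored):
    """Summarise unique authors, unique papers and authors-per-paper
    statistics for each archive area."""
    papers = defaultdict(lambda: defaultdict(set))  # area -> paper -> authors
    for paper_ref, author in authored:
        area = paper_ref.split("/")[0]
        papers[area][paper_ref].add(author)

    summary = {}
    for area, by_paper in papers.items():
        counts = [len(names) for names in by_paper.values()]
        unique_authors = set().union(*by_paper.values())
        summary[area] = {
            "authors": len(unique_authors),
            "papers": len(by_paper),
            "mean": mean(counts),
            "variance": pvariance(counts),  # population variance (assumed)
        }
    return summary


# Made-up authored list for illustration only.
authored = [
    ("hep-th/9810016", "A.Author"),
    ("hep-th/9810016", "B.Author"),
    ("hep-th/9810016", "C.Author"),
    ("hep-th/9901001", "A.Author"),
    ("hep-lat/9810005", "D.Author"),
]
print(authors_by_area(authored)["hep-th"])
```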
We can also analyse the distribution of authors between archive sub-fields by finding the intersection and union between sets of authors from different fields. The values shown are the cardinality of the intersection divided by the cardinality of the union.
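The ratio described here, intersection size over union size, is commonly known as the Jaccard index. A minimal Python sketch, with a hypothetical function name and made-up author sets, could be:

```python
def author_overlap(authors_a, authors_b):
    """Return |A intersect B| / |A union B| for two collections of author names."""
    a, b = set(authors_a), set(authors_b)
    union = a | b
    if not union:
        return 0.0  # two empty fields: treat the overlap as zero
    return len(a & b) / len(union)


# Made-up author sets for two sub-fields.
field_one = {"A.Author", "B.Author", "C.Author"}
field_two = {"B.Author", "C.Author", "D.Author"}
print(author_overlap(field_one, field_two))  # 0.5
```

A value of 0 means the two sub-fields share no authors, and a value of 1 means their author sets are identical.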