From: "Leslie Carr" <lac@ecs.soton.ac.uk>
To: "Stevan Harnad" <harnad@ecs.soton.ac.uk>,
"Steve Hitchcock" <sh94r@ecs.soton.ac.uk>
Subject: citation patterns
Date: Mon, 26 Jul 1999 07:56:57 +0100
If you look by destination, then just over half of the explicit XXX (arXiv physics) citations are *to* self-confessed journal articles. The ones that aren't are distributed as follows by year....
9 91
28 92
100 93
108 94
178 95
292 96
417 97
831 98
658 99
i.e. 831 direct XXX (arXiv) citations are to 'not-journals' added to the archive in 1998. Does this distribution reflect the growth in the use of the archive, or the publishing lag?
(Remember that all these numbers are based on an examination of the reference sections of a small subset of the archive data.)
---
Les
From: "Leslie Carr" <lac@ecs.soton.ac.uk>
To: "Steve Hitchcock" <sh94r@ecs.soton.ac.uk>
Subject: XXX usage analysis
Date: Mon, 8 Nov 1999 16:06:22 -0000
I finished the following rudimentary analysis of XXX (arXiv) usage literally hours before Julian 'accidentally' deleted the old XXX (arXiv) archive (cogprints) from which I gleaned the data.
http://www.ecs.soton.ac.uk/~lac/XXXmetadatadeltas.html
The main conclusion of interest is that only about 11% of people seem
to update their articles when they update the metadata. This needs more
investigating, as it is counter to Stevan's assumptions.
---
Les
From: Leslie Carr <lac@ecs.soton.ac.uk>
To: Stevan Harnad <harnad@ecs.soton.ac.uk>
Date: 26 November 1999 00:40
Subject: xxx deltas
Now that Zhuoan has an extra 40Gb on her workstation (60Gb total!) I have unpacked the diffs that Julian has been sending me every night. Here are some off-the-top-of-my-head comments:
between 5 November and 25 November there were 3634 changes (additions/alterations) made to the archive.
that's 173 per day.
about 1542 of these changes were for pre-november articles (73/day).
(I'm going to ignore the November articles: fresh additions AND alterations
because I can't tell the difference at the mo).
57 articles had 2 or 3 changes made. (seems to be the case that the first change is the addition of a journal-ref and the next changes are slight changes to the formatting of the journal-ref).
510 articles had both content and meta-data updated
1020 just changed the metadata
7 updated the content without changing the metadata.
The changes of pre Nov99 articles fitted into the following categories
25 articles didn't have a Journal-ref to start with, didn't add
one and didn't change the contents.
451 articles didn't have a Journal-ref to start with, didn't add one
BUT DID change the contents.
785 articles didn't have a Journal-ref to start with, added one and
didn't change the contents.
43 articles didn't have a Journal-ref to start with, added one
AND DID change the contents.
215 articles already had a Journal-ref to start with and didn't change
the contents.
11 articles already had a Journal-ref to start with and did change
the contents.
Of all those who (in this period) added a journal-ref, only 5% (43/828) changed the contents as well.
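The 5% figure can be re-derived from the category counts above. This is only a sketch: the tuple layout and the helper name `content_change_rate` are invented for illustration and are not part of the original analysis.

```python
# Counts copied from the category list above. Each key is a hypothetical
# (had_journal_ref_before, added_journal_ref, contents_changed) triple.
counts = {
    (False, False, False): 25,
    (False, False, True): 451,
    (False, True, False): 785,
    (False, True, True): 43,
    (True, False, False): 215,
    (True, False, True): 11,
}


def content_change_rate(counts, select):
    """Among the categories picked out by select(key), return the
    fraction of articles whose contents changed."""
    total = sum(n for key, n in counts.items() if select(key))
    changed = sum(n for key, n in counts.items() if select(key) and key[2])
    return changed / total


# Articles that added a journal-ref in this period: 43 of 828 changed contents.
added_ref_rate = content_change_rate(counts, lambda key: key[1])
print(f"{added_ref_rate:.1%} of journal-ref additions also changed the contents")
```

The same helper can be pointed at any other slice, for example `lambda key: not key[0] and not key[1]` for articles that never had a journal-ref and did not add one.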
I think it is necessary to look at exactly what happens when an article
is submitted, i.e. don't ignore the November data.
---
Les
Date: Fri, 26 Nov 1999 14:07:39 +0000
From: Leslie Carr <lac@ecs.soton.ac.uk>
To: sh94r@ecs.soton.ac.uk
Subject: how much change
Steve: I have looked at 40 of the articles that were changed when the journal-ref was added.
I have chosen to represent the "amount of change" as the ratio of "number of lines of old version changed" / "number of lines of old version". You could argue that material is more likely to be added than deleted, so perhaps it ought to be different. However, here are the results:
25% of the articles have <10% changes.
15% of the articles have >10% and < 20% changes.
30% of the articles have >20% and < 30% changes.
30% of the articles have >30% changes.
The *median* value is 21% changes.
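One way to compute the "amount of change" ratio defined above is sketched here. It assumes both versions are available as lists of lines and uses Python's standard `difflib` as a stand-in for whatever diff tool produced the original figures; the function name `change_ratio` is invented for illustration.

```python
import difflib


def change_ratio(old_lines, new_lines):
    """Lines of the old version that were replaced or deleted,
    divided by the number of lines in the old version."""
    if not old_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'insert' opcodes touch no old lines, so pure additions are
        # not counted, in line with the definition given above.
        if tag in ("replace", "delete"):
            changed += i2 - i1
    return changed / len(old_lines)


# Toy example: one of four old lines is rewritten and one line is appended.
old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
print(change_ratio(old, new))  # 0.25
```

Because insertions are ignored, an article that only gains new material scores 0% under this measure, which is the limitation noted above.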
---
Les
From: "Leslie Carr" <lac@ecs.soton.ac.uk>
To: "Steve Hitchcock" <sh94r@ecs.soton.ac.uk>
Subject: Re: Quikchart
Date: Fri, 26 Nov 1999 17:27:37 -0000
Another random sample, another set of statistics.
Looking at the set of hep-th articles from Jan 97 -> Oct 99.
There are about 10600 articles all told.
45% of them (4802 articles) do NOT have a journal ref.
Of those without a journal ref, 38% do have some "publication clue"
in the comments field e.g. the phrases "to appear in" or "submitted to"
or "presented at" or "published in". The clue may indicate something other
than journal publication, e.g. "talk given" or "proceedings" or "lecture".
The balance of comment fields simply give the number of pages and the
TeX macro packages used for formatting.
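A minimal sketch of the "publication clue" test described above, assuming each comments field is available as a plain string. The phrase list is taken from the examples quoted above; the names `CLUE_PHRASES` and `has_publication_clue` and the sample comments are invented for illustration.

```python
import re

# Phrases quoted in the message above; matching is case-insensitive.
CLUE_PHRASES = [
    "to appear in",
    "submitted to",
    "presented at",
    "published in",
    "talk given",
    "proceedings",
    "lecture",
]
CLUE_RE = re.compile("|".join(re.escape(p) for p in CLUE_PHRASES), re.IGNORECASE)


def has_publication_clue(comment):
    """True if the comments field contains any of the clue phrases."""
    # Collapse line breaks and repeated spaces so that a phrase split
    # across two lines of the comment still matches.
    normalised = " ".join(comment.split())
    return CLUE_RE.search(normalised) is not None


# Made-up sample comments, not taken from the archive.
print(has_publication_clue("12 pages, LaTeX, to appear\n in the proceedings"))
print(has_publication_clue("15 pages, RevTeX, 3 figures"))
```

A plain phrase match like this cannot tell journal publication from talks or lectures, so the 38% figure should be read as "some clue", not "journal article".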
---
Les
From: "Leslie Carr" <lac@ecs.soton.ac.uk>
To: "Steve Hitchcock" <sh94r@ecs.soton.ac.uk>,
"Stevan Harnad" <harnad@ecs.soton.ac.uk>
Subject: More XXX fascinating facts
Date: Tue, 14 Dec 1999 14:26:23 -0000
From an analysis of (reader) usage on 5th January 1999.
There seem to be 1478 "user sessions", where 1 session is a use of the archive from 1 client. A handful of these correspond to proxies and mirrors. The rest seem genuine individuals.
On this day there are requests for (abstracts, sources or ps of) 3773 different articles.
There were...
3718 requests for abstracts
841 requests for the TeX sources
4031 requests for postscript
The distribution of the requested articles by year is as follows:
1 1990
10 1991
45 1992
94 1993
190 1994
185 1995
271 1996
551 1997
3878 1998
1322 1999
You can see the emphasis on the immediate past by looking at the number of downloads from each month for 1998 and Jan 1999 below.
63 9712
73 9801
82 9802
104 9803
96 9804
106 9805
134 9806
133 9807
188 9808
161 9809
221 9810
262 9811
2318 9812
1322 9901
Bear in mind that this is just the downloads for Jan 5th, and the Jan figure is HUGE!
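The per-month counts can be produced by bucketing requested articles on the YYMM prefix of their old-style identifiers (e.g. 9812 for December 1998). This is a sketch, assuming the requested identifiers have already been extracted from the log; the function names and the sample identifiers are invented for illustration.

```python
from collections import Counter


def yymm(article_id):
    """Return the YYMM prefix of an old-style identifier such as
    'hep-th/9812001' (the archive name before the slash is optional)."""
    number = article_id.rsplit("/", 1)[-1]
    return number[:4]


def downloads_by_month(requested_ids):
    """Count requested articles per submission month (YYMM)."""
    return Counter(yymm(article_id) for article_id in requested_ids)


# Made-up identifiers, not taken from the log.
sample = ["hep-th/9812001", "hep-th/9812002", "hep-ph/9901003"]
for month, n in sorted(downloads_by_month(sample).items()):
    print(n, month)
```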
The average number of downloads (all abs/ps/tex) per session is 4.8. Ignoring the largest culprits (all proxies) then that drops to 3.8.
Altogether there were 4031 occasions when a postscript file was requested. On 22% of those occasions the abstract was also downloaded. On 8% of them the TeX was also downloaded.
Altogether there were 3718 occasions when an abstract was requested.
On 24% of those occasions the postscript was also downloaded.
On 3% of those occasions the TeX was also downloaded.
It would appear that downloading an abstract only leads to "further
reading" about a quarter of the time.
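The co-download percentages above could be computed along the following lines. This is a sketch under stated assumptions: the log is taken to be already parsed into (session, article, kind) tuples with kind one of "abs", "ps" or "tex", and an "occasion" is taken to mean a distinct (session, article) pair. Both the tuple layout and the function name `co_request_rate` are invented for illustration.

```python
from collections import defaultdict


def co_request_rate(requests, primary, secondary):
    """Of all (session, article) pairs in which `primary` was requested,
    return the fraction in which `secondary` was requested as well."""
    kinds = defaultdict(set)
    for session, article, kind in requests:
        kinds[(session, article)].add(kind)
    with_primary = [k for k in kinds.values() if primary in k]
    if not with_primary:
        return 0.0
    return sum(secondary in k for k in with_primary) / len(with_primary)


# Made-up requests, not taken from the log.
requests = [
    ("s1", "A", "abs"), ("s1", "A", "ps"),
    ("s2", "A", "abs"),
    ("s2", "B", "ps"),
    ("s3", "C", "abs"), ("s3", "C", "ps"), ("s3", "C", "tex"),
    ("s4", "D", "abs"),
]
print(co_request_rate(requests, "abs", "ps"))  # 0.5
```

Counting distinct pairs rather than raw requests means a repeated fetch of the same file in one session is counted once; counting raw requests would give somewhat different percentages.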
It would appear that downloading the PostScript is infrequently prompted
by reading the abstract. In fact,
many of the postscript downloads come immediately after reading the
"current" summary list of articles.
Coming soon: what is the most common means of accessing articles? "Search"
? Reading the "current" list? Reading the list for a particular month?
Can we tell if subsequent downloads in the same session are from articles
cited in the initial download (use (SLAC) SPIRES and examples from hep-th)?
---
Les
Follow-up: Mining the social life of an eprint archive