
Questions, questions...

Last updated August 31 2000 10:09:20.

Web Traffic

  • Where does the web traffic come from on our mirror? What about other mirrors? What proportion of traffic is from the host nation?
  • Where does the traffic come from on the US site?
  • How do the two compare?
  • Can the distribution of session lengths be used to eliminate non-human users (i.e. do spiders have long session lengths)?
  • What percentage of papers are hit in their first month on the archive?
  • How often do users have a session on the site?
  • Is the ratio of published to unpublished papers statistically significant?

    The ratio or the difference, or even (P-U)/(P+U)

    One suggestion for display is to show the standard error bars for your averages (standard error is the root mean square deviation from the mean, divided by the square root of N; it could be derived by computing monthly averages and calculating their variance).
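
    A minimal sketch of that suggestion, assuming monthly counts of published (P) and unpublished (U) papers are already available. The counts below are made up for illustration, and the sketch uses the sample variance (dividing by N-1) of the monthly values rather than the plain mean square deviation, which matters little once there are many months:

```python
import math

def normalized_difference(published, unpublished):
    """Return (P - U) / (P + U): -1 means all unpublished, +1 all published."""
    total = published + unpublished
    if total == 0:
        raise ValueError("no papers in this month")
    return (published - unpublished) / total

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)

# Hypothetical monthly (published, unpublished) counts, not real archive data.
monthly_counts = [(60, 40), (55, 45), (70, 30), (65, 35)]

monthly_index = [normalized_difference(p, u) for p, u in monthly_counts]
mean_index = sum(monthly_index) / len(monthly_index)
error_bar = standard_error(monthly_index)
print(f"mean (P-U)/(P+U) = {mean_index:.3f} +/- {error_bar:.3f}")
```

    The same standard_error function could be applied to monthly averages of the plain ratio or the plain difference.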

  • What is the reading life-cycle of a paper?
  • How does the Impact of the paper affect the life-cycle?
  • Is there a correlation between High Impact papers and High Download rates?

    Divide the papers into hi/med/lo impact using several measures:

    (1) Total hits per paper in the archive
    (2) Citations of the paper in the Archive (cannot be done in early embryo stages).
    We should discuss whether/how we can get this data while Tim and Ian are with us - sh94r
    Hits are in the download stats already. Citations can come from Les/Zhuoan's link data and, supplemented, from ISI/SCI (I will contact them when we have made a bit more progress) - Harnad

    (3) Citations of the paper, ISI statistics (tell me what you need and I'll contact ISI about access to their database)
    Which database? We have access to Web of Science - sh94r
    ...try [Web of Science], but papers in XXX may be too recent to be picked up by ISI. The AUTHORS will have their own impact factors, though, which we can calculate for a sample of lo, med and hi ... try the BIDS citation index now ... vanishes end of July - Harnad
    See SOTON library for WoS/BIDS.

    (4) Impact factor (citation ratio) of the AUTHOR (rather than the paper): easier to get (both from the Archive, using Zhuoan's tools, and from ISI).

    Further subdivide by hi/med/lo on each of the above measures and sector: HEP/ASTRO/COND/other

    The experimental design is then Impact (3 levels) by sector (4 levels)
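
    As a rough illustration of this 3 x 4 design, the sketch below splits papers into lo/med/hi terciles on one measure (total hits) and counts the papers in each impact-by-sector cell. The paper ids, hit counts and sector assignments are invented, and a tercile split is only one possible way to define the three levels:

```python
from collections import Counter

def impact_levels(scores):
    """Map each paper id to 'lo', 'med' or 'hi' by rank tercile of its score.

    Ties are broken by the order in which papers appear in `scores`.
    """
    ranked = sorted(scores, key=lambda paper_id: scores[paper_id])
    n = len(ranked)
    levels = {}
    for rank, paper_id in enumerate(ranked):
        if rank < n / 3:
            levels[paper_id] = "lo"
        elif rank < 2 * n / 3:
            levels[paper_id] = "med"
        else:
            levels[paper_id] = "hi"
    return levels

# Hypothetical data: total hits per paper and the sector of each paper.
hits = {"p1": 5, "p2": 120, "p3": 40, "p4": 300, "p5": 12, "p6": 75}
sector = {"p1": "HEP", "p2": "ASTRO", "p3": "COND",
          "p4": "HEP", "p5": "other", "p6": "ASTRO"}

levels = impact_levels(hits)
design = Counter((levels[paper_id], sector[paper_id]) for paper_id in hits)

# One row per impact level, one column per sector.
for level in ("hi", "med", "lo"):
    print(level, [design[(level, s)] for s in ("HEP", "ASTRO", "COND", "other")])
```

    Swapping `hits` for citation counts or author impact factors would give the same table for the other measures.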

    Zhuoan: Currently working on the tools to produce the citation information. Also, it would be easier to ascertain the impact factor of a paper than of an author (considering the different formats of names, and that many authors are quoted for one paper). Should the position of the author be considered?

    Defining High Impact

  • Quality of Author(s) - How good is an author?
  • Number of Citations (using SPIRES archive for High Energy Physics)
  • Number of downloads

    Author's hit-rate; author's citation-ratio ("impact factor")

    Measure     Unit     Calculation
    Citations   Author   No. Citations / No. of Papers
    Citations   Paper    No. Citations
    Hits        Author   (No. of Papers Factor) * No. Hits / No. of Papers
    Hits        Paper    No. Hits
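
    A small sketch of the per-author measures, assuming each paper record carries its author list, citation count and hit count. The records are invented, every co-author is given full credit regardless of position, and the "No. of Papers Factor" is left out because it is not defined in these notes:

```python
from collections import defaultdict

# Hypothetical records: (paper id, authors, citations, hits).
papers = [
    ("paper-a", ["Author One", "Author Two"], 10, 200),
    ("paper-b", ["Author One"], 4, 50),
    ("paper-c", ["Author Two"], 0, 30),
]

def author_ratios(records):
    """Return citations-per-paper and hits-per-paper for every author."""
    totals = defaultdict(lambda: {"papers": 0, "citations": 0, "hits": 0})
    for _paper_id, authors, citations, hits in records:
        for author in authors:
            # Assumption: full credit to each co-author, position ignored.
            entry = totals[author]
            entry["papers"] += 1
            entry["citations"] += citations
            entry["hits"] += hits
    return {
        author: {
            "citation_ratio": entry["citations"] / entry["papers"],
            "hit_rate": entry["hits"] / entry["papers"],
        }
        for author, entry in totals.items()
    }

ratios = author_ratios(papers)
for author, values in sorted(ratios.items()):
    print(author, values)
```

    Matching different spellings of the same author name would have to happen before this step.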

    Hypothesis: That authors who use the archive will have a higher "impact factor" than those who don't (over period 1991 -> 2000).
    Use Archive    Low       High
    Not Archive    Medium    Medium


  • Is a paper deposited in a certain field really a paper about that field?

    Important. Here we can use other forms of analysis: Latent Semantic Analysis (I can contact Tom Landauer about the LSA software for research purposes), Shimon Edelman's similarity metric, shared keywords, co-citation

    Produce report on LSA technique.

  • Could we cross reference with the ISI database to ensure this?

    Contact each of the other mirror sites (compose a letter and send it to me: I could edit and send for you).

  • Could we use LSA techniques on the paper abstract to ensure this?

    LSA and other techniques

    How valid is the use of LSA? To make an accurate assessment of the "spread" of an area, a physics dictionary will be needed plus a "core" set of papers that should be in the area. What will this tell us about the archive?


  • What fraction of updates are links to changes as opposed to paper re-writes?

    Area Analysis - does this answer this question? What details are needed for the kinds of updates?
    XXX specifically tells authors to replace their papers, so there shouldn't be any "linking" going on. Or are you referring to changing citations? - tdb198

    Re-writes of text-body (how big), re-writes of abstract, and front-matter, journal reference insertion

  • Can we cross reference with the SLAC database to confirm publication figures/dates?


    Again, draft a text and I can liaise for you:
    Heath O'Connell hoc@SLAC.Stanford.EDU

    They have all the validated biblio data for all of HEP and many other areas of physics. We MUST use that info to cross-check whether those papers in XXX whose authors have not given journal-refs are indeed in journals. Those stats are essential -- again subdivided by impact-level (hi/med/lo) and sector (HEP/astro/cond/other)
    Get the same info for Astro from the Astro database (pboyce@aas.org)

    Hypothesis: That high-impact authors will deposit papers that get published/are published. Low-impact authors will submit articles that will never be published. Papers that aren't published - why are they submitted to XXX?

    For papers that are not tech-reports/non journal-refed:
    Search SPIRES - for article title [did they not replace original]
    Email author sample - was article published/where?

    ASTRO-PH - does Astro store pre-prints? Are authors using XXX just to store preprints because they can't store them in Astro? Look in Astro/contact authors to find out their behaviour.

  • What is the deposition/submission behavior in each field of the archive?

    and at each impact level -- and compare across the years as XXX grew and practice evolved...

  • Are published versions submitted or are papers updated with Journal references? (look to SPIRES)

    and AAS and maybe even ISI

    Contact authors who updated with a journal reference but not the paper, to ask why they didn't and whether they made changes.

  • Is there a correlation between average number of changes and Impact of paper?

    This is one of many variables you will want to correlate with impact (which can be measured the 4 ways mentioned above): latency (how soon the hits occur); whether journal ref is given; sector; etc.
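
    One way to test such correlations is a rank correlation, which does not assume that hits or citations are normally distributed. The sketch below computes Spearman's rank correlation in plain Python. The citation and download counts are invented, and the choice of Spearman is an assumption for illustration, not something agreed in these notes:

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / (spread_x * spread_y)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical per-paper counts.
citations = [0, 2, 5, 9, 30]
downloads = [15, 40, 35, 120, 400]
rho = spearman(citations, downloads)
print(f"Spearman rho = {rho:.2f}")
```

    The same function can be reused with latency, number of changes, or any other variable in place of downloads.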

  • Throughout the different fields - what proportion of pre-prints are replaced by peer-reviewed reprints?

    For hep-ph (the largest area in the archive), during the 7 month period, only 8 papers were replaced and 217 had their abstracts updated. Is there enough data to answer this question? - tdb198

    i) Is a paper published?
    ii) Does the author say that it is published?
    iii) Did the version number change?
    iv) If it is not updated what is the "diff" between submitted and deposited papers?
    Would need to obtain the published paper from Journal (ISI?) - tdb198
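
    For point iv), a simple word-level comparison gives a first estimate of how much a paper changed between two versions. This is only a sketch using Python's standard difflib module; it assumes both versions are available as plain text, and the two sample sentences are invented:

```python
import difflib

def change_ratio(old_text, new_text):
    """Fraction of words that differ: 0.0 for identical texts, 1.0 for no overlap."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    # ratio() is 2 * matches / total words, so 1 - ratio() measures change.
    return 1.0 - matcher.ratio()

# Hypothetical abstract text from a deposited and a published version.
deposited = "we study the decay of the heavy quark"
published = "we study the decay of the top quark"
print(f"changed: {change_ratio(deposited, published):.3f}")
```

    Running it separately on the abstract, the front-matter and the text body would separate the kinds of re-write listed earlier.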

    Meeting 26/7/2000

    Ian Hickman
    Steve Hitchcock
    Tim Brody


    Tim: Think up preamble for questionnaire, estimate what people are going to send back.
    Tim: Break up age of citations by impact factors.
    Ian: What proportion of papers are never hit?
    Tim: Proportion of Orange/Red links over time.
    Tim/Ian: Qualifying and explaining data.

    Meeting 18/7/2000

    Stevan Harnad
    Steve Hitchcock
    Zhuoan Jiao
    Tim Brody
    Ian Hickman



  • Build citation data for all areas + dates
  • Degree of change by impact factor
  • Compare impact factors against WoS/BIDS
  • Paper lifecycle, difference between first deposit and publication date [WoS], break down by area/impact factor
  • Write back to Simeon asking for web logs [how is meta data updated by SLAC?]
  • (re)Compile questions, send to SH


  • Write a summary of results for SH
  • Statistical analysis/validating data
  • Is there a relationship between the type [state] of a paper and how much it gets hit

    Meeting 11/7/2000

    Les Carr
    Tim Brody
    Ian Hickman

    Citations of hep-th to articles in XXX (stored in /export/3/users/lac/CITED).

    SCOOT - script to apply spotcite to hepth on arabica /export/2/XXX_PDF
    DOLIST - script to take SC.OUT0 and provide list of article x cited article y [ZZZ]


  • Relationship between when papers get hit and cited
  • Time difference between paper being submitted and being cited
  • Do users look at article, then look at cited articles?
  • Avg citations per paper/Citations per author
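
    A sketch of how the last two questions could be approached from a list of "article x cited article y" pairs. It assumes the list can be reduced to pairs of old-style identifiers of the form archive/YYMMNNN, so that the deposit month can be read from the identifier itself. The pairs below are invented and the real DOLIST output format is not shown in these notes:

```python
import re
from collections import Counter

# Old-style identifiers look like "hep-th/9901001": archive/YYMMNNN.
ID_PATTERN = re.compile(r"^[a-z-]+/(\d{2})(\d{2})\d{3}$")

def deposit_month(paper_id):
    """Return an absolute month index (year * 12 + month - 1) from the id."""
    match = ID_PATTERN.match(paper_id)
    if match is None:
        raise ValueError(f"unrecognised identifier: {paper_id!r}")
    yy, mm = int(match.group(1)), int(match.group(2))
    # The archive started in 1991, so 91-99 means 19xx and 00-90 means 20xx.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year * 12 + (mm - 1)

def citation_latencies(pairs):
    """Months between deposit of the cited paper and of the citing paper."""
    return [deposit_month(citing) - deposit_month(cited)
            for citing, cited in pairs]

# Hypothetical (citing, cited) pairs.
pairs = [
    ("hep-th/0001010", "hep-th/9901001"),
    ("hep-th/0003020", "hep-th/9901001"),
    ("hep-th/0003020", "hep-th/9906005"),
]

latencies = citation_latencies(pairs)
times_cited = Counter(cited for _citing, cited in pairs)
mean_latency = sum(latencies) / len(latencies)
mean_citations = sum(times_cited.values()) / len(times_cited)
print(f"mean latency: {mean_latency:.1f} months")
print(f"mean citations per cited paper: {mean_citations:.1f}")
```

    Note that the average here covers only papers cited at least once; papers with no citations would have to be added from the full archive listing.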

    Meeting 6/7/2000

    (note these are my notes, so please don't fry me if I get anything wrong!)

    Stevan Harnad
    Steve Hitchcock
    Zhuoan Jiao
    Ian Hickman
    Tim Brody

    (Bits relevant to ePrint usage research:)

    LSA: Could Ian research LSA technique/produce some info on how it works. Harnad: Need to have a "core" set of HEP papers to test against.

    Harnad: 4 tests for impact of articles:
    Citations: Author & Papers
    Hits: Author & Papers

    Steve Hitchcock: Where do we want to go with impact factor analysis?
    Analysing Low-impact vs High-impact authors: Ideally low-impact authors will be able to increase their impact by using XXX [difference in impact of early papers and later papers]

    Tim: SPIRES doesn't contain publication [journal-ref] entries for all papers (of a sample of 10, 2 had j/r). Harnad: For papers that do not contain journal-refs, need to contact a sample of authors to ascertain what has happened to these papers. Has the article been published in a book/conference etc.? Zhuoan: This is classed as published.

    Astro-ph section of XXX. Why is it so popular? A low proportion of deposited papers have a J-R; how does this relate to the ASTRO e-Archive? Contact authors/ASTRO to find out what the deposits in XXX are. Tim: Large number of technical reports in astro.

    Concerning updates to papers/journal-ref addition. Steve Hitchcock: Authors who update with J-R? Contact authors to find out whether they changed the paper, and if not, why (didn't bother because there was very little change, or because it was published and they want people to look in the journal?)

    Citation analysis: [from earlier] can Zhuoan produce some statistics on citation ratios/can Ian look at Les' code to extract this info? Use ISI to get citation ratios?

    Zhuoan: Questions over author extraction; how much sharing of names is there?


  • [ijh198] Investigate LSA analysis
  • [tdb198] Draft a text to SLAC/SPIRES to obtain their database of citations/publication info
  • [tdb198/ijh198] Obtain access to Web of Science/BIDS
  • [Zhuoan] Construct stats of citations
  • [Harnad?] Core set of HEP docs for testing against using LSA
  • [Harnad?] Obtain LSA software for use in Soton
  • [tdb198] Research the "unpublished" articles in XXX/break down by area...particularly HEP/COND-MAT/ASTRO

    Next Meeting

    Date of next tech meeting: 2 Weeks From Now
    Date of next general meeting: 22nd August 2000
