Re: Self-Archiving Impact Advantage: Quality Advantage or Quality Bias? from Henk F. Moed on 2006-12-08 (American-Scientist-Open-Access-Forum)

From: Henk F. Moed <Moed_at_cwts.leidenuniv.nl>
Date: Fri, 8 Dec 2006 09:33:22 -0500


This is submitted to Sigmetrics at the request of Henk Moed, whose email
account has changed, so he cannot post to Sigmetrics. (I will reply
shortly.) -- Stevan Harnad

Begin forwarded message:

       Text to be submitted to Sigmetrics

Dear Stevan,

Below follow some replies to your comments on my preprint "The
effect of 'Open Access' upon citation impact: An analysis of
ArXiv's Condensed Matter Section", available at
http://arxiv.org/abs/cs.DL/0611060.

Henk F. Moed
Centre for Science and Technology Studies
Leiden University, The Netherlands
Moed_at_cwts.leidenuniv.nl

1. Early view effect

In my case study on 6 journals in the field of condensed matter
physics, I concluded that the observed differences between the
citation age distributions of deposited and non-deposited ArXiv
papers can to a large extent, though not fully, be explained by
the publication delay of about six months of non-deposited articles
compared to papers deposited in ArXiv. This outcome provides
evidence for an early view effect upon citation impact rates, and
consequently upon ArXiv citation impact differentials (CID, my
term) or Arxiv Advantage (AA, your term).

You wrote: "The basic question is this: Once the AA has been
adjusted for the 'head-start' component of the EA (by comparing
articles of equal age -- the age of Arxived articles being based on
the date of deposit of the preprint rather than the date of
publication of the postprint), how big is that adjusted AA, at each
article age? For that is the AA without any head-start. Kurtz never
thought the EA component was merely a head start, however, for the
AA persists and keeps growing, and is present in cumulative
citation counts for articles at every age since Arxiving began."

Figure 2 in the interesting paper by Kurtz et al. (IPM, v. 41, p.
1395-1402, 2005) does indeed show an increase in the very short
term average citation impact (my terminology; citations were
counted during the first 5 months after publication date) of papers
as a function of their publication date from 1996 onwards. My
interpretation of this figure is that it clearly shows that the
principal component of the early view effect is the head-start: it
reveals that the share of astronomy papers deposited in ArXiv (and
other preprint servers) increased over time. More and more papers
became available at the date of their submission to a journal,
rather than on their formal publication date. I therefore conclude
that their findings for astronomy are fully consistent with my
outcomes for journals in the field of condensed matter physics.

2. Quality bias

You wrote: "The fact that highly-cited articles (Kurtz) and
articles by highly-cited authors (Moed) are more likely to be
Arxived certainly does not settle the question of cause and effect:
It is just as likely that better articles benefit more from
Arxiving (QA) as that better authors/articles tend to
Arxive/be-Arxived more (QB)."

I am fully aware that in this research context one cannot assess
whether authors publish their better papers in the ArXiv merely on
the basis of comparing citation rates of archived and non-archived
papers, and I mention this in my paper. Citation rates may be
influenced both by the 'quality' of the papers and by the access
modality (deposited versus non-deposited). This is why I estimated
author prominence on the basis of the citation impact of their
non-archived articles only. But even then I found evidence that
prominent, influential authors (in the above sense) are
overrepresented in papers deposited in ArXiv.

But I did more than that. I calculated ArXiv Citation Impact
Differentials (CID, my term, or ArXiv Advantage, AA, your term) at
the level of individual authors. Next, I calculated the median CID
over authors publishing in a journal. How then do you explain my
empirical finding that for some authors the citation impact
differential (CID) or ArXiv Advantage is positive, for others it is
negative, while the median CID over authors does not significantly
differ from zero (according to a Sign test) for all journals
studied in detail except Physical Review B, for which it is only 5
per cent? If there is a genuine 'OA advantage' at stake, why then
does it for instance not lead to a significantly positive median
CID over authors? Therefore, my conclusion is that, controlling for
quality bias and early view effect, in the sample of 6 journals
analysed in detail in my study, there is no sign of a general
'open access advantage' of papers deposited in ArXiv's Condensed
Matter Section.
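
For readers unfamiliar with the Sign test mentioned above, the
following is a minimal sketch in Python of a two-sided exact sign test
of the hypothesis that the median per-author CID is zero. The function
name and the CID values are hypothetical illustrations and are not
taken from the study; the exact procedure used in the preprint may
differ in detail.

```python
from math import comb


def sign_test_p_value(differentials):
    """Two-sided exact sign test of the hypothesis that the median is zero."""
    # Zero differentials carry no sign information and are dropped.
    positives = sum(1 for d in differentials if d > 0)
    negatives = sum(1 for d in differentials if d < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Under the null hypothesis each sign is a fair coin flip, so the
    # count of the rarer sign follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical per-author CID values (per cent), NOT data from the study.
author_cid = [12.0, -8.5, 3.1, -15.2, 7.4, -2.0, 0.0, 9.9, -6.3, 4.8]
p_value = sign_test_p_value(author_cid)
print(f"two-sided sign test p-value: {p_value:.3f}")
```

With roughly as many positive as negative differentials, as in this
made-up sample, the test gives no reason to reject a zero median.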

3. Productive versus less productive authors

My analysis of differences in Citation Impact differentials between
productive and less productive authors may seem "a little
complicated". My point is that if one selects from a set of papers
deposited in ArXiv a paper authored by a junior (or less
productive) scientist, the probability that this paper is
co-authored by a senior (or more productive) author is higher than
it is for a paper authored by a junior scientist but not deposited
in ArXiv. Next, I found that papers co-authored by both productive
and less productive authors tend to have a higher citation impact
than articles authored solely by less productive authors,
regardless of whether these papers were deposited in ArXiv or not.
These outcomes lead me to the conclusion that the observed
higher CID for less productive authors compared to that of
productive authors can be interpreted as a quality bias.

4. General comments

In the citation analysis by Kurtz et al. (2005), both the
citation and target universe contain a set of 7 core journals in
astronomy. They explain their finding of no apparent OA effect in
their study of these journals by postulating that "essentially all
astronomers have access to the core journals through existing
channels". In my study the target set consists of a limited number
of core journals in condensed matter physics, but the citation
universe is as large as the total Web of Science database,
including also a number of more peripheral journals in the field.
Therefore, my result is stronger than that obtained by Kurtz et
al.: even in this much wider citation universe, I do not find
evidence for an OA advantage effect.

I realize that my study is a case study, examining in detail 6
journals in one subfield. I fully agree with your warning that one
should be cautious in generalizing conclusions from case studies,
and that results for other fields may be different. But it is
certainly not an unimportant case. It relates to a subfield in
physics, a discipline that your pioneering and stimulating work
(Harnad and Brody, D-Lib Mag., June 2004) has analysed as well at a
more aggregate level. I hope that more case studies will be carried
out in the near future, applying the methodologies I proposed in my
paper.

________________________________________________________________________________
From: ASIS&T Special Interest Group on Metrics on behalf of Stevan Harnad
Sent: Mon 11/20/2006 22:49
To: SIGMETRICS_at_LISTSERV.UTK.EDU
Subject: [SIGMETRICS] Self-Archiving Impact Advantage: Quality Advantage or Quality Bias?

Administrative info for SIGMETRICS (for example unsubscribe):
http://web.utk.edu/~gwhitney/sigmetrics.html

    Self-Archiving Impact Advantage: Quality Advantage or Quality Bias?

                 Stevan Harnad

    SUMMARY: In astrophysics, Kurtz found that articles that were
    self-archived by their authors in Arxiv were downloaded and cited
    twice as much as those that were not. He traced this enhanced
    citation impact to two factors: (1) Early Access (EA): The
    self-archived preprint was accessible earlier than the publisher's
    version (which is accessible to all research-active astrophysicists
    as soon as it is published, thanks to Kurtz's ADS system). (Hajjem,
    however, found that in other fields, which self-archive only
    published postprints and do have accessibility/affordability
    problems with the publisher's version, self-archived articles still
    have enhanced citation impact.) Kurtz's second factor was: (2)
    Quality Bias (QB), a selective tendency for higher quality articles
    to be preferentially self-archived by their authors, as inferred
    from the fact that the proportion of self-archived articles turns
    out to be higher among the more highly cited articles. (The very
    same finding is of course equally interpretable as (3) Quality
    Advantage (QA), a tendency for higher quality articles to benefit
    more than lower quality articles from being self-archived.) In
    condensed-matter physics, Moed has confirmed that the impact
    advantage occurs early (within 1-3 years of publication). After
    article-age is adjusted to reflect the date of deposit rather than
    the date of publication, the enhanced impact of self-archived
    articles is again interpretable as QB, with articles by more highly
    cited authors (based only on their non-archived articles) tending
    to be self-archived more. (But since the citation counts for
    authors and for their articles are correlated, one would expect
    much the same outcome from QA too.) The only way to test QA vs. QB
    is to compare the impact of self-selected self-archiving with
    mandated self-archiving (and no self-archiving). (The outcome is
    likely to be that both QA and QB contribute, along with EA, to the
    impact advantage.)

Michael Kurtz's papers have confirmed that in astronomy/astrophysics
(astro), articles that have been self-archived -- let's call this
"Arxived" to mark it as the special case of depositing in the central
Physics Arxiv -- are cited (and downloaded) twice as much as
non-Arxived articles. Let's call this the "Arxiv Advantage" (AA).
http://arxiv.org/

    Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant,
    C., Thompson, D., and Murray, S. S. (2006) Effect of E-printing
    on Citation Rates in Astronomy and Physics. Journal of Electronic
    Publishing, Vol. 9, No. 2
    http://arxiv.org/abs/cs/0604061

    Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn,
    G., Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and
    Murray, S. S. (2006) E-prints and Journal Articles in Astronomy: a
    Productive Co-existence (submitted to Learned Publishing)
    http://arxiv.org/abs/cs/0609126

    Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C. S.,
    Demleitner, M., Murray, S. S. (2005) The Effect of Use and Access
    on Citations. Information Processing and Management, 41 (6):
    1395-1402
    http://cfa-www.harvard.edu/~kurtz/kurtz-effect.pdf

Kurtz analyzed AA and found that it consisted of at least 2 components:

(1) EARLY ACCESS (EA): There is no detectable AA for old articles in
astro: AA occurs while an article is young (1-3 years). Hence astro
articles that were made accessible as preprints before publication
show more AA: This is the Early Access effect (EA). But EA alone does
not explain why AA effects (i.e., enhanced citation counts) persist
cumulatively and even keep growing, rather than simply being a
phase-advancing of otherwise un-enhanced citation counts, in which
case simply re-calculating an article's age so as to begin at preprint
deposit time instead of publication time should eliminate all AA
effects -- which it does not.
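
The age re-calculation described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
record layout (keys "published", "deposited", "citations"), the
function names and all dates are hypothetical, and age is measured in
whole calendar months, ignoring the day of the month.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def citations_at_age(article, max_age_months, use_deposit_date=True):
    """Count citations received before the article reaches max_age_months."""
    # Age normally starts at publication; for Arxived articles it can
    # instead start at preprint deposit, which removes the head start.
    origin = article["published"]
    if use_deposit_date and article.get("deposited") is not None:
        origin = article["deposited"]
    return sum(
        1
        for cited_on in article["citations"]
        if 0 <= months_between(origin, cited_on) < max_age_months
    )


# Hypothetical Arxived article: deposited six months before publication.
arxived = {
    "deposited": date(2000, 1, 15),
    "published": date(2000, 7, 15),
    "citations": [
        date(2000, 3, 1),
        date(2000, 9, 1),
        date(2001, 2, 1),
        date(2001, 5, 1),
    ],
}

from_deposit = citations_at_age(arxived, 12)
from_publication = citations_at_age(arxived, 12, use_deposit_date=False)
print(from_deposit, from_publication)
```

Comparing Arxived and non-Arxived articles on the deposit-based count
is what "articles of equal age" means in this discussion; whatever
difference remains is the part of the AA that is not a mere head start.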

(2) QUALITY BIAS (QB): (Kurtz called the second component
"Self-Selection Bias" for quality, but I call it self-selection
Quality Bias, QB): If we compare articles within roughly the same
citation/quality bracket (i.e., articles having the same number of
citations), the proportion of Arxived articles becomes higher in the
higher citation brackets, especially the top 200 papers. Kurtz
interprets this as resulting from authors preferentially Arxiving
their higher-quality preprints (Quality Bias).

Of course the very same outcome is just as readily interpretable as
resulting from Quality Advantage (QA) (rather than Quality Bias (QB)):
i.e., that the Arxiving benefits better papers more. (Making a
low-quality paper more accessible by Arxiving it does not guarantee
more citations, whereas making a high-quality paper more accessible is
more likely to do so, perhaps roughly in proportion to its higher
quality, allowing it to be used and cited more according to its merit,
unconstrained by its accessibility/affordability.)

There is no way, on the basis of existing data, to decide between QA
and QB. The only way to measure their relative contributions would be
to control the self-selection factor: randomly imposing Arxiving on
half of an equivalent sample of articles of the same age (from
preprinting age to 2-3 years postpublication, reckoning age from
deposit date, to control also for age/EA effects), and comparing also
with self-selected Arxiving.
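
The random-assignment design described above can be sketched as
follows. This is a minimal Python illustration, not an implementation
of any actual study: the function names, the fixed seed and the
citation counts are hypothetical, and a real comparison would also
need the age controls mentioned in the text.

```python
import random


def assign_arxiving(article_ids, seed=0):
    """Randomly split articles into an imposed-Arxiving half and a control half."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(article_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def mean_citations(ids, citation_counts):
    """Average citation count over a non-empty group of article ids."""
    ids = list(ids)
    return sum(citation_counts[i] for i in ids) / len(ids)


def citation_difference(treated, control, citation_counts):
    """Mean citations of the Arxived group minus those of the control group."""
    return mean_citations(treated, citation_counts) - mean_citations(
        control, citation_counts
    )


# Hypothetical sample of ten same-age articles with made-up citation counts.
article_ids = list(range(10))
treated, control = assign_arxiving(article_ids)
citation_counts = {i: 10 for i in article_ids}  # placeholder outcome data
difference = citation_difference(treated, control, citation_counts)
print(sorted(treated), sorted(control), difference)
```

Because assignment here does not depend on the authors' own choice,
any difference between the two groups cannot be attributed to
self-selection (QB); comparing it with the difference observed under
self-selected Arxiving is what would separate QA from QB.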

We are trying an approximation to this method, using articles
deposited in Institutional Repositories of institutions that mandate
self-archiving (and comparing their citation counts with those of
articles from the same journal/issue that have not been
self-archived), but the sample is still small and possibly
unrepresentative, with many gaps and other potential liabilities. So a
reliable estimate of the relative size of QA and QB still awaits
future research, when self-archiving mandates will have become more
widely adopted.

Henk Moed's data on Arxiving in Condensed Matter physics (cond-mat)
replicates Kurtz's findings in astro (and Davis/Fromerth's, in math):

    Moed, H. F. (2006, preprint) The effect of 'Open Access' upon
    citation impact: An analysis of ArXiv's Condensed Matter Section
    http://arxiv.org/abs/cs.DL/0611060

    Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to
    higher citations and reduced publisher downloads for mathematics
    articles? Scientometrics, accepted for publication.
    http://arxiv.org/abs/cs.DL/0603056
    See critiques:
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/subject.html#5221
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/5440.html

Moed too has shown that in cond-mat the AA effect (which he calls CID
"Citation Impact Differential") occurs early (1-3 years) rather than
late (4-6 years), and that there is more Arxiving by authors of
higher-quality (based on higher citation counts for their non-Arxived
articles) than by lower-quality authors. But this too is just as
readily interpretable as the result of QB or QA (or both): We would of
course expect a high correlation between an author's individual
articles' citation counts and the author's average citation count,
whether the author's citation count is based on Arxived or non-Arxived
articles. These are not independent variables.

(Less easily interpretable -- but compatible with either QA or QB
interpretations -- is Moed's finding of a smaller AA for the "more
productive" authors. Moed's explanations in terms of co-authorships
between more productive and less productive authors, senior and
junior, seem a little complicated.)

The basic question is this: Once the AA has been adjusted for the
"head-start" component of the EA (by comparing articles of equal age
-- the age of Arxived articles being based on the date of deposit of
the preprint rather than the date of publication of the postprint),
how big is that adjusted AA, at each article age? For that is the AA
without any head-start. Kurtz never thought the EA component was
merely a head start, however, for the AA persists and keeps growing,
and is present in cumulative citation counts for articles at every age
since Arxiving began. This non-EA AA is either QB or QA or both. (It
also has an element of Competitive Advantage, CA, which would
disappear once everything was self-archived, but let's ignore that for
now.)

    Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA +
    (CA) + UA. Preprint.
    http://eprints.ecs.soton.ac.uk/12085/
Received on Fri Dec 15 2006 - 15:28:09 GMT