Sunday, January 21. 2007

The Open Access Citation Advantage: Quality Advantage Or Quality Bias?
This is a preview of some preliminary data (not yet refereed), collected by my doctoral student at UQaM, Chawki Hajjem. This study was done in part by way of response to Henk Moed's replies to my comments on Moed's (self-archived) preprint:

Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Moed's study is about the "Open Access Advantage" (OAA) -- the higher citation counts of self-archived articles -- observable across disciplines as well as across years as in the following graphs from Hajjem et al. 2005 (red bars are the OAA):

FIGURE 1. Open Access Citation Advantage By Discipline and By Year.

The focus of the present discussion is the factors underlying the OAA. There are at least five potential contributing factors, but only three of them are under consideration here: (1) Early Advantage (EA), (2) Quality Advantage (QA) and (3) Quality Bias (QB -- also called "Self-Selection Bias").

Preprints that are self-archived before publication have an Early Advantage (EA): they get read, used and cited earlier. This is uncontested.

Kurtz, Michael and Brody, Tim (2006) The impact loss to authors and research. In, Jacobs, Neil (ed.) Open Access: Key strategic, technical and economic aspects. Oxford, UK, Chandos Publishing.

In addition, the proportion of articles self-archived at or after publication is higher in the higher "citation brackets": the more highly cited articles are also more likely to be the self-archived articles. The question, then, is about causality: Are self-archived articles more likely to be cited because they are self-archived (QA)? Or are articles more likely to be self-archived because they are more likely to be cited (QB)?

FIGURE 2. Correlation between Citedness and Ratio of Open Access (OA) to Non-Open Access (NOA) Ratios.
The most likely answer is that both factors, QA and QB, contribute to the OAA: the higher quality papers gain more from being made more accessible (QA: indeed the top 10% of articles tend to get 90% of the citations). But the higher quality papers are also more likely to be self-archived (QB). As we will see, however, the evidence to date, because it has been based exclusively on self-selected (voluntary) self-archiving, is equally compatible with (i) an exclusive QA interpretation, (ii) an exclusive QB interpretation or (iii) the joint explanation that is probably the correct one. The only way to estimate the independent contributions of QA and QB is to compare the OAA for self-selected (voluntary) self-archiving with the OAA for imposed (obligatory) self-archiving. We report some preliminary results for this comparison here, based on the (still small sample of) Institutional Repositories that already have self-archiving mandates (chiefly CERN, U. Southampton, QUT, U. Minho, and U. Tasmania). FIGURE 3. Self-Selected Self-Archiving vs. Mandated Self-Archiving: Within-Journal Citation Ratios (for 2004, all fields). Summary: These preliminary results suggest that both QA and QB contribute to OAA, and that the contribution of QA is greater than that of QB. Discussion: On Fri, 8 Dec 2006, Henk Moed [HM] wrote: HM: "Below follow some replies to your comments on my preprint 'The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section'...The findings are definitely consistent for Astronomy and for Condensed Matter Physics. In both cases, most of the observed OAA came from the self-archiving of preprints before publication (EA). 
Moreover, in Astronomy there is already 100% "OA" to all articles after publication, and this has been the case for years now (for the reasons Michael Kurtz and Peter Boyce have pointed out: all research-active astronomers have licensed access as well as free ADS access to all of the closed circle of core Astronomy journals: otherwise they simply cannot be research-active). This means that there is only room for EA in Astronomy's OAA. And that means that in Astronomy all the questions about QA vs QB (self-selection bias) apply only to the self-archiving of prepublication preprints, not to postpublication postprints, which are all effectively "OA." To a lesser extent, something similar is true in Condensed-Matter Physics (CondMP): In general, research-active physicists have better access to their required journals via online licensing than other fields do (though one does wonder about the "non-research-active" physicists, and what they could/would do if they too had OA!). And CondMP too is a preprint self-archiving field, with most of the OAA differential again concentrated on the prepublication preprints (EA). Moreover, Moed's test for whether or not a paper was self-archived was based entirely on its presence/absence in ArXiv (as opposed to elsewhere on the Web, e.g., on the author's website or in the author's Institutional Repository). Hence Astronomy and CondMP are fields that are "biassed" toward EA effects. It is not surprising, therefore, that the lion's share of the OAA turns out to be EA in these fields. It also means that the remaining variance available for testing QA vs. QB in these fields is much narrower than in fields that do not self-archive preprints only, or mostly. Hence there is no disagreement (or surprise) about the fact that most of the OAA in Astronomy and CondMP is due to EA. 
(Less so in the slower-moving field of maths; see: "Early Citation Advantage?")

I agree with all this: The probable quality of the article was estimated from the probable quality of the author, based on citations for non-OA articles. Now, although this correlation, too, goes both ways (are authors' non-OA articles more cited because their authors self-archive more or do they self-archive more because they are more cited?), I do agree that the correlation between self-archiving-counts and citation-counts for non-self-archived articles by the same author is more likely to be a QB effect. The question then, of course, is: What proportion of the OAA does this component account for?

SH: "The fact that highly-cited articles (Kurtz) and articles by highly-cited authors (Moed) are more likely to be Arxived certainly does not settle the question of cause and effect: It is just as likely that better articles benefit more from Arxiving (QA) as that better authors/articles tend to Arxive/be-Arxived more (QB)."

HM: "2. Quality bias. I am fully aware that in this research context one cannot assess whether authors publish [sic] their better papers in the ArXiv merely on the basis of comparing citation rates of archived and non-archived papers, and I mention this in my paper. Citation rates may be influenced both by the 'quality' of the papers and by the access modality (deposited versus non-deposited). This is why I estimated author prominence on the basis of the citation impact of their non-archived articles only. But even then I found evidence that prominent, influential authors (in the above sense) are overrepresented in papers deposited in ArXiv."

HM: "But I did more than that. I calculated Arxiv Citation Impact Differentials (CID, my term, or ArXiv Advantage, AA, your term) at the level of individual authors. Next, I calculated the median CID over authors publishing in a journal.
How then do you explain my empirical finding that for some authors the citation impact differential (CID) or ArXiv Advantage is positive, for others it is negative, while the median CID over authors does not significantly differ from zero (according to a Sign test) for all journals studied in detail except Physical Review B, for which it is only 5 per cent? If there is a genuine 'OA advantage' at stake, why then does it for instance not lead to a significantly positive median CID over authors? Therefore, my conclusion is that, controlling for quality bias and early view effect, in the sample of 6 journals analysed in detail in my study, there is no sign of a general 'open access advantage' of papers deposited in ArXiv's Condensed Matter Section."

My interpretation is that EA is the largest contributor to the OAA in this preprint-intensive field (i.e., most of the OAA comes from the prepublication component) and that there is considerable variability in the size of the (small) residual (non-EA) OAA. For a small sample, at the individual journal level, there is not enough variance left for a significant OAA, once one removes the QB component too. Perhaps this is all that Henk Moed wished to imply.

But the bigger question for OA concerns all fields, not just those few that are preprint-intensive and that are relatively well-heeled for access to the published version. Indeed, the fundamental OA and OAA questions concern the postprint (not the preprint) and the many disciplines that do have access problems, not the happy few that do not!

The way to test the presence and size of both QB and QA in these non-EA fields is to impose the OA, preferably randomly, on half the sample, and then compare the size of the OAA for imposed ("mandated") self-archiving (Sm) with the size of the OAA for self-selected ("nonmandated") self-archiving (Sn), in particular by comparing their respective ratios to non-self-archived articles (N) in the same journal and year (Sm/N vs. Sn/N).
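The within-journal comparison just described can be sketched in code. This is only a minimal illustration, not the procedure actually used to produce FIGURE 3: the function name `within_journal_ratios`, the tuple layout of the input, and the use of mean citation counts per journal-and-year cell are all assumptions made here, and the sample records are invented purely to show the mechanics.

```python
from collections import defaultdict
from statistics import mean

def within_journal_ratios(articles):
    """Compute Sm/N and Sn/N separately for each (journal, year) cell.

    `articles` is an iterable of (journal, year, group, citations) tuples,
    where group is "Sm" (mandated self-archiving), "Sn" (self-selected
    self-archiving) or "N" (not self-archived). This layout is hypothetical.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for journal, year, group, citations in articles:
        cells[(journal, year)][group].append(citations)

    ratios = {}
    for key, groups in cells.items():
        # A cell without non-self-archived articles has no baseline.
        if not groups.get("N"):
            continue
        baseline = mean(groups["N"])
        if baseline == 0:
            continue  # ratio undefined when the baseline has no citations
        ratios[key] = {
            g: mean(groups[g]) / baseline
            for g in ("Sm", "Sn")
            if groups.get(g)
        }
    return ratios

# Invented records, for illustration only.
sample = [
    ("J1", 2004, "N", 2), ("J1", 2004, "N", 4),
    ("J1", 2004, "Sm", 6), ("J1", 2004, "Sn", 9),
    ("J2", 2004, "N", 1), ("J2", 2004, "Sn", 3),
]
for cell, r in within_journal_ratios(sample).items():
    print(cell, r)
```

Keeping the ratios per journal and year, rather than pooling all articles, is what makes the comparison "within-journal": each self-archived article is only ever compared with non-self-archived articles from the same journal and year.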
If Sn/N > Sm/N then QB > QA, and vice versa. If Sn/N = 1, then QB is 0. And if Sm/N = 1 then QA is 0.

It is a first approximation to this comparison that has just been done (FIGURE 3, above) by my doctoral student, Chawki Hajjem, across fields, for self-archived articles in five Institutional Repositories (IRs) that have OA self-archiving mandates, for 106,203 articles published in 276 biomedical journals in 2004. The mandates are still very young and few, hence the sample is still small; and there are many potential artifacts, including selective noncompliance with the mandate as well as disciplinary bias. But the preliminary results so far suggest that (1) QA is indeed > 0, and (2) QA > QB.

[I am sure that we will now have a second round from die-hards who will want to argue for a selective-compliance effect, as a 2nd-order last gasp for the QB-only hypothesis, but of course that loses all credibility as IRs approach 100% compliance: We are analyzing our mandated IRs separately now, to see whether we can detect any trends correlated with an IR's %OA. But (except for the die-hards, who will never die), I think even this early sample already shows that the OA advantage is unlikely to be only or mostly a QB effect.]

HM: "3. Productive versus less productive authors. My analysis of differences in Citation Impact differentials between productive and less productive authors may seem "a little complicated". My point is that if one selects from a set of papers deposited in ArXiv a paper authored by a junior (or less productive) scientist, the probability that this paper is co-authored by a senior (or more productive) author is higher than it is for a paper authored by a junior scientist but not deposited in ArXiv. Next, I found that papers co-authored by both productive and less productive authors tend to have a higher citation impact than articles authored solely by less productive authors, regardless of whether these papers were deposited in ArXiv or not.
These outcomes lead me to the conclusion that the observed higher CID for less productive authors compared to that of productive authors can be interpreted as a quality bias."

It still sounds a bit complicated, but I think what you mean is that (1) mixed multi-author papers (ML, with M = More productive authors, L = less productive authors) are more likely to be cited than unmixed multi-author (LL) papers with the same number of authors, and that (2) such ML papers are also more likely to be self-archived. (Presumably MM papers are the most cited and most self-archived of multi-author papers.) That still sounds to me like a variant on the citation/self-archiving correlation, and hence interpretable as either QA or QB or both. (Chawki Hajjem has also found that citation counts are positively correlated with the number of authors an article has: this could either be a self-citation bias or evidence that multi-authored papers tend to be better ones.)

HM: "4. General comments. In the citation analysis by Kurtz et al. (2005), both the citation and target universe contain a set of 7 core journals in astronomy. They explain their finding of no apparent OA effect in his study of these journals by postulating that "essentially all astronomers have access to the core journals through existing channels". In my study the target set consists of a limited number of core journals in condensed matter physics, but the citation universe is as large as the total Web of Science database, including also a number of more peripherical journals in the field.
Therefore, my result is stronger than that obtained by Kurtz et al.: even in this much wider citation universe, I do not find evidence for an OA advantage effect."

I agree that CondMP is less preprint-intensive, less accessible and less endogamous than Astrophysics, but it is still a good deal more preprint-intensive and accessible than most fields (and I don't yet know what role the exogamy/endogamy factor plays in either citations or the OAA: it will be interesting to study, among many other candidate metrics, once the entire literature is OA).

HM: "I realize that my study is a case study, examining in detail 6 journals in one subfield. I fully agree with your warning that one should be cautious in generalizing conclusions from case studies, and that results for other fields may be different. But it is certainly not an unimportant case. It relates to a subfield in physics, a discipline that your pioneering and stimulating work (Harnad and Brody, D-Lib Mag., June 2004) has analysed as well at a more aggregate level. I hope that more case studies will be carried out in the near future, applying the methodologies I proposed in my paper."

Your case study is very timely and useful. However, robot-based studies based on much larger samples of journals and articles have now confirmed the OAA in many more fields, most of them not preprint-based at all, and with access problems more severe than those of physics.

Conclusions

I would like to conclude with a summary of the "QB vs. QA" evidence to date, as I understand it:

(1) Many studies have reported the OA Advantage, across many fields.

This will all be resolved soon, and the outcome of our QA vs. QB comparison for mandated vs. self-selected self-archiving already heralds this resolution. I am pretty confident that the empirical facts will turn out to have been the following: Yes, there is a QB component in the OA advantage (especially in the preprinting fields, such as astro, cond-mat and maths).
But that QB component is neither the sole factor nor the largest factor in the OA advantage, particularly in the non-preprint fields with access problems -- and those fields constitute the vast majority. That will be the outcome that is demonstrated, and eventually not only the friends of OA but the foes of OA will have no choice but to acknowledge the new reality of OA, its benefits to research and researchers, and its immediate reachability through the prompt universal adoption of OA self-archiving mandates.

Stevan Harnad & Chawki Hajjem
American Scientist Open Access Forum

Wednesday, January 17. 2007

Citation Advantage For OA Self-Archiving Is Independent of Journal Impact Factor, Article Age, and Number of Co-Authors
In May 2006, Eysenbach published "Citation Advantage of Open Access Articles" in PLoS Biology, confirming -- by comparing OA vs. non-OA articles within one hybrid OA/non-OA journal -- the "OA Advantage" (higher citations for OA articles than for non-OA articles) that had previously been demonstrated by comparing OA (self-archived) vs. non-OA articles within non-OA journals.

This new PLoS study was based on a sample of 1492 articles (212 OA, 1280 non-OA) published June-December 2004 in one very high-impact (i.e., high average citation rate) journal: Proceedings of the National Academy of Sciences (PNAS). The findings were useful because not only did they confirm the OA citation advantage, already demonstrated across millions of articles, thousands of journals, and over a dozen subject areas, but they showed that that advantage is already detectable as early as 4 months after publication.

The PLoS study also controlled for a large number of variables that could have contributed to a false OA advantage (for example, if more of the authors that chose to provide OA had happened to be in subject areas that happened to have higher citation counts). Eysenbach's logistic and multiple regression analyses confirmed that this was not the case for any of the potentially confounding variables tested, including the (i) country, (ii) publication count and (iii) citation count of the author and the (iv) subject area and (v) number of co-authors of the article.

However, both the Eysenbach article and the accompanying PLoS editorial considerably overstated the significance of all the controls that were done, suggesting that (1) the pre-existing evidence, based mainly on OA self-archiving ("green OA") rather than OA publishing ("gold OA"), had not been "solid" but "limited" because it had not controlled for these potential "confounding effects."
They also suggested that (2) the PLoS study's finding that gold OA generated more citations than green OA in PNAS pertained to OA in general rather than just to high-profile journals like PNAS (and that perhaps green OA is not even OA!):

Eysenbach (2006): "[T]he [prior] evidence on the “OA advantage” is controversial. Previous research has based claims of an OA citation advantage mainly on studies looking at the impact of self-archived articles... (which some have argued to be different from open access in the narrower sense)... All these previous studies are cross-sectional and are subject to numerous limitations... Limited or no evidence is available on the citation impact of articles originally published as OA that are not confounded by the various biases and additional advantages [?] of self-archiving or “being online” that contribute to the previously observed OA effects."

When I pointed out in a reply that subject areas, countries and years had all been analyzed separately in prior within-journal comparisons based on far larger samples, always with the same outcome -- the OA citation advantage -- making it highly unlikely that any of the other potentially confounding factors singled out in the PLoS/PNAS study would change that consistent pattern, Eysenbach responded:

Eysenbach: "[T]o answer Harnad's question 'What confounding effects does Eysenbach expect from controlling for number of authors in a sample of over a million articles across a dozen disciplines and a dozen years all showing the very same, sizeable OA advantage? Does he seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect?' – the answer is “absolutely”."

My doctoral student, Chawki Hajjem, has accordingly accepted Eysenbach's challenge, and done the requisite multiple regression analyses, testing not only (3) number of authors, but (1) number of years since publication, and (2) journal impact factor.
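A multiple regression of this kind can be sketched as follows. This is only an illustration of the technique, assuming ordinary least squares on a linear citation score; it is not the model that was actually fitted. The data are synthetic and noise-free, the coefficient values in `TRUE_BETA` are arbitrary placeholders rather than estimates from any study, and the names `ols`, `rows`, `y` and `TRUE_BETA` are hypothetical.

```python
from itertools import product

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Arbitrary illustrative coefficients (NOT results from the study):
# intercept, article age, journal impact factor, number of authors, OA flag.
TRUE_BETA = [0.2, 0.5, 0.3, 0.05, 0.1]

# Synthetic full-factorial design, so the predictors are not collinear.
rows = [
    [1.0, age, jif, n_authors, oa]
    for age, jif, n_authors, oa in product(
        (1, 2, 3, 4), (1.0, 2.5, 6.0), (1, 3, 8), (0, 1)
    )
]
# Noise-free citation score generated from the assumed coefficients.
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in rows]

estimates = ols(rows, y)
for name, value in zip(("intercept", "age", "impact", "authors", "oa"), estimates):
    print(name, round(value, 6))
```

The point of including the OA flag alongside the other three predictors is that its coefficient then measures the association between self-archiving and citations with age, impact factor and author count held fixed, which is the sense in which a factor is "independent" in such an analysis.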
The outcome is that (4) the OA self-archiving advantage (green OA) continues to be present as a robust, independent, statistically significant factor, alongside factors (1)-(3). In order of size of contribution:

Article age (1) is of course the biggest factor: Articles' total citation counts grow as time goes by.

Journal impact factor (2) is next: Articles in high-citation journals have higher citation counts: This is not just a circular effect of the fact that journal citation counts are just average journal-article citation counts: It is a true QB selection effect (nothing to do with OA!), namely, the higher quality articles tend to be submitted to and selected by the higher quality journals!

The next contributor to citation counts is the number of authors (3): This could be because there are more self-citations when there are more authors; or it could indicate that multi-authored articles tend to be of higher quality.

But last, we have the contribution of OA self-archiving (4). It is the smallest of the four factors, but that is unsurprising, as surely article age and quality are the two biggest determinants of citations, whether the articles are OA or non-OA. (Perhaps self-citations are the third biggest contributor). But the OA citation advantage is present for those self-archived articles (and stronger for the higher quality ones, QA), refuting Eysenbach's claim that the green OA advantage is merely the result of "potential confounds" and that only the gold OA advantage is real.

I might add that the PLoS Editorial is quite right to say: "Since most open-access journals are new, comparisons of the effects of open access with established subscription-based journals are easily confounded by age and reputation": Comparability and confounding are indeed major problems for between-journal comparisons, comparing OA and non-OA journals (gold OA). Until Eysenbach's within-journal PNAS study, "solid evidence" (for gold OA) was indeed hard to find.
But comparability and confounding are far less of a problem for the within-journal analyses of self-archiving (green OA), and with them, solid evidence abounds. I might further add that the solid pre-existing evidence for the green OA advantage -- free of the limitations of between-journal comparisons -- is and always has been, by the same token, evidence for the gold OA advantage too, for it would be rather foolish and arbitrary to argue that free accessibility is only advantageous to self-archived articles, and not to articles published in OA journals! Yet that is precisely the kind of generalization Eysenbach seems to want to make (in the opposite direction) in the special case of PNAS -- a very selective, high-profile, high-impact journal. PNAS articles that are freely accessible on the PNAS website were found to have a greater OA advantage than PNAS articles freely accessible only on the author's website. With just a little reflection, however, it is obvious that the most likely reason for this effect is the high profile of PNAS and its website: That effect is hence highly unlikely to scale to all, most, or even many journals; nor is it likely to scale in time, for as green OA grows, the green OA harvesters like OAIster (or even just Google Scholar) will become the natural way and place to search, not the journal's website. Having taken up Eysenbach's challenge to test the independence of the OA self-archiving advantage from "potential confounds," we now challenge Eysenbach to test the generality of the PNAS gold/green advantage across the full quality hierarchy of journals, to show it is not merely a high-end effect. Let me close by mentioning one variable that Eysenbach did not (and could not) control for, namely, author self-selection bias (Quality Bias, QB): His 212 OA authors were asked to rate the relative urgency, importance, and quality of their articles and there was no difference between their OA and non-OA articles in these self-ratings. 
But (although I myself am quite ready to agree that there was little or no Quality Bias involved in determining which PNAS authors chose which PNAS articles to make OA gold), unfortunately these self-ratings are not likely to be enough to convince the sceptics who interpret the OA advantage as a Quality Bias (a self-selective tendency to provide OA to higher quality articles) rather than a Quality Advantage (QA) that increases the citations of higher quality articles. Not even the prior evidence of a correlation between earlier downloads and later citations is enough. The positive result of a more objective test of Quality Bias (QB) vs. Quality Advantage (QA) (comparing self-selected vs. mandated self-archiving, and likewise conducted by Chawki Hajjem) is reported here.

REFERENCES

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.

Eysenbach G (2006) Citation Advantage of Open Access Articles. PLoS Biology 4(5) e157 DOI: 10.1371/journal.pbio.0040157

Hajjem, C., & Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias?

Hajjem, C., Harnad, S. & Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology Responses.

MacCallum CJ & Parthasarathy H (2006) Open Access Increases Citation Rate. PLoS Biol 4(5): e176 DOI: 10.1371/journal.pbio.0040176

Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section

Stevan Harnad & Chawki Hajjem
American Scientist Open Access Forum
The American Scientist Open Access Forum has been chronicling and often directing the course of progress in providing Open Access to Universities' Peer-Reviewed Research Articles since its inception in the US in 1998 by the American Scientist, published by the Sigma Xi Society. The Forum is largely for policy-makers at universities, research institutions and research funding agencies worldwide who are interested in institutional Open Access Provision policy. (It is not a general discussion group for serials, pricing or publishing issues: it is specifically focussed on institutional Open Access policy.)