Sunday, October 19. 2008

On Metrics and Metaphysics

"The man who is ready to prove that metaphysics is wholly impossible ... is a brother metaphysician with a rival theory."

A critique of metrics and of the European Reference Index for the Humanities (ERIH) by History of Science, Technology and Medicine journal editors has been posted on the Classicists list. ERIH looks like an attempt to set up a bigger, better alternative to the ISI Journal Impact Factor (JIF), tailored specifically for the Humanities. The protest from the journal editors looks as if it is partly anti-JIF, partly opposed to the ERIH approach and appointees, and partly anti-metrics.

Date: Sun, 19 Oct 2008 11:56:22 +0100
Sender: Classicists
From: Nick Lowe
Subject: History of Science pulls out of ERIH

[As editorial boards and subject associations in other humanities subjects contemplate their options, this announcement by journals in History of Science seems worth passing on in full. Thanks to Stephen Clark for the forward.]

Journals under Threat: A Joint Response from History of Science, Technology and Medicine Editors

We live in an age of metrics. All around us, things are being standardized, quantified, measured. Scholars concerned with the work of science and technology must regard this as a fascinating and crucial practical, cultural and intellectual phenomenon. Analysis of the roots and meaning of metrics and metrology has been a preoccupation of much of the best work in our field for the past quarter century at least. As practitioners of the interconnected disciplines that make up the field of science studies we understand how significant, contingent and uncertain can be the process of rendering nature and society in grades, classes and numbers. We now confront a situation in which our own research work is being subjected to putatively precise accountancy by arbitrary and unaccountable agencies.

Some may already be aware of the proposed European Reference Index for the Humanities (ERIH), an initiative originating with the European Science Foundation. The ERIH is an attempt to grade journals in the humanities -- including "history and philosophy of science". The initiative proposes a league table of academic journals, with premier, second and third divisions. According to the European Science Foundation, ERIH "aims initially to identify, and gain more visibility for, top-quality European Humanities research published in academic journals in, potentially, all European languages". It is hoped "that ERIH will form the backbone of a fully-fledged research information system for the Humanities". What is meant, however, is that ERIH will provide funding bodies and other agencies in Europe and elsewhere with an allegedly exact measure of research quality. In short, if research is published in a premier league journal it will be recognized as first rate; if it appears somewhere in the lower divisions, it will be rated (and not funded) accordingly.

This initiative is entirely defective in conception and execution. Consider the major issues of accountability and transparency. The process of producing the graded list of journals in science studies was overseen by a committee of four (the membership is currently listed at ...). No indication has been given of the means through which the list was compiled, nor how it might be maintained in the future. The ERIH depends on a fundamental misunderstanding of conduct and publication of research in our field, and in the humanities in general.
Journals' quality cannot be separated from their contents and their review processes. Great research may be published anywhere and in any language. Truly ground-breaking work may be more likely to appear from marginal, dissident or unexpected sources, rather than from a well-established and entrenched mainstream. Our journals are various, heterogeneous and distinct. Some are aimed at a broad, general and international readership, others are more specialized in their content and implied audience. Their scope and readership say nothing about the quality of their intellectual content. The ERIH, on the other hand, confuses internationality with quality in a way that is particularly prejudicial to specialist and non-English language journals. In a recent report, the British Academy, with judicious understatement, concludes that "the European Reference Index for the Humanities as presently conceived does not represent a reliable way in which metrics of peer-reviewed publications can be constructed" (Peer Review: the Challenges for the Humanities and Social Sciences, September 2007).

Hanne Andersen (Centaurus) ...

Friday, October 10. 2008

Open Access Book-Impact and "Demotic" Metrics

SUMMARY: Unlike for OA's primary target (journal articles), the deposit of the full texts of books in Open Access Repositories cannot be mandated, only encouraged. However, the deposit of book metadata plus reference lists can and should be mandated. That will create the metric that the book-based disciplines need most: a book citation index. ISI's Web of Science only covers citations of books by (indexed) journal articles, but book-based disciplines' biggest need is book-to-book citations. Citebase could provide that, once the book reference metadata are deposited in the IRs too, rather than just article postprints. (Google Books and Google Scholar are already providing a first approximation to book citation counts.) Analogues of "download" metrics for books are also potentially obtainable from book vendors, beginning with Amazon Sales Rank. In the Humanities it also matters for credit and impact how much the non-academic (hence non-citing) public is reading researchers' books ("Demotic Metrics"). IRs can not only (1) add book-metadata/reference deposit to their OA Deposit Mandates, but they can also (2) harvest Amazon book-sales metrics for their book metadata deposits, to add to their IR stats. IRs can also already harvest Google Books (and Google Scholar) book-citation counts today, as a first step toward constructing a distributed, universal OA book-citation index.

The Dublin humanities metrics conference was also concerned with other kinds of online works, and how to measure and credit their impact: metrics don't stop with citation counts and download counts. Among the many "Demotic Metrics" that can also be counted are link counts, tag counts, blog mentions, and web mentions. This applies to books/authors, as well as to data, courseware and other identifiable online resources. We should hasten the progress of book metrics; that will in turn accelerate the growth of OA's primary target content, journal articles, as well as increasing support for institutional and funder OA Deposit Mandates.

The deposit of the full texts of book chapters and monographs in Open Access Repositories should of course be encouraged wherever possible, but, unlike with journal articles, full-text book deposit itself cannot be mandated.
The most important additional thing that the OA movement should be singling out and emphasizing -- over and above the Immediate Deposit Mandate, the email-eprint-request Button, and the use of metrics to motivate mandates -- is the author deposit of all book metadata plus reference lists in the author's OA Institutional Repository (IR). That will create the metric that the book-based disciplines need the most.

This has been mentioned before, as a possibility and a desideratum for institutional (and funder) OA policy, but it is now crystal clear why it is so important (and so easy to implement). By systematically ensuring the IR deposit of each book's bibliographic metadata plus its cited-works bibliography, institutions (and funders) would actually be creating a book citation index.

This became apparent (again) at the Dublin humanities metrics conference, when ISI's VP Jim Pringle repeated ISI's (rather weak) response to the Humanities' need for a book citation index, pointing out that "ISI does cover citations of books -- by journal articles." But that is anything but sufficient for book-based disciplines, whose concern is mainly with book-to-book citations! Yet that is precisely what can be harvested out of IRs (by, for example, Citebase, or a Citebase-like scientometric engine) -- if only the book reference metadata, too, are deposited in the IRs, rather than only article postprints. That immediately begins turning the IR network into a unique and much-needed (distributed) book-citation database. (Moreover, Google Books and Google Scholar are already providing a first approximation to this.)

And there's more. Obviously OA IRs will not be able to get book download counts -- analogous to article download counts -- when the only thing deposited is the book's metadata and reference list. However, in his paper at the Dublin conference, Janus Linmans -- in cleaving to the age-old bibliometric measure of library book-holdings as the surrogate for book citation counts in his analyses -- inadvertently gave me another obvious idea, over and above the deposit and harvesting of book reference metadata: library holdings are just one weak, indirect metric of book usage (and Google Book Collections already collects some of those data). Far better analogues of "downloads" for books are potentially obtainable from book vendors, beginning with Amazon Sales Rank, but eventually including conventional book vendors too (metrics do not end with web-based data).

The researchers from the Humanities stressed in Dublin that book-to-book (and journal-to-book and book-to-journal) citation counts would be most welcome and useful, but in the Humanities even those do not tell the whole story, because it also matters for the credit and impact of a Humanities researcher how much the non-academic (hence non-citing) public is reading their books. (Let us call these non-academic metrics "Demotic Metrics.") Starting with a systematic Amazon book-sales count for each book deposited in the IR (and eventually extending to many book vendors, online and conventional), the ball can be set in motion very easily. IRs can not only formally (1) add book-metadata/reference deposit to their OA Deposit Mandates, but they can also (2) systematically harvest Amazon book-sales metrics for their book items, to add to their IR stats for each deposit, as sketched below.
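By way of illustration only, here is a minimal Python sketch of what such a deposit-time harvesting step might look like. Everything in it is an assumption made for the example: the record layout and the two fetch_* helpers are hypothetical placeholders for whatever vendor and bibliographic-search interfaces a repository actually has access to; this is not an existing EPrints, Amazon or Google API.

```python
# Illustrative sketch only. The record layout and the two fetch_* helpers are
# hypothetical placeholders, not an existing EPrints, Amazon or Google interface.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BookDeposit:
    """A metadata-plus-references record deposited in the IR for one book."""
    title: str
    isbn: str
    references: list = field(default_factory=list)   # the book's cited-works list
    metrics: dict = field(default_factory=dict)       # harvested "demotic" metrics

def fetch_vendor_sales_rank(isbn: str) -> Optional[int]:
    """Placeholder for a call to whatever book-vendor interface the IR can use
    (e.g. an Amazon Sales Rank lookup). Returns None until that is wired up."""
    return None

def fetch_book_citation_count(title: str) -> Optional[int]:
    """Placeholder for a Google Books / Google Scholar citation lookup."""
    return None

def harvest_book_metrics(deposit: BookDeposit) -> BookDeposit:
    """Run automatically at deposit time: attach external metrics to the record."""
    deposit.metrics["vendor_sales_rank"] = fetch_vendor_sales_rank(deposit.isbn)
    deposit.metrics["book_citations"] = fetch_book_citation_count(deposit.title)
    return deposit

# Example: a book record deposited with metadata and reference list only.
record = harvest_book_metrics(BookDeposit(
    title="An Invented Monograph Title",
    isbn="978-0-000-00000-0",
    references=["Author, A. (1999) Some cited work."],
))
print(record.metrics)
```

The point is only that, once book metadata and reference lists are being deposited, attaching vendor sales metrics and external citation counts to each record is a small, automatable step for the IR software.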
And there's more: IRs can also harvest Google Books (and Google Scholar) book-citation counts, already today, as a first approximation toward constructing a distributed, universal OA book-citation index, even before the practice of depositing book metadata and references has progressed far enough to provide useful data on its own. Whenever book metadata are deposited in an IR, the IR automatically does (i) an Amazon query (number of sales of this book) plus (ii) a Google Books/Google Scholar query (number of citations of this book).

These obvious and immediately feasible additions to an institutional OA mandate and to its IR software configuration and functionality would not only yield immediately useful and desirable metrics and motivate Humanists to become even more supportive of OA and metrics, but would also help set in motion practices that (yet again) are so obviously optimal and feasible for science and scholarship as to be inevitable. We should hasten the progress of book metrics, and that will in turn accelerate the growth of OA's primary target content, journal articles, as well as increasing support for institutional and funder OA Deposit Mandates.

One further spin-off of the Dublin Metrics Conference concerned other kinds of online works, and how to measure and credit their impact: metrics don't stop with citation counts and download counts! Among the many "Demotic Metrics" that can also be counted are link counts, tag counts, blog mentions, and web mentions. This applies to books/authors, as well as to data, courseware and other identifiable online resources.

In "Appearance and Reality," Bradley (1897/2002) wrote that "the man who is ready to prove that metaphysics is wholly impossible ... is a brother metaphysician with a rival theory." Well, one might say the same of those who are skeptical about metrics: there are only two ways to measure the quality, importance or impact of a piece of work: subjectively, by asking experts for their judgment (peer review -- and then you have a polling metric!), or objectively, by counting objective data of various kinds. But of course counting, and then declaring those counts "metrics" for some criterion or other by fiat, is not enough. Those candidate metrics have to be validated against that criterion, either by showing that they correlate highly with the criterion, or that they correlate highly with an already validated correlate of the criterion. One natural criterion is expert judgment itself: peer review. Objective metrics can then be validated against peer review. Book citation metrics need to be added to the rich and growing battery of candidate metrics, and so do "Demotic Metrics."

Brody, T., Kampa, S., Harnad, S., Carr, L. and Hitchcock, S. (2003) Digitometric Services for Open Archives Environments. In Proceedings of the European Conference on Digital Libraries 2003, pp. 207-220, Trondheim, Norway.
Brody, T., Carr, L., Harnad, S. and Swan, A. (2007) Time to Convert to Metrics. Research Fortnight, pp. 17-18.
Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).
Carr, L., Hitchcock, S., Oppenheim, C., McDonald, J. W., Champion, T. and Harnad, S. (2006) Extending journal-based research impact assessment to book-based disciplines. Technical Report, ECS, University of Southampton.
Harnad, S. (2001) Research access, impact and assessment. Times Higher Education Supplement 1487: p. 16.
Harnad, S., Carr, L., Brody, T. and Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.
Harnad, S. (2006) Online, Continuous, Metrics-Based Research Assessment. Technical Report, ECS, University of Southampton.
Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise. In Proceedings of the 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.
Harnad, S. (2008) Self-Archiving, Metrics and Mandates. Science Editor 31(2): 57-59.
Harnad, S. (2008) Validating Research Performance Metrics Against Peer Rankings. Ethics in Science and Environmental Politics 8(11). doi:10.3354/esep00088. (In the theme section: The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance.)
Harnad, S., Carr, L. and Gingras, Y. (2008) Maximizing Research Progress Through Open Access Mandates and Metrics. Liinc em Revista.

Open Access and the Skewness of Science: It Can't Be Cream All the Way Down

Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why Current Publication Practices May Distort Science. PLoS Medicine 5(10): e201. doi:10.1371/journal.pmed.0050201

There are reasons to be skeptical about the conclusions of this PLoS Medicine article. It argues that science is compromised because there are not enough "high impact" journals to publish in. The truth is that just about everything gets published somewhere among the planet's 25,000 peer-reviewed journals, just not all in the top journals, which are, by definition, reserved for the top articles -- and not all articles can be top articles. The triage (peer review) is not perfect, so sometimes an article will appear lower (or higher) in the journal quality hierarchy than it ought to. But now that funders and universities are mandating Open Access, all research -- top, middle and low -- will be accessible to everyone. This will correct any access inequities, and it will also help remedy quality misassignment (inasmuch as lower-quality journals may have fewer subscribers, and users may be less likely to consult lower-quality journals). But it will not change the fact that 80% of citations (and presumably usage) go to the top 20% of articles, though it may flatten this "skewness of science" (Seglen 1992) somewhat.

Seglen PO (1992) The skewness of science. Journal of the American Society for Information Science 43: 628-38.

Stevan Harnad
American Scientist Open Access Forum

Tuesday, August 12. 2008

Use And Misuse Of Bibliometric Indices In Scholarly Performance Evaluation
Ethics In Science And Environmental Politics (ESEP)
ESEP Theme Section: The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance + accompanying Discussion Forum

Editors: Howard I. Browman, Konstantinos I. Stergiou

Quantifying the relative performance of individual scholars, groups of scholars, departments, institutions, provinces/states/regions and countries has become an integral part of decision-making over research policy, funding allocations, awarding of grants, faculty hirings, and claims for promotion and tenure. Bibliometric indices (based mainly upon citation counts), such as the h-index and the journal impact factor, are heavily relied upon in such assessments. There is a growing consensus, and a deep concern, that these indices -- more and more often used as a replacement for the informed judgement of peers -- are misunderstood and are, therefore, often misinterpreted and misused. The articles in this ESEP Theme Section present a range of perspectives on these issues. Alternative approaches, tools and metrics that will hopefully lead to a more balanced role for these instruments are presented.

Browman HI, Stergiou KI -- INTRODUCTION: Factors and indices are one thing, deciding who is scholarly, why they are scholarly, and the relative value of their scholarship is something else entirely

Monday, August 4. 2008

Are Online and Free Online Access Broadening or Narrowing Research?
Evans, James A. (2008) Electronic Publication and the Narrowing of Science and Scholarship. Science 321(5887): 395-399. doi:10.1126/science.1150473

Evans found that as more and more journal issues become accessible online (mostly only the older back-issues for free), journals are not being cited less overall, but citations are narrowing down to fewer articles, cited more.

Excerpt: "[Based on] a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005),... as more journal issues came online, the articles [cited] tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles... [B]rowsing of print archives may have [led] scientists and scholars to [use more] past and present scholarship. Searching online... may accelerate consensus and narrow the range of findings and ideas built upon."

In one of the few fields where this can be and has been analyzed thoroughly -- astrophysics, which effectively has 100% Open Access (OA, i.e., free online access) already -- Michael Kurtz too found that with free online access to everything, reference lists became (a little) shorter, not longer; that is, people cite (somewhat) fewer papers, not more, when everything is accessible to them free online.

The following seems a plausible explanation: before OA, researchers cited what they could afford to access, and that was not necessarily all the best work, so they could not be optimally selective for quality, importance and relevance. (Sometimes -- dare one say it? -- they may even have resorted to citing "blind," going by just the title and abstract, which they could afford, but not the full text, to which they had no subscription.) In contrast, when everything becomes accessible, researchers can be more selective and can cite only what is most relevant, important and of high quality.

It has been true all along that about 80-90% of citations go to the top 10-20% of articles. Now that the top 10-20% (along with everything else in astrophysics) is accessible to everyone, everyone can cite it, and cull out the less relevant or important 80-90%. This is not to say that OA does not also generate some extra citations for lesser articles too; but the OA citation advantage is bigger, the better the article -- the "quality advantage" (and perhaps most articles are not that good!). Since the majority of published articles are uncited (or only self-cited), there is probably a lot published that no amount of exposure and access can render worth citing. (I think there may also exist some studies [independent of OA] on "like citing like" -- i.e., articles tending to be cited more at their own "quality" level rather than a higher one. [Simplistically, this means within their own citation bracket, rather than a higher one.] If true, this too could probably be analyzed from an OA standpoint.)

But the trouble is that apart from astrophysics and high energy physics, no other field has anywhere near 100% OA: it is closer to 15% in other fields. So aside from a (slightly negative) global correlation between the growth of OA and the average length of the reference list, the effect of OA cannot yet be analyzed very deeply in most fields. In addition, insofar as OA is concerned, much of the Evans effect seems to be based on "legacy OA," in which it is the older literature that is gradually being made accessible online, or freely accessible online, after a long non-online, non-free interval.
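To make the concentration figures mentioned above ("80-90% of citations go to the top 10-20% of articles") concrete, here is a small illustrative Python calculation. The citation counts are invented for the example and merely stand in for a field's real, highly skewed citation distribution.

```python
# Illustrative only: invented citation counts, not data from any real field.
def top_share(citation_counts, top_fraction=0.2):
    """Return the share of all citations received by the top `top_fraction` of articles."""
    ranked = sorted(citation_counts, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# A skewed toy distribution: a few heavily cited papers and a long uncited tail.
counts = [120, 85, 60, 33, 20, 12, 7, 4, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(f"Top 20% of articles receive {top_share(counts):.0%} of the citations")
```

With these made-up counts the most-cited fifth of the articles collects roughly 86% of all citations, which is the kind of skew the "quality advantage" argument above is about.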
Fields differ in their speed of uptake and their citation latencies. In physics, which has a rapid turnaround time, there is already a tendency to cite recent work more, and OA is making the turnaround time even faster. In longer-latency fields, the picture may differ. For the legacy-OA effect especially, it is important to sort fields by their citation turnaround times; otherwise there can be biases (e.g., if short- or long-latency fields differ in the degree to which they do legacy OA archiving).

If I had to choose between the explanation of the Evans effect as a recency/bandwagon effect, as Evans interprets it, or as an increased overall quality/selectivity effect, I'd choose the latter (though I don't doubt there is a bandwagon effect too). And that is even without going on to point out that Tenopir & King, Gingras and others have shown that -- with or without OA -- there is still a good deal of usage and citation of the legacy literature (though it differs from field to field). I wouldn't set much store by "skimming serendipity" (the discovery of adjacent work while skimming through print issues), since online search and retrieval has at least as much scope for serendipity. (And one would expect more likelihood of a bandwagon effect without OA, where authors may tend to cite already-cited but inaccessible references "cite unseen.")

Are online and free online access broadening or narrowing research? They are broadening it, by making all of it accessible to all researchers, focusing it on the best rather than merely the accessible, and accelerating it.

Stevan Harnad
American Scientist Open Access Forum

Sunday, June 15. 2008

Citation Statistics: International Mathematical Union Report
Charles Oppenheim wrote (in the American Scientist Open Access Forum), under the subject line "Numbers with a number of problems":

CHARLES OPPENHEIM: "I've now read the whole report. Yes, it tilts at the usual windmills, and rightly dismisses the use of impact factors for anything but crude comparisons, but it fails to address the fundamental issue, which is: citation and other metrics correlate superbly with subjective peer review. Both methods have their faults, but they are clearly measuring the same (or closely related) things. Ergo, if you have to evaluate research in some way, there is no reason NOT to use them! It also keeps referring to examples from the field of maths, which is a very strange subject citation-wise."

I have now read the IMU report too, and agree with Charles that it makes many valid points but misunderstands the one fundamental point concerning the question at hand: can and should metrics be used in place of peer-panel-based rankings in the UK Research Assessment Exercise (RAE) and its successors and homologues elsewhere? And there the answer is a definite Yes.

The IMU critique points out that research metrics in particular, and statistics in general, are often misused, and this is certainly true. It also points out that metrics are often used without validation. This too is correct. There is also a simplistic tendency to try to use one single metric, rather than multiple metrics that can complement and correct one another. There too, a practical and methodological error is correctly pointed out. It is also true that the "journal impact factor" has many flaws, and should on no account be used to rank individual papers or researchers, and especially not alone, as a single metric.

But what all this valuable, valid cautionary discussion overlooks is not only the possibility but the empirically demonstrated fact that there exist metrics that are highly correlated with human expert rankings. It follows that, to the degree that such metrics account for the same variance, they can substitute for the human rankings. The substitution is desirable, because expert rankings are extremely costly in terms of expert time and resources. Moreover, a metric that can be shown to be highly correlated with an already validated predictor variable (such as expert rankings) thereby itself becomes a validated predictor variable. And this is why the answer to the basic question -- whether the RAE's decision to convert to metrics was a sound one -- is Yes.

Nevertheless, the IMU's cautions are welcome: metrics do need to be validated; they do need to be multiple, rather than a single, unidimensional index; they do have to be separately validated for each discipline; and the weights on the multiple metrics need to be calibrated and adjusted both for the discipline being assessed and for the properties on which it is being ranked. The RAE 2008 database provides the ideal opportunity to do all this discipline-specific validation and calibration, because it is providing parallel data from both peer panel rankings and metrics. The metrics, however, should be as rich and diverse as possible, to capitalize on this unique opportunity for joint validation.

Here are some comments on particular points in the IMU report (all quotes are from the report):

"The meaning of a citation can be even more subjective than peer review."

True.
But if there is a non-metric criterion measure -- such as peer review -- on which we already rely, then metrics can be cross-validated against that criterion measure, and this is exactly what the RAE 2008 database makes it possible to do, for all disciplines, at the level of an entire sizeable nation's total research output.

"The sole reliance on citation data provides at best an incomplete and often shallow understanding of research -- an understanding that is valid only when reinforced by other judgments."

This is correct. But the empirical fact has turned out to be that a department's total article/author citation counts are highly correlated with its peer rankings in the RAE in every discipline tested. This does not mean that citation counts are the only metric that should be used, or that they account for 100% of the variance in peer rankings. But it is strong evidence that citation counts should be among the metrics used, and it constitutes a (pairwise) validation.

"Using the impact factor alone to judge a journal is like using weight alone to judge a person's health... For papers, instead of relying on the actual count of citations to compare individual papers, people frequently substitute the impact factor of the journals in which the papers appear."

As noted, this is a foolish error if the journal impact factor is used alone, but it may enhance predictivity, and hence validity, if added to a battery of jointly validated metrics.

"The validity of statistics such as the impact factor and h-index is neither well understood nor well studied."

The h-index (and its variants) were created ad hoc, without validation. They turn out to be highly correlated with citation counts (for obvious reasons, since they are in part based on them). Again, they are all welcome in a battery of metrics to be jointly cross-validated against peer rankings or other already-validated or face-valid metrics.

"citation data provide only a limited and incomplete view of research quality, and the statistics derived from citation data are sometimes poorly understood and misused."

It is certainly true that there are many more potential metrics of research performance, productivity, impact and quality than just citation metrics (e.g., download counts, student counts, research funding, etc.). They should all be jointly validated, discipline by discipline, and each metric should be weighted according to what percentage of the criterion variance (e.g., RAE 2008 peer rankings) it predicts.

"relying primarily on metrics (statistics) derived from citation data rather than a variety of methods, including judgments by scientists themselves..."

The whole point is to cross-validate the metrics against the peer judgments, and then use the weighted metrics in place of the peer judgments, in accordance with their validated predictive power.

"bibliometrics (using counts of journal articles and their citations) will be a central quality index in this system [RAE]"

Yes, but the RAE's successor has not yet made clear which metrics it will use, or whether and how it will validate them. There is still some risk that a small number of metrics will simply be picked a priori, without systematic validation. It is to be hoped that the IMU critique, along with other critiques and recommendations, will result in the use of the 2008 parallel metric/peer data for a systematic and exhaustive cross-validation exercise, separately for each discipline.
Future assessments can then use the metric battery, with initialized weights (specific to each discipline), and can calibrate and optimize them across the years, as more data accumulate -- including periodic spot-checks cross-validating against "light-touch" peer rankings and other validated or face-valid measures.

"sole reliance on citation-based metrics replaces one kind of judgment with another. Instead of subjective peer review one has the subjective interpretation of a citation's meaning."

Correct. This is why multiple metrics are needed, and why they need to be systematically cross-validated against already-validated or face-valid criteria (such as peer judgment).

"Research usually has multiple goals, both short-term and long, and it is therefore reasonable that its value must be judged by multiple criteria."

Yes, and this means multiple, validated metrics. (Time-course parameters, such as growth and decay rates of download, citation and other metrics, are themselves metrics.)

"many things, both real and abstract, that cannot be simply ordered, in the sense that each two can be compared"

Yes, we should not compare the incomparable and incommensurable. But whatever we are already comparing, by other means, can be used to cross-validate metrics. (And of course it should be done discipline by discipline, and sometimes even by sub-discipline, rather than by treating all research as if it were of the same kind, with the same metrics and weights.)

"plea to use multiple methods to assess the quality of research"

A valid plea, but the multiple "methods" means multiple metrics, to be tested for reliability and validity against already validated methods.

"Measures of esteem such as invitations, membership on editorial boards, and awards often measure quality. In some disciplines and in some countries, grant funding can play a role. And peer review -- the judgment of fellow scientists -- is an important component of assessment."

These are all sensible candidate metrics to be included, alongside citation and other candidate metrics, in the multiple regression equation to be cross-validated jointly against already validated criteria, such as peer rankings (especially in RAE 2008).

"lure of a simple process and simple numbers (preferably a single number) seems to overcome common sense and good judgment."

Validation should definitely be done with multiple metrics, jointly, using multiple regression analysis, not with a single metric, and not one at a time.

"special citation culture of mathematics, with low citation counts for journals, papers, and authors, makes it especially vulnerable to the abuse of citation statistics."

Metric validation and weighting should be done separately, field by field.

"For some fields, such as bio-medical sciences, this is appropriate because most published articles receive most of their citations soon after publication. In other fields, such as mathematics, most citations occur beyond the two-year period."

Chronometrics -- growth and decay rates and other time-based parameters for downloads, citations and other cumulative measures -- should be among the battery of candidate metrics for validation.

"The impact factor varies considerably among disciplines... The impact factor can vary considerably from year to year, and the variation tends to be larger for smaller journals."

All true. Hence the journal impact factor -- perhaps with various time constants -- should be part of the battery of candidate metrics, not simply used a priori.
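As an aside on what "various time constants" could mean in practice: the conventional journal impact factor for a given year divides the citations received that year to items published in the preceding two years by the number of citable items published in those two years. Below is a hedged Python sketch with invented counts, treating the window length as a parameter rather than something fixed at two years; the figures are fabricated purely for illustration.

```python
# Illustrative only: invented counts. Shows the conventional impact-factor formula
# with the citation window treated as a parameter rather than fixed at two years.
def impact_factor(citations_received, items_published, year, window=2):
    """
    citations_received: citations received in `year` to items published in each earlier year
    items_published:    number of citable items published in each year
    """
    years = range(year - window, year)
    cites = sum(citations_received.get(y, 0) for y in years)
    items = sum(items_published.get(y, 0) for y in years)
    return cites / items if items else 0.0

citations_2008 = {2003: 40, 2004: 55, 2005: 70, 2006: 90, 2007: 60}   # invented
items_published = {2003: 50, 2004: 48, 2005: 52, 2006: 55, 2007: 51}  # invented

print("2-year IF:", round(impact_factor(citations_2008, items_published, 2008, window=2), 2))
print("5-year IF:", round(impact_factor(citations_2008, items_published, 2008, window=5), 2))
```

A longer window behaves better for slow-citing fields such as mathematics, which is precisely why window length itself should be treated as one of the chronometric parameters to validate, not fixed by fiat.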
"The most important criticism of the impact factor is that its meaning is not well understood. When using the impact factor to compare two journals, there is no a priori model that defines what it means to be 'better'. The only model derives from the impact factor itself -- a larger impact factor means a better journal... How does the impact factor measure quality? Is it the best statistic to measure quality? What precisely does it measure? Remarkably little is known..."

And this is because the journal impact factor (like most other metrics) has not been cross-validated against face-valid criteria, such as peer rankings.

"employing other criteria to refine the ranking and verify that the groups make sense"

In other words, systematic cross-validation is needed.

"impact factor cannot be used to compare journals across disciplines"

All metrics should be independently validated for each discipline.

"impact factor may not accurately reflect the full range of citation activity in some disciplines, both because not all journals are indexed and because the time period is too short. Other statistics based on longer periods of time and more journals may be better indicators of quality. Finally, citations are only one way to judge journals, and should be supplemented with other information"

Chronometrics -- and multiple metrics.

"The impact factor and similar citation-based statistics can be misused when ranking journals, but there is a more fundamental and more insidious misuse: Using the impact factor to compare individual papers, people, programs, or even disciplines"

Individual citation counts and other metrics: multiple metrics, jointly validated.

"the distribution of citation counts for individual papers in a journal is highly skewed, approximating a so-called power law... highly skewed distribution and the narrow window of time used to compute the impact factor"

To the extent that distributions are pertinent, they too can be parametrized and taken into account in validating metrics. Comparing like with like (e.g., discipline by discipline) should also help maximize comparability.

"using the impact factor as a proxy for actual citation counts for individual papers"

No need to use one metric as a proxy for another. Jointly validate them all.

"if you want to rank a person's papers using only citations to measure the quality of a particular paper, you must begin by counting that paper's citations. The impact factor of the journal in which the paper appears is not a reliable substitute."

Correct, but this obvious truth does not need to be repeated so many times; and it is an argument against single metrics in general, and against the journal impact factor as a single metric in particular. There is nothing wrong with including it in a battery of metrics for validation.

On the h-index: "Hirsch extols the virtues of the h-index by claiming that 'h is preferable to other single-number criteria commonly used to evaluate scientific output of a researcher...' [Hirsch 2005, p. 1], but he neither defines 'preferable' nor explains why one wants to find 'single-number criteria.'... Much of the analysis consists of showing 'convergent validity,' that is, the h-index correlates well with other publication/citation metrics, such as the number of published papers or the total number of citations. This correlation is unremarkable, since all these variables are functions of the same basic phenomenon..."

The h-index is again a single metric.
And cross-validation only works against either an already validated or a face-valid criterion, not just another unvalidated metric. And the only way multiple metrics, all inter-correlated, can be partitioned and weighted is with multiple regression analysis -- and, once again, against a criterion, such as peer rankings.

"Some might argue that the meaning of citations is immaterial because citation-based statistics are highly correlated with some other measure of research quality (such as peer review)."

Not only might some say it: many have said it, and they are quite right. That means citation counts have been validated against peer review, pairwise. Now it is time to cross-validate an entire spectrum of candidate metrics, so that each can be weighted for its predictive contribution.

"The conclusion seems to be that citation-based statistics, regardless of their precise meaning, should replace other methods of assessment, because they often agree with them. Aside from the circularity of this argument, the fallacy of such reasoning is easy to see."

The argument is circular only if unvalidated metrics are being cross-correlated with other unvalidated metrics. Then it's a skyhook. But when they are cross-validated against a criterion like peer rankings, which have been the predominant basis for the RAE for 20 years, they are being cross-validated against a face-valid criterion -- for which they can indeed be subsequently substituted, if the correlation turns out to be high enough.

"Damned lies and statistics"

Yes, one can lie with unvalidated metrics and statistics. But we are talking here about validating metrics against validated or face-valid criteria. In that case, the metrics lie no more (or less) than the criteria did, before the substitution.

"Several groups have pushed the idea of using Google Scholar to implement citation-based statistics, such as the h-index, but the data contained in Google Scholar is often inaccurate (since things like author names are automatically extracted from web postings)..."

This is correct. But Google Scholar's accuracy is growing daily, with growing content, and there are ways to triangulate author identity from such data even before the (inevitable) unique author identifier is adopted.

"Citation statistics for individual scientists are sometimes difficult to obtain because authors are not uniquely identified..."

True, but a good approximation is -- or will soon be -- possible (not for an arbitrary search on the works of "Lee," but, for example, for all the works of all the authors in the UK universities' LDAPs).

"Citation counts seem to be correlated with quality, and there is an intuitive understanding that high-quality articles are highly-cited."

The intuition is replaced by objective data once the correlation with peer rankings of quality is demonstrated; the predictor metric can then stand in for the criterion in proportion to the criterion variance it accounts for.

"But as explained above, some articles, especially in some disciplines, are highly-cited for reasons other than high quality, and it does not follow that highly-cited articles are necessarily high quality."

This is why validation and weighting of metrics must be done separately, discipline by discipline, and why citation metrics alone are not enough: multiple metrics are needed to take into account multiple influences on quality and impact, and to weight them accordingly.
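For concreteness, here is a hedged Python sketch of what such a joint validation could look like: regress a peer-review criterion on several candidate metrics at once, then inspect the resulting weights and the variance accounted for. The departments, metric names and numbers are all fabricated for illustration; an actual exercise would use the RAE 2008 parallel peer/metric data, discipline by discipline, and a much richer battery of metrics.

```python
# Illustrative only: fabricated numbers for six departments in one discipline.
# The point is the shape of the analysis (joint weighting of candidate metrics
# against a peer-review criterion via multiple regression), not the values used.
import numpy as np

# Candidate metrics per department: [total citations, downloads, mean staff h-index]
metrics = np.array([
    [1200.0, 5400.0, 22.0],
    [ 800.0, 4100.0, 18.0],
    [ 300.0, 2500.0, 11.0],
    [1500.0, 6000.0, 25.0],
    [ 450.0, 2900.0, 13.0],
    [ 950.0, 4700.0, 20.0],
])
peer_score = np.array([6.1, 5.2, 3.0, 6.8, 3.5, 5.6])  # criterion (e.g. peer panel score)

# Standardize the metrics, add an intercept column, and fit by ordinary least squares.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
X = np.column_stack([np.ones(len(z)), z])
weights, *_ = np.linalg.lstsq(X, peer_score, rcond=None)

residual = X @ weights - peer_score
r_squared = 1 - (residual ** 2).sum() / ((peer_score - peer_score.mean()) ** 2).sum()
print("intercept and standardized weights:", np.round(weights, 3))
print("variance accounted for (R^2):", round(r_squared, 3))
```

The standardized weights are the "initialized weights" referred to above: once estimated against the peer criterion for a given discipline, they can be carried forward, recalibrated as new data accumulate, and spot-checked against light-touch peer review.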
"The precise interpretation of rankings based on citation statistics needs to be better understood."

Once a sufficiently broad and predictive battery of metrics is validated and its weights initialized (e.g., in RAE 2008), further interpretation and fine-tuning can follow.

"In addition, if citation statistics play a central role in research assessment, it is clear that authors, editors, and even publishers will find ways to manipulate the system to their advantage."

True, but inasmuch as the new metric batteries will be Open Access, there will also be multiple metrics for detecting metric anomalies, inconsistencies and manipulation, and for naming and shaming the manipulators, which will serve to keep the temptation in check.

Harnad, S. (2001) Research access, impact and assessment. Times Higher Education Supplement 1487: p. 16.

Loet Leydesdorff wrote in the ASIS&T Special Interest Group on Metrics:

LL: "It seems to me that it is difficult to generalize from one setting in which human experts and certain ranks coincided to the existence of such correlations across the board. Much may depend on how the experts are selected. I did some research in which referee reports did not correlate with citation and publication measures."

Much may depend on how the experts are selected, but that was just as true during the 20 years in which rankings by experts were the sole basis for the rankings in the UK Research Assessment Exercise (RAE). (In validating predictive metrics one must not endeavor to be holier than the Pope: your predictor can at best hope to be as good as, but not better than, your criterion.)

That said: all correlations to date between total departmental author citation counts (not journal impact factors!) and RAE peer rankings have been positive, sizable, and statistically significant, in all disciplines and all years tested. Variance there will always be, but a good-sized component from citations alone seems well established. Please see the studies of Professor Oppenheim and others, for example as cited in: Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.

LL: "Human experts are necessarily selected from a population of experts, and it is often difficult to delineate between fields of expertise."

Correct. And the RAE rankings are done separately, discipline by discipline; the validation of the metrics should be done that way too. Perhaps there is sometimes a case for separate rankings even at the sub-disciplinary level. I expect the departments will be able to sort that out. (And note that the RAE correlations do not constitute a validation of metrics for evaluating individuals: I am confident that that too will be possible, but it will require many more metrics and much more validation.)

LL: "Similarly, we know from quite some research that citation and publication practices are field-specific and that fields are not so easy to delineate. Results may be very sensitive to choices made, for example, in terms of citation windows."

As noted, some of the variance in peer judgments will depend on the sample of peers chosen; that is unavoidable. That is also why "light-touch" peer re-validation, spot-checks, updates and optimizations of the initialized metric weights are a good idea across the years. As to the need to evaluate sub-disciplines independently: that question exceeds the scope of metrics and metric validation.
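The pairwise departmental check described above -- total departmental citation counts against RAE peer rankings -- is even simpler to express. A sketch with invented figures follows, using a rank correlation since the RAE criterion is an ordinal ranking; the departmental totals and ranks are fabricated for illustration only.

```python
# Illustrative only: invented departmental citation totals and peer ranks
# for one discipline, not real RAE data.
from scipy.stats import spearmanr

departmental_citations = [1200, 800, 300, 1500, 450, 950]
peer_rank = [2, 3, 6, 1, 5, 4]  # 1 = top-ranked department in the peer exercise

rho, p_value = spearmanr(departmental_citations, peer_rank)
# A strongly negative rho here means more-cited departments receive better
# (numerically lower) peer ranks, i.e. the two measures agree.
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

Such a pairwise correlation is the kind of result reported in the Oppenheim studies cited above; the joint, multi-metric regression is the natural next step once the RAE 2008 parallel data are available.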
LL: "Thus, I am bit doubtful about your claims of an 'empirically demonstrated fact'."Within the scope mentioned -- the RAE peer rankings, for disciplines such as they have been partitioned for the past two decades -- there is ample grounds for confidence in the empirical results to date. (And please note that this has nothing to do with journal impact factors, journal field classification, or journal rankings. It is about the RAE and the ranking of university departments by peer panels, as correlated with citation counts.) Stevan Harnad American Scientist Open Access Forum Friday, February 8. 2008New Ranking of Central and Institutional Repositories
The Webometrics Ranking of World Universities has created a new Ranking of Repositories, but in the announcement, a few salient points are overlooked:
Yes, as noted, the first three ranks go to "thematic" (i.e., discipline- or subject-based) Central Repositories (CRs): (1) arXiv (Physics), (2) RePEc (Economics) and (3) E-LIS (Library Science). That is to be expected, because such CRs are fed from institutions all over the world. But the fourth-ranked repository -- and the first of the university-based Institutional Repositories (IRs), displaying only its own institutional output -- is (4) U. Southampton EPrints (even though Southampton's university rank is 77th). Moreover, the fifteenth-ranked repository -- and the first of the department-based IRs -- is (15) U. Southampton ECS EPrints (making it 10th even relative to university-wide IRs!).

None of this is surprising: in 2000 Southampton created the world's first free, OAI-compliant IR-creating software -- EPrints -- now used (and imitated) worldwide. And Southampton's ECS also adopted the world's first Green OA self-archiving mandate, now likewise being emulated worldwide. That first mandate was a departmental mandate, which partly explains the remarkably high rank of Southampton's ECS departmental IR.

But these repository rankings (by Webometrics as well as by ROAR) should be interpreted with caution, because not all the CRs and IRs contain full texts. Some contain only metadata. Southampton's university-wide IR, although 4th among repositories and 1st among IRs, is still mostly just metadata, because the university-wide mandate that U. Southampton has since adopted has not yet been officially announced or implemented (the university had been preparing its returns for the 2008 Research Assessment Exercise). As soon as the mandate is implemented, that will change. (Southampton's ECS departmental IR, in contrast, mandated since 2002, is already virtually 100% full-text.)

But the moral of the story is that what Southampton is right now enjoying is not just the well-earned visibility of its research output, but also a competitive advantage over other institutions, because of its head start, both in creating IRs and in adopting a mandate to fill them. (This head start is also reflected in Southampton's unusually high University Metrics "G Factor," and probably in its university Webometrics rank too. Citebase is likewise constrained by who has and has not begun to systematically self-archive. And CiteSeer has, alas, stopped updating as of about 2003.)

I am not saying all this by way of bragging! I am begging other institutions to take advantage of the fact that it is still early days: get a competitive head start too -- by creating an IR and, most important of all, by adopting a Green OA self-archiving mandate!

Stevan Harnad
American Scientist Open Access Forum

Tuesday, December 18. 2007

Open Access Metrics: An Autocatalytic Circle of Benefits
In "Show me the data" Rossner, et al (2007, The Journal of Cell Biology, Vol. 179, No. 6, 1091-1092) wrote:
"Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific's impact factor, which is based on hidden data. As more publication and citation data become available to the public through services like PubMed, PubMed Central, and Google Scholar®, we hope that people will begin to develop their own metrics for assessing scientific quality rather than rely on an ill-defined and manifestly unscientific number."Rossner et al are quite right, and the optimal, inevitable solution is at hand: The prospect of having Open Research Metrics for analysis and research assessment -- along with the prospect of maximizing research usage and impact through OA -- will motivate adopting the mandates, closing the autocatalytic circle of benefits from OA.(1) All research institutions and research funders will mandate that all research journal articles published by their staff must be self-archived in their Open Access (OA) Institutional Repository. Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3). Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise. Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1) : 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds. Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open Research Web: A Preview of the Optimal and the Inevitable, in Jacobs, N., Eds. Open Access: Key Strategic, Technical and Economic Aspects. Chandos. Stevan Harnad American Scientist Open Access Forum Saturday, July 14. 2007Microsoft Research Faculty Summit: eScience
Microsoft Research Faculty Summit 2007
Microsoft Conference Center, Redmond, Washington, July 16, 2007
eScience: Data Capture to Scholarly Publication

Sunday, June 3. 2007

"Academics strike back at spurious rankings"

This news item in Nature ("Academics strike back at spurious rankings") lists some of the (very valid) objections to the many unvalidated university rankings -- both subjective and objective -- that are in wide use today. These problems are all the more reason for extending Open Access (OA) and developing OA scientometrics, which will provide open, validatable and calibratable metrics for research, researchers, and institutions in each field -- a far richer, more sensitive, and more equitable spectrum of metrics than the few, weak and unvalidated measures available today.

Some research groups doing relevant work on this are, in the UK: (1) our own OA scientometrics group (Les Carr, Tim Brody, Alma Swan, Stevan Harnad) at Southampton (and UQAM, Canada), and our collaborators Charles Oppenheim (Loughborough) and Arthur Sale (Tasmania); (2) Mike Thelwall (Wolverhampton); in the US: (3) Johan Bollen & Herbert Van de Sompel at LANL; and in the Netherlands: (4) Henk Moed & Anthony van Raan (Leiden; cited in the Nature news item). Below are excerpts from the Nature article, followed by some references.
Isidro Aguillo is the Scientific Director of the Laboratory of Quantitative Studies of the Internet at the Centre for Scientific Information and Documentation of the Spanish National Research Council, and editor of Cybermetrics, the International Journal of Scientometrics, Informetrics and Bibliometrics. In a posting to the American Scientist Open Access Forum, Dr. Aguillo makes the very valid point (in response to Declan Butler's Nature news article about the use of unvalidated university rankings) that web metrics provide new and potentially useful information not available elsewhere. This is certainly true, and web metrics should certainly be among the metrics included in the multiple regression equation that should be tested and validated in order to weight each of the candidate component metrics and to develop norms and benchmarks for reliable widespread use in ranking and evaluation. Among other potentially useful sources of candidate metrics are: University Metrics.

Bollen, Johan and Herbert Van de Sompel (2006) Mapping the structure of science through usage. Scientometrics 69(2).
Hardy, R., Oppenheim, C., Brody, T. and Hitchcock, S. (2005) Open Access Citation Information. ECS Technical Report.
Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.
Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open Research Web: A Preview of the Optimal and the Inevitable. In Jacobs, N., Ed., Open Access: Key Strategic, Technical and Economic Aspects. Chandos.
Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise. Invited Keynote, 11th Annual Meeting of the International Society for Scientometrics and Informetrics, Madrid, Spain, 25 June 2007.
Kousha, Kayvan and Thelwall, Mike (2006) Google Scholar Citations and Google Web/URL Citations: A Multi-Discipline Exploratory Analysis. In Proceedings of the International Workshop on Webometrics, Informetrics and Scientometrics & Seventh COLLNET Meeting, Nancy, France.
Moed, H.F. (2005) Citation Analysis in Research Evaluation. Dordrecht (Netherlands): Springer.
van Raan, A. (2007) Bibliometric statistical properties of the 100 largest European universities: prevalent scaling rules in the science system. Journal of the American Society for Information Science and Technology (submitted).

Stevan Harnad
American Scientist Open Access Forum