Sunday, August 26, 2007
SUMMARY: The United Kingdom's Research Assessment Exercise (RAE) has two pluses and two (correctable) minuses:
(+1) It is a good idea to have a national research performance evaluation to monitor and reward research productivity and progress.
(+2) It is also a good idea to convert the costly, time-consuming, wasteful (and potentially biased) panel-based RAE of past years into an efficient, unbiased metric-based RAE, using objective measures that can be submitted automatically online, with the panel's role now only being to monitor and fine-tune the resulting rankings.
(-1) The biggest flaw concerns the metrics that will be used. Metrics first have to be tested and validated, discipline by discipline, to ensure that they are accurate indicators of research performance. Since the UK has relied on the RAE panel evaluations for two decades, and since the last RAE (2008) before conversion to metrics is to be a parallel panel/metrics exercise, the natural thing to do is to test as many candidate metrics as possible in this exercise, and to cross-validate them against the rankings given by the panels, separately, in each discipline. The prior-funding metric needs to be used cautiously, to avoid bias and self-fulfilling prophecy; and the citation-count metric is a good candidate, but only one of many potential metrics that can and should be tested in the parallel RAE 2008 metric/panel exercise. Other metrics include co-citation counts, download counts, download and citation growth and longevity counts, hub/authority scores, and interdisciplinarity scores.
(-2) RAE 2008 is needlessly insisting that researchers submit the publishers' PDFs for the 2008 exercise. It should instead require researchers to deposit their own peer-reviewed, revised, accepted final drafts in their own universities' Institutional Repositories (IRs) for research assessment, where the RAE panels can access them directly. This will not only provide the research database for assessment but will also help to accelerate the growth and benefits of Open Access in the UK and worldwide.
There is still time to fully remedy (-1) and (-2).
The United Kingdom's Research Assessment Exercise (RAE) is doing two things right. There are also two things it is planning to do that are currently problematic, but that could easily be made right. Let's start with what the RAE is already doing right:

(+1) It is a good idea to have a national research performance evaluation to monitor and reward research productivity and progress. Other countries will be following and eventually emulating the UK's lead. (Australia is already emulating it.)
(+2) It is also a good idea to convert the costly, time-consuming, wasteful (and potentially biased) panel-based RAE of past years into an efficient, unbiased metric-based RAE, using objective measures that can be submitted automatically online, with the panel's role now only being to monitor and fine-tune the resulting rankings. This way the RAE will no longer take UK researchers' precious time away from actually doing UK research in order to resubmit and locally "re-peer-review" work that has already been submitted, published and peer-reviewed in national and international scholarly and scientific journals.

But, as with all policies that are being shaped collectively by disparate (and sometimes under-informed) policy-making bodies, two very simple and remediable flaws in the reformed RAE system have gone undetected and hence uncorrected. They can still be corrected, and there is still hope that they will be, as they are small, easily fixed flaws; but, if left unfixed, they will have negative consequences, compromising the RAE as well as the RAE reforms:

(-1) The biggest flaw concerns the metrics that will be used. Metrics first have to be tested and validated, discipline by discipline, to ensure that they are accurate indicators of research performance. Since the UK has relied on the RAE panel evaluations for two decades, and since the last RAE (2008) before conversion to metrics is to be a parallel panel/metrics exercise, the natural thing to do is to test as many candidate metrics as possible in this exercise and to cross-validate them against the rankings given by the panels, separately, in each discipline. (Which metrics are valid performance indicators will differ from discipline to discipline.)
All indications so far are that this cross-validation exercise is not what RAE 2008 and HEFCE are planning to do. Instead, the focus is on a few pre-selected metrics, rather than on the very rich spectrum of potential metrics that could be tested. The two main pre-selected metrics are (-1a) prior research funding and (-1b) citation counts.

(-1a) Prior research funding has already been shown to be very highly correlated with the RAE panel rankings in a few (mainly scientific) disciplines, but this was undoubtedly because the panels, in making their rankings, already had that metric in hand, as part of the submission. Hence the panels themselves could explicitly (or implicitly) count it in making their judgments! Now a correlation between metrics and panel rankings is desirable initially, because that is the way to launch and validate the candidate metrics. In the case of this particular metric, however, not only is there a potential interaction, indeed a bias, that makes the prior-funding metric and the panel rankings non-independent, and hence invalidates the test of this metric's validity; there is also a deeper reason for not putting a lot of weight on the prior-funding metric:
The UK has a Dual Support System for research funding: (I) Competitive Individual Researcher Project Proposals (RCUK) and (II) the RAE panel rankings (awarding top-sliced research funding to University Departments, based on their research performance). The prior-funding metric is determined largely by (I). If it is then also given a heavy weight in (II), that is not improving the RAE [i.e., (II)]: it is merely collapsing the UK's Dual Support System into (I) alone, thereby doing away with the RAE altogether. As if this were not bad enough, the prior-funding metric is not even a valid metric in many of the RAE disciplines.
(-1b) Citation counts are a much better candidate metric. Indeed, in many of the RAE disciplines, citation counts have already been tested and shown to be correlated with the panel rankings, although not nearly as highly correlated as prior funding (in those few disciplines where prior funding is indeed highly correlated). This somewhat weaker correlation in the case of the citation metric is a good thing, because it leaves room for other metrics to contribute to the assessment outcome too, making the joint outcome more accurate, balanced and congruent with each discipline's profile. It is neither likely nor desirable that performance evaluation should be based on one single metric. Citation counts, however, are certainly a strong candidate for serving as a particularly important one among the array of multiple metrics to be validated and used in future RAEs. Citation counts also have the virtue that they were not explicitly available to the RAE panels when they made their rankings (indeed, it was explicitly forbidden to submit or count citations). So their already-confirmed correlation with the RAE panel rankings is a genuine empirical correlation rather than an explicit bias.
Hence the prior-funding metric (-1a) needs to be used cautiously, to avoid bias and self-fulfilling prophecy; and the citation-count metric (-1b) is a good candidate, but only one of many potential metrics that can and should be tested in the parallel RAE 2008 metric/panel exercise. (Other metrics include co-citation counts, download counts, download and citation growth and longevity counts, hub/authority scores, interdisciplinarity scores, and many other rich measures for which RAE 2008 is the ideal time to do the testing and validation, discipline by discipline, as it is virtually certain that disciplines will differ in which metrics are predictive for them and in what the weighting of each metric should be.) Yet it looks as if RAE 2008 and HEFCE are not currently planning to commission this all-important validation analysis, testing a rich array of candidate metrics against the panel rankings. This is a huge flaw and oversight, although it can still be easily remedied by going ahead and doing such a systematic cross-validation study after all.
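To make the proposed cross-validation concrete, here is a minimal sketch (in Python; the column names and figures are purely illustrative assumptions, not actual RAE data) of the first step: correlating each candidate metric with the panel rankings, separately within each discipline, so that the predictive metrics can be identified field by field.

```python
# Illustrative sketch only: correlate candidate metrics with RAE panel rankings,
# separately within each discipline. The column names and figures below are
# hypothetical assumptions, not the actual RAE 2008 data or data format.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical table: one row per submitting department.
df = pd.DataFrame({
    "discipline":    ["Physics", "Physics", "Physics", "History", "History", "History"],
    "panel_rank":    [5.0, 4.0, 3.5, 4.5, 4.0, 3.0],   # panel rating of the department
    "citations":     [1200, 800, 450, 90, 60, 20],
    "downloads":     [15000, 9000, 5000, 2500, 3000, 900],
    "prior_funding": [4.2e6, 2.1e6, 1.0e6, 0.3e6, 0.2e6, 0.1e6],
})

candidate_metrics = ["citations", "downloads", "prior_funding"]

# Spearman rank correlation of each candidate metric with the panel ranking,
# computed within each discipline, since the predictive metrics (and their
# weights) are expected to differ from discipline to discipline.
for discipline, group in df.groupby("discipline"):
    for metric in candidate_metrics:
        rho, p = spearmanr(group[metric], group["panel_rank"])
        print(f"{discipline:10s} {metric:14s} rho = {rho:+.2f}")
```

Metrics that correlate well with the panel rankings within a given discipline become candidates for that discipline's weighted metric equation; metrics that do not can be down-weighted or dropped for that discipline.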
For such a systematic metric/panel cross-validation study in RAE 2008, however, the array of candidate metrics has to be made as rich and diverse as possible. The RAE is not currently making any effort to collect as many potential metrics as possible in RAE 2008, partly because it is overlooking the growing importance of online, Open Access metrics -- and indeed overlooking the growing importance of Open Access itself, both for research productivity and progress and for evaluating them.

Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).

Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise. In: Torres-Salinas, D. and Moed, H. F. (Eds.) Proceedings of the 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain.

Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open Research Web: A Preview of the Optimal and the Inevitable. In: Jacobs, N. (Ed.) Open Access: Key Strategic, Technical and Economic Aspects. Chandos.

This brings us to the second flaw in HEFCE's RAE 2008 plans:

(-2) For no logical or defensible reason at all, RAE 2008 is insisting that researchers submit the publishers' PDFs for the 2008 exercise. Now it does represent some progress that the RAE is accepting electronic drafts rather than requiring hard copy, as in past years. But in insisting that those electronic drafts must be the publisher's PDF, the RAE is creating two unnecessary problems.

(-2a) One unnecessary problem, a minor one, is that the RAE imagines that, in order to have the publisher's PDF for evaluation, it needs to seek (or even pay for) permission from the publisher. This is complete nonsense! Researchers (i.e., the authors) submit their own published work to the RAE for evaluation. For the researchers, this is Fair Dealing (Fair Use), and no publisher permission or payment whatsoever is needed. (As it happens, I believe HEFCE has worked out a "special arrangement" whereby publishers "grant permission" and "waive payment." But the completely incorrect notion that permission or payment were even at issue, in principle, has an important negative consequence, which I will now describe.)
What HEFCE should have done -- instead of mistakenly imagining that it needed permission to access the papers of UK researchers for research evaluation -- was to require researchers to deposit their own peer-reviewed, revised, accepted final drafts in their own universities' Institutional Repositories (IRs) for research assessment. The HEFCE panels could then access them directly in the IRs for evaluation.
This would have ensured that all UK research output was deposited in each UK researcher's university IR. There is no publisher permission issue for the RAE: The deposits can, if desired, be made Closed Access rather than Open Access, so only the author, the employer and the RAE panels can access the full text of the deposit. Merely depositing an author's own papers internally and for research evaluation is Fair Dealing and requires absolutely no permission from anyone.
(-2b) But, as a bonus, requiring the deposit of all UK research output (or even just the four "best papers" that are currently the arbitrary limit for RAE submissions) into the researcher's IR for RAE evaluation would have ensured that 62% of those papers could immediately have been made OA (because 62% of journals already endorse immediate OA self-archiving). And for the remaining 38% this would have allowed each IR's "Fair Use" Button to be used by researchers webwide to request an individual email copy semi-automatically (with those "eprint requests" providing a further potential metric, alongside download counts).
Instead, HEFCE needlessly insisted that the publisher's PDF (which, by the way, could likewise have been deposited by all authors in their IRs, as Closed Access, without needing any permission from their publishers) be submitted to the RAE directly. This effectively cut off not only a rich potential source of RAE metrics but also a powerful incentive for providing OA, which has itself been shown to increase downloads and citations in all disciplines tested so far.

To recapitulate: two pluses -- (+1) national research performance evaluation itself, and (+2) conversion to metrics -- plus two (correctable) minuses -- (-1) failure to provide explicitly for the systematic evaluation of a rich spectrum of candidate metrics against the RAE 2008 panel rankings, and (-2) failure to require deposit of the authors' papers in their own IRs, to generate more OA metrics, more OA, and more UK research impact.
The good news is that there is still time to fully remedy (-1) and (-2), if only policy-makers take a moment to listen, think it through, and do the little that needs to be done to fix them.
Appendix: Are Panel Rankings Face-Valid?
It is important to allay a potential misunderstanding: It is definitely not the case that the RAE panel rankings are themselves infallible or face-valid! The panelists are potentially biased in many ways. And RAE panel review was never really "peer review," because peer review means consulting the most qualified specialists in the world for each specific paper, whereas the panels are just generic UK panels, evaluating all the UK papers in their discipline: it is the journals that have already conducted the peer review.
So metrics are not just needed to put an end to the waste and the cost of the existing RAE, but also to try to put the outcome on a more reliable, objective, valid and equitable basis. The idea is not to duplicate the outcome of the panels, but to improve it.
Nevertheless -- and this is the critical point -- the metrics do have to be validated; and, as an essential first step, they have to be cross-validated against the panel rankings, discipline by discipline. For even though those panel rankings are and always were flawed, they are what the RAE has been relying upon, completely, for two decades.
So the first step is to make sure that the metrics are chosen and weighted so as to get as close a fit to the panel rankings as possible, discipline by discipline. Then, and only then, can the "ladder" of the panel rankings -- which got us where we are -- be tossed away, allowing us to rely on the metrics alone. The metrics can then be continuously calibrated and optimised in future years, with feedback from meta-panels that monitor the rankings generated by the metrics and, if necessary, adjust and fine-tune the metric weights, or even add new, still-to-be-discovered-and-tested metrics.
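As a purely hypothetical illustration of that bootstrapping step, here is a minimal sketch, assuming synthetic data for a single discipline: the candidate metrics are jointly fitted to the panel rankings, the fit is checked on held-out departments, and the resulting discipline-specific weights are what the future meta-panels would then monitor and fine-tune.

```python
# Illustrative sketch only: fit a weighted combination of candidate metrics to
# the panel rankings within one discipline, then check the fit on held-out
# departments. All data below are synthetic assumptions, not RAE data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical metrics for 40 departments in one discipline: columns might be
# citations, downloads, co-citations, hub/authority scores, etc.
X = rng.random((40, 4))
# Synthetic panel rankings, assumed here to depend mostly on the first metrics.
panel_rank = X @ np.array([0.6, 0.3, 0.1, 0.0]) + rng.normal(0, 0.05, 40)

model = LinearRegression()

# Cross-validated fit: how well do the weighted metrics predict the panel
# rankings of departments that were not used in the fitting?
held_out_r2 = cross_val_score(model, X, panel_rank, cv=5, scoring="r2")
print("held-out R^2 per fold:", np.round(held_out_r2, 2))

# The fitted weights are the discipline-specific calibration to be monitored
# and, if necessary, fine-tuned in future assessment cycles.
model.fit(X, panel_rank)
print("fitted metric weights:", np.round(model.coef_, 2))
```

The particular regression model is not the point; the point is that the metric weights are derived from, and validated against, the panel rankings, discipline by discipline, rather than being stipulated a priori.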
In sum: despite their warts, the current RAE panel rankings need to be used to bootstrap the new metrics into usability. Without that prior validation based on what has been used until now, the metrics are just hanging from a skyhook and no one can say whether or not they measure what the RAE panels have been measuring until now. Without validation, there is no continuity in the RAE and it is not really a "conversion" to metrics, but simply an abrupt switch to another, untested assessment tool.
(Citation counts have been tested elsewhere, in other fields, but as there has never been anything of the scope and scale of the UK RAE, across all disciplines in an entire country's research output, the prior patchwork testing of citation counts as research performance indicators is nowhere near providing the evidence that would be needed to make a reliable, valid choice of metrics for the UK RAE: only cross-validation within the RAE parallel metric/panel exercise itself -- jointly with a rich spectrum of other candidate metrics -- can provide that kind of evidence, and the requisite continuity, for a smooth, rational transition from panel rankings to metrics.)
Stevan Harnad
American Scientist Open Access Forum