Wednesday, July 21. 2010
MELIBEA's validator assesses OA policies using an algorithm that generates for each policy a one-dimensional measure, "OA%val," based on a number of weighted factors.
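As a rough illustration only (MELIBEA's actual factors, weights and formula are not reproduced here; the factor names and numbers below are hypothetical), a one-dimensional weighted score of this kind amounts to something like the following sketch:

    # Hypothetical sketch of a one-dimensional weighted policy score.
    # The factor names and weights are invented for illustration; they are not MELIBEA's.
    HYPOTHETICAL_WEIGHTS = {
        "requires_green_deposit": 0.6,
        "deposit_is_immediate": 0.2,
        "funds_gold_publication": 0.2,
    }

    def oa_score(policy_flags):
        """Collapse a set of 0/1 policy factors into a single weighted score."""
        return sum(weight * policy_flags.get(factor, 0)
                   for factor, weight in HYPOTHETICAL_WEIGHTS.items())

    # e.g. a policy that requires immediate green deposit but does not fund gold
    # scores 0.8 with these hypothetical weights:
    # oa_score({"requires_green_deposit": 1, "deposit_is_immediate": 1})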
In assigning weights to these factors, it is not just a matter of whether one puts a greater weight on green than on gold overall. The devil is in the details. Since MELIBEA's "OA%val" is one-dimensional, the exact weights assigned by the algorithm matter very much: in some crucial combinations, assigning any non-zero weight at all to gold in an OA policy evaluation makes the score deleterious to green (and hence to OA itself). I will use the most problematic case to illustrate:
With all the policy components that one can combine in order to give an OA policy a score, consider the relative weighting one is to give to four policy models:
Policy Model 1 neither requires green nor funds gold (gr/go)
Policy Model 2 does not require green, but funds gold (gr/GO)
Policy Model 3 requires green and does not fund gold (GR/go)
Policy Model 4 requires green and funds gold (GR/GO).
One can agree to weight GR/GO > gr/go (Policy Model 4 over Policy Model 1)
One can also agree (as above) to weight GR/go > gr/GO (Policy Model 3 over Policy Model 2)
One can even agree to weight GR/GO > GR/go (Policy Model 4 over Policy Model 3; although I do have reservations about this, because of the potential deterrent effect of over-demanding early policy models on the spread of green OA mandates, I will not bring these reservations into this discussion)
The problematic case concerns whether to assign a greater weight to gr/GO than to gr/go in the MELIBEA score (i.e., whether gr/GO > gr/go: Policy Model 2 vs. Policy Model 1).
I am strongly opposed to weighting gr/GO > gr/go, because I am convinced that when an institution adopts a premature gold payment policy without first adopting a green requirement policy, this diminishes rather than increases the likelihood of an upgrade to a green requirement.
So in that case, despite the fact that a gr/GO policy no doubt generates somewhat more OA than gr/go, this small local increase in OA is not better for the growth of OA overall. Rather, it reinforces the widespread misconception that the way to generate OA is to pay for gold OA (and then wait for others to do the same). Such a policy neglects the much more important need to mandate green OA, cost-free, first. It tries to pay for OA even while subscriptions are still paying the full cost of publication, hence still tying down most of the potential funds to pay for gold OA. Giving a gr/GO policy a higher weight than gr/go obscures the fact that paid gold can only cover a small fraction of an institution's output, and at an extra cost, whereas requiring green covers all of it, and at no extra cost.
There are ways to remedy this, algorithmically (for example, by giving GO a non-zero weight only when GR also has a non-zero weight).
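A minimal sketch of that kind of gate, continuing the hypothetical weights and factor names from the earlier sketch (again, not MELIBEA's actual algorithm):

    def gated_oa_score(policy_flags, w_green=0.8, w_gold=0.2):
        """Gold funding contributes to the score only when a green deposit
        requirement is also in place (hypothetical weights, for illustration only)."""
        green = policy_flags.get("requires_green_deposit", 0)
        gold = policy_flags.get("funds_gold_publication", 0)
        return w_green * green + w_gold * gold * green  # gold term is zeroed without green

    # With this gate, gr/GO scores no higher than gr/go (both 0.0),
    # while GR/go scores 0.8 and GR/GO scores 1.0.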
The important point to note, however, is that these algorithmic subtleties are not resolved by simply stating that one assigns a higher weight -- even a much higher weight -- to GR than to GO: Promoting the right priorities in OA policy design requires a much more nuanced approach.
Regarding the question of IR (institutional repository) vs CR (central repository) deposit too, the devil is in the details. Just as one more Gold OA article is indeed one more piece of OA, exactly as one more Green OA deposit is, so too one more CR deposit is indeed one more piece of OA, exactly as one more IR deposit is.
But the goal is to weight the algorithm to promote stronger policy models, not just to promote isolated increments in OA. And just as a policy that pays for gold without mandating green is generating only a little more OA at the expense of not generating a lot more OA, so a funder policy that mandates CR deposit instead of IR deposit is generating only a little more OA at the expense of not generating a lot more OA: it could instead have reinforced (at no cost, and with no loss in OA) the adoption of a cooperative, convergent IR deposit policy for the rest of each institution's output, funded and unfunded, across all its disciplines, rather than gratuitously competing with institutional OA policies by adopting a divergent CR deposit policy.
The problem is not with publishers' green policies but with institutions' (and funders') lack of green policies! Over 60% of journals endorse immediate Green OA deposit for the postprint and over 40% more for the preprint (hence over 90% of all articles, overall), yet only 15% of articles are being deposited annually overall, because less than 1% of institutions have yet mandated deposit.
This is the real gap that needs to be closed -- and can be closed, immediately, by mandating Green OA. And this is what is completely overlooked by institutions and funders hurrying to pay for gold OA instead of first mandating green OA, or funders needlessly mandating CR deposit instead of IR deposit.
The fact is that mandates still cover far fewer than even 1% of institutions (about 160, out of a total of perhaps 18,000 universities plus 8,000 research institutions and at least several hundred major funders, funding across multiple institutions, worldwide; that is roughly 0.6%). The lesson before us is hence most definitely not that mandates are not enough; it is that there are not enough mandates -- far from it.
Gold OA payment is a minor matter, providing a small amount of OA, whereas green OA mandates are a major priority, able to scale up to providing 100% OA. Gold is nothing but a distraction -- for either an institution or a funder -- until and unless it first mandates green.
Nor is the problem that publishers are only paying lip-service to repository deposit. The problem is that the overwhelming majority of institutions and funders are still only paying lip service to repository deposit -- instead of mandating it.
Nor will funders and institutions pre-emptively paying publishers for gold without first mandating green (while subscriptions are still paying for publishing, tying up the potential funds to pay for gold) solve the problem of getting green mandated by institutions and funders.
For these reasons it is not enough, in evaluating OA Policy factors, just to give Green OA a higher weight than Gold OA.
Monday, July 19. 2010
SUMMARY: The MELIBEA evaluator of Open Access policies could prove useful in shaping OA mandates -- but it still needs a good deal of work. Currently it conflates institutional and funder policies and criteria, mixes green and gold OA criteria, color-codes in an arbitrary and confusing way, and needs to validate its weights (e.g., against policy success criteria such as the percentage and growth rate of annual output deposited since the policy was adopted).
The MELIBEA Open Access policy validator is timely and promising. It has the potential to become very useful and even influential in shaping OA mandates -- but that makes it all the more important to get it right, rather than releasing MELIBEA prematurely, when it still risks increasing confusion rather than providing clarity and direction in OA policy-making.
Remedios Melero is right to point out that -- unlike the CSIC Cybermetrics Lab's University Rankings and Repository Rankings -- the MELIBEA policy validator is not really meant to be a ranking. Yet MELIBEA has set up its composite algorithm and its graphics to make it a ranking just the same.
It is further pointed out, correctly, that MELIBEA's policy criteria for institutions and funders are not (and should not be) the same. Yet, with the coding as well as the algorithm, they are treated the same way (and funder policy is taken to be the generic template, institutional policy merely an ill-fitting special case).
It is also pointed out, rightly, that a gold OA publishing policy is not central to institutional OA policy making -- yet there it is, contributing sizeable components to the MELIBEA algorithm.
It is also pointed out that MELIBEA's green color code has nothing to do with the "green OA" coding -- yet there it is -- red, green, yellow -- competing with the widespread use of green to designate OA self-archiving, and thereby inducing confusion, both overt and covert.
MELIBEA could be a useful and natural complement to the ROARMAP registry of OA policies. I (and no doubt other OA advocates) would be more than happy to give MELIBEA feedback on every aspect of its design and rationale.
But as it is designed now, I can only agree with Steve Hitchcock's points and conclude that consulting MELIBEA today would be likely to create and compound confusion rather than helping to bring the all-important focus and direction to OA policy-making that I am sure CSIC, too, seeks, and seeks to help realize.
Here are just a few prima facie points: (1) Since MELIBEA is not, and should not be construed as a ranking of OA policies -- especially because it includes both institutional and funder policies -- it is important not to plug it into an algorithm until and unless the algorithm has first been carefully tested, with consultation, to make sure it weights policy criteria in a way that optimizes OA progress and guides policy-makers in the right direction.
(2) For this reason, it is more important to allow users to generate separate flat lists of institutions or funders on the various policy criteria, considered and compared independently, rather than on the basis of a prematurely and arbitrarily weighted joint algorithm.
(3) This is all the more important since the data are based on fewer than 200 institutions, whereas the CSIC University Rankings are based on thousands. Since the population is still so small, MELIBEA risks having a disproportionate effect on initial conditions and hence direction-setting; all the more reason not to amplify noise and indirection by assigning untested initial weights without carefully thinking through and weighing the consequences.
(4) A potential internal cross-validator of some of the criteria would be a reliable measure of outcome -- but that requires much more attention to estimating the annual size and growth-rate of each repository (in terms of OA's target contents, which are full-text articles), normalized for institution size, annual total target output (an especially tricky denominator problem in the case of multi-institutional funder repositories) and the age of the policy. Policy criteria (such as request/require or immediate/delayed) should be cross-validated against these outcome measures (such as percentage and growth rate of annual target output) in determining the weights in the algorithm.
(5) The MELIBEA color coding needs to be revised -- and revised quickly, if there is to be an algorithm at all. All those arbitrary colors in the display of single repositories as ranked by the algorithm are both unnecessary and confusing, and the validator is not comprehensibly labelled. The objective should be to order and focus clearly and intuitively. Whatever is correlated with more green OA output (such as a higher level or faster growth rate in OA's target content, normalized) should be coded in darker shades (or larger areas) of green. The same should be true for the policy criteria, separately and jointly: in each case (request/require, delayed/immediate, etc.), the greenward polarity is obvious and intuitive. This should be reflected in the graphics as well as in any comparative rankings.
(6) If it includes repositories with no OA policy at all (i.e., just a repository and an open invitation to deposit) then all MELIBEA is doing is duplicating ROAR and ROARMAP, whereas its purpose, presumably, is to highlight, weigh and compare specific policy differences among (the very few) repositories that DO have policies.
(7) The sign-up data are also rather confusing; the criteria are not always consistent, relevant or applicable. The sign-up seems to be designed to make a funder mandate the generic option, whereas this is quite the opposite of reality. There are far more institutions, institutional repositories and institutional policies than there are funders; many of the funder criteria do not apply to institutions, and many of the institutional criteria make no sense for funders. There should be separate criterial lists for institutional policies and for funder policies; they are not the same sort of thing. There is also far too much focus and weight on gold OA policy and payment. If included at all, gold OA criteria should appear only at the end, as an addendum, not up front as if on a par with green OA policy.
(8) There is also potential confusion on the matter of "waivers" or "opt-outs": There are two aspects of a mandate. One concerns whether or not deposit is required (and if so, whether that requirement can be waived) and the other concerns whether or not rights-reservation is required (and if so, whether that requirement can be waived). These two distinct and independent requirements/waivers are completely conflated in the current version of MELIBEA.
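A minimal sketch of keeping these two requirements and their waivers separate in the policy description (the field names are hypothetical, not MELIBEA's):

    from dataclasses import dataclass

    @dataclass
    class MandateTerms:
        """Two independent requirements, each with its own waiver flag."""
        deposit_required: bool               # is deposit itself required?
        deposit_waivable: bool               # can the deposit requirement be waived?
        rights_reservation_required: bool    # is rights reservation required?
        rights_reservation_waivable: bool    # can the rights-reservation requirement be waived?

    # Example: a policy that requires deposit with no opt-out, but allows the
    # rights-reservation requirement to be waived:
    # MandateTerms(deposit_required=True, deposit_waivable=False,
    #              rights_reservation_required=True, rights_reservation_waivable=True)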
I hope there will be substantive consultation and conscientious redesign of these and other aspects of MELIBEA before it can be recommended for serious consideration and use.
Stevan Harnad
American Scientist Open Access Forum
Thursday, July 8. 2010
Isidro Aguillo wrote in the American Scientist Open Access Forum:
IA: "I disagree with [Hélène Bosc's] proposal [to eliminate from the top 800 institutional repository rankings the multi-institution repositories and the repositories that contain the contents of other repositories as subsets]. We are not measuring only [repository] contents but [repository] contents AND visibility [o]n the web."

Yes, you are measuring both contents and visibility, but presumably you want the difference between (1) the ranking of the top 800 repositories and (2) the ranking of the top 800 institutional repositories to be based on the fact that the latter are institutional repositories whereas the former are all repositories (central, i.e., multi-institutional, as well as institutional).
Moreover, if you list redundant repositories (some being the proper subsets of others) in the very same ranking, it seems to me the meaning of the RWWR rankings becomes rather vague.

IA: "Certainly HyperHAL covers the contents of all its participants, but the impact of these contents depends o[n] other factors. Probably researchers prefer to link to the paper in INRIA because of the prestige of this institution, the affiliation of the author or the marketing of their institutional repository."

All true. But perhaps the significance and usefulness of the RWWR rankings would be greater if you either changed the weight of the factors (volume of full-text content, number of links) or, alternatively, designed the rankings so the user could select and weight the criteria on which the rankings are displayed.
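A sketch of what such user-selectable weighting might look like (the criterion names and the assumption that values are pre-normalized are mine, not the actual RWWR method):

    def rank_repositories(repositories, weights):
        """Rank repositories (dicts of criterion values, assumed pre-normalized to
        [0, 1]) by a user-chosen weighting of criteria,
        e.g. weights={"fulltext_items": 0.7, "inlinks": 0.3}."""
        def score(repo):
            return sum(w * repo.get(criterion, 0.0) for criterion, w in weights.items())
        return sorted(repositories, key=score, reverse=True)

    # One user could pass weights={"fulltext_items": 0.9, "inlinks": 0.1}; another
    # could invert them; a single published ranking is just one such choice.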
Otherwise your weightings become like the "h-index" -- an a-priori combination of untested, unvalidated weights that many users may not be satisfied with, or fully informed by...

IA: "But here is a more important aspect: If I were the president of INRIA I [would] prefer people using my institutional repository instead [of] CCSD. No problem with the [CCSD], they are [doing] a great job and increasing the reach of INRIA, but the papers deposited are a very important (the most important?) asset of INRIA."

But how heavily INRIA papers are linked, downloaded and cited is not necessarily (or even probably) a function of their direct locus!
What is important for INRIA (and all institutions) is that as much as possible of their paper output should be OA, simpliciter, so that it can be linked, downloaded, read, applied, used and cited. It is entirely secondary, for INRIA (and all institutions), where their papers are made OA, compared to the necessary condition that they are made OA (and hence freely accessible, useable, harvestable).
Hence (in my view) by far the most important ranking factor for institutional repositories is how much of their (annual) full-text institutional paper output is indeed deposited and made OA. INRIA would have no reason to be disappointed if the locus from which its content was being searched, retrieved and linked happened to be some other, multi-institutional harvester. INRIA still gets the credit and benefits from all those links, downloads and citations of INRIA content!
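To make that suggested ranking factor concrete, here is a minimal sketch (the variable names, and the assumption that annual output can be estimated at all, are mine):

    def annual_oa_deposit_ratio(fulltext_deposits_this_year, estimated_annual_paper_output):
        """Fraction of an institution's estimated annual article output that has been
        deposited as OA full text this year -- the proposed primary ranking factor."""
        if estimated_annual_paper_output <= 0:
            return None  # denominator unknown; the ratio cannot be computed
        return fulltext_deposits_this_year / estimated_annual_paper_output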
(Having said that, locus of deposit does matter, very much, for deposit mandates. Deposit mandates are necessary in order to generate OA content. And -- for strategic reasons that are elaborated in my own reply to Chris Armbruster -- it makes a big practical difference for success in reaching agreement on adopting a mandate in the first place that both institutional and funder mandates should require convergent institutional deposit, rather than divergent and competing institutional vs. institution-external deposit. Here too, your RWWR repository rankings would be much more helpful and informative if they gave a greater weight to the relative size of each institutional repository's content and eliminated multi-institutional repositories from the institutional repository rankings -- or at least allowed institutional repositories to be ranked independently on content vs. links.)
I think you are perhaps being misled here by the analogy with your sister rankings of world universities rather than just their repositories. In university rankings, the links to the university site itself matter a lot. But in repository rankings, links matter much less than how much institutional content is freely accessible at all. For the degree of usage and impact of that content, harvester sites may be more relevant measures, and, after all, downloads and citations, unlike links, carry their credits (to the authors and institutions) with them no matter where the transaction happens to occur...

IA: "Regarding the other comments we are going to correct those with mistakes but it is very difficult for us to realize that Virginia Tech University is 'faking' its institutional repository with contents authored by external scholars."

I have called Gail McMillan at Virginia Tech to inquire about this, and she has explained it to me. The question was never whether Virginia Tech was "faking"! They simply host content over and above Virginia Tech content -- for example, OA journals whose content originates from other institutions.
As such, the Virginia Tech repository, besides providing access to Virginia Tech's own content, like other institutional repositories, is also a conduit or portal for accessing the content of other institutions (e.g., those providing the articles in the OA journals Virginia Tech hosts). The "credit" for providing that conduit goes to Virginia Tech, of course. But the credit for the links, usage and citations goes to those other institutions!
When an institutional repository is also used as a portal for other institutions, its function becomes a hybrid one -- both an aggregator and a provider. I think it's far more useful and important to try to keep those functions separate, in both the rankings and the weightings of institutional repositories.
Stevan Harnad
American Scientist Open Access Forum
Saturday, July 3. 2010
In the world of journal articles, each article is both a "citing" item and a "cited" item. The list of references a given article cites provides that article's outgoing citations. And all the other articles in whose reference lists that article is cited provide that article's incoming citations.
Formerly, with Google Scholar (first launched in November 2004) (1) you could do a google-like boolean (and, or, not, etc.) word search, which ranked the articles that it retrieved by how highly cited they were. Then, for any individual citing article in that ranked list of citing articles, (2) you could go on to retrieve all the articles citing that individual cited article, again ranked by how highly cited they were. But you could not go on to do a boolean word search within just that set of citing articles; as of July 1 you can. (Thanks to Joseph Esposito for pointing this out on liblicense.)
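The functionality described above amounts to something like the following toy sketch over an in-memory corpus, where each article record is assumed to be a dict with "id", "references" and "text" fields (this illustrates the idea only; it is not Google Scholar's implementation):

    def citing_articles(corpus, target_id):
        """Articles whose reference lists include target_id (its incoming citations),
        ranked by how highly cited they themselves are within the corpus."""
        counts = {a["id"]: sum(a["id"] in b["references"] for b in corpus) for a in corpus}
        citers = [a for a in corpus if target_id in a["references"]]
        return sorted(citers, key=lambda a: counts[a["id"]], reverse=True)

    def boolean_and_search(articles, *terms):
        """Simple AND keyword search over the full text of a set of articles --
        e.g. over just the set of citing articles returned above."""
        return [a for a in articles if all(t.lower() in a["text"].lower() for t in terms)]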
Of course, Google Scholar is a potential scientometric killer-app that is just waiting to design and display powers far, far greater and richer than even these. Only two things are holding it back: (a) the sparse Open Access content of the web to date (only about 20% of articles published annually) and (b) the sleepiness of Google, in not yet realizing what a potentially rich scientometric resource and tool they have in their hands (or, rather, their harvested full-text archives).
Citebase gives a foretaste of some more of the latent power of an Open Access impact and influence engine (so does citeseerx), but even that pales in comparison with what is still to come -- if only Green OA self-archiving mandates by the world's universities, the providers of all the missing content, hurry up and get adopted so that they can be implemented, and then all the target content for these impending marvels (not just 20% of it) can begin being reliably provided at long last.
(Elsevier's SCOPUS and Thomson-Reuters' Web of Knowledge are of course likewise standing by, ready to upgrade their services so as to point also to the OA versions of the content they index -- if only we hurry up and make it OA!)

Harnad, S. (2001) Proposed collaboration: google + open citation linking. OAI-General. June 2001.
Harnad, S. (2001) Research access, impact and assessment. Times Higher Education Supplement 1487: p. 16.
Brody, T., Kampa, S., Harnad, S., Carr, L. and Hitchcock, S. (2003) Digitometric Services for Open Archives Environments. In Proceedings of European Conference on Digital Libraries 2003, pp. 207-220, Trondheim, Norway.
Hitchcock, S., Woukeu, A., Brody, T., Carr, L., Hall, W. and Harnad, S. (2003) Evaluating Citebase, an open access Web-based citation-ranked search and impact discovery service. ECS Technical Report, University of Southampton.
Harnad, S. (2003) Maximizing Research Impact by Maximizing Online Access. In: Law, D. and Andrews, J. (Eds.) Digital Libraries: Policy Planning and Practice. Ashgate Publishing 2003.
Harnad, S. (2006) Online, Continuous, Metrics-Based Research Assessment. ECS Technical Report, University of Southampton.
Brody, T., Carr, L., Harnad, S. and Swan, A. (2007) Time to Convert to Metrics. Research Fortnight pp. 17-18.
Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).
Harnad, S. (2008) Validating Research Performance Metrics Against Peer Rankings. Ethics in Science and Environmental Politics 8 (11) [The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance]
Harnad, S., Carr, L. and Gingras, Y. (2008) Maximizing Research Progress Through Open Access Mandates and Metrics. Liinc em Revista 4(2).
Harnad, S. (2009) The PostGutenberg Open Access Journal. In: Cope, B. and Phillips, A. (Eds.) The Future of the Academic Journal. Chandos.
Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise. Scientometrics 79 (1)