On Comparing Institutional Apples With Multi-Institutional Fruit: The Denominator Fallacy Again

Thursday, July 8. 2010

Chris Armbruster [CA] wrote in the American Scientist Open Access Forum:

CA: "'Institution' is indeed not a very precise concept, but the repository ranking will not be improved if one were to spend much time trying to decide which repository is institutional and which is not"

If there is any rationale for separately ranking and comparing -- as the Ranking Web of World Repositories (RWWR) does -- both the top 800 repositories and the top 800 institutional repositories (and there is indeed an important rationale for doing so), then that rationale is that the institutional repositories are indeed institutional and not multi-institutional. The purpose is to rank their relative size (and hence their success in capturing their target content), and there is no point in comparing the size of the category "apple" with the size of the category "fruit." This is the "denominator fallacy."

The pros and cons of Chris Armbruster's advocacy of central (multi-institutional) repositories over institutional repositories have already been discussed many times over the years in this Forum and elsewhere.

The argument for institutional repositories is that (1) institutions are the providers of all of OA's target content, (2) they have a stake in managing their own output, and (most important of all) (3) they are in a position to mandate the deposit of their own output.

The argument for multi-institutional (central) repositories is that they look (superficially) as if they were bigger, hence more "successful" in attracting OA's target content. (Hence Chris's preference for keeping the two kinds of repositories and their sizes conflated in the RWWR rankings.) They also look (superficially) more manageable and sustainable.

The argument against multi-institutional (central) repositories is (a) that multi-institutional entities (notably, funders) cannot mandate the deposit of all institutional research output (because not all research is funded), (b) that central deposit mandates compete with, instead of reinforcing, institutional mandates (eliciting resistance from authors facing the prospect of having to do double-deposits), and (most relevantly here) (c) that the size and success of a repository can only be evaluated and compared in relation to the size of that repository's total target output. And although there are differences among institutions in the size of their own total output (which can and should be weighted to normalize it and make it comparable), the difference in size between institutions and multi-institutions is the difference in size between the number of apples and the number of fruit. (The denominator fallacy.)

Multi-institutional (central) repositories' content would have to be weighted by the output of all their actual and potential target institutions and the total target content of each, in order to make multi-institutional rankings comparable to those of individual institutions. RWWR is not doing that kind of weighting -- nor would it be easy to determine those weightings for each kind of multi-institutional repository, though it may eventually be possible to estimate in principle. If it were done, however, there would hardly be any need for two rankings (for repositories vs. institutional repositories).
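The arithmetic behind the denominator fallacy can be sketched in a few lines of Python. The repository names and all the figures below are hypothetical, invented purely for illustration (they are not RWWR data), and `capture_rate` is an assumed helper, not a measure RWWR computes.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    deposits: int       # target-content items actually deposited
    target_output: int  # total target output of every institution the repository serves


def capture_rate(repo: Repository) -> float:
    """Fraction of the repository's total target content that it has captured."""
    return repo.deposits / repo.target_output


# Hypothetical figures: one institution vs. a central repository serving many.
single = Repository("one-institution repository", deposits=3_000, target_output=5_000)
central = Repository("multi-institution repository", deposits=20_000, target_output=400_000)

repos = [single, central]

# Ranking by raw size ignores the denominator (apples vs. fruit).
by_raw_size = sorted(repos, key=lambda r: r.deposits, reverse=True)

# Ranking by capture rate weights each repository by its own target output.
by_capture = sorted(repos, key=capture_rate, reverse=True)

for repo in by_capture:
    print(f"{repo.name}: {repo.deposits} deposits, capture rate {capture_rate(repo):.0%}")
```

With these made-up figures the central repository is more than six times bigger in absolute terms, yet it captures 5% of its target content against the institutional repository's 60%.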

What would be clear from a proper denominator-weighted ranking of institutional and multi-institutional repositories is that, contrary to what Chris has argued, it is not at all true that the multi-institutional repositories are bigger or more successful in collecting their respective total target contents. Rather, it makes much more sense for both institutions and funders to mandate that researchers deposit in their own institutional repository -- from which multi-institutional collections could then be automatically harvested. (It would then be redundant to try to compare their relative success, as one would clearly be a derivative of the other.)

For management and sustainability, local institutional deposit and central harvesting is the complementary -- and optimal -- solution. But first the primary content-provision problem has to be solved, otherwise there is next to nothing to manage and sustain!

CA: "how about also deleting No 10 because it is only a departmental repository?"

A departmental repository, in contrast, is sub-institutional rather than multi-institutional. Hence, unless there is to be a separate RWWR ranking of the top 800 departmental mandates, there is no harm in listing the departmental repositories among the institutional repositories -- except if the university has both an institutional and a departmental repository, and the contents of the departmental repository are also a proper subset of the contents of the institutional repository, hence double-counted.

This is not the case in the instance of ["institutional"] repository #10, University of Southampton School of Electronics and Computer Science, whose contents are not part of institutional repository #27, University of Southampton. Rather than resulting in an inflated ranking for Southampton, this actually results in a lower ranking. The joint RWWR ranking of the integrated institutional repository would be higher for Southampton. (That said, with a properly weighted denominator, separately tagged departmental repositories would be useful at this time, to compare the relative success of institution-wide mandates vs. departmental/school/faculty mandates -- i.e., Arthur Sale's "patchwork mandate" strategy.)

CA: "Also, it is a bad idea to define repositories as institutional only if they restrict themselves to the output of a single institution. We already have too many repository managers who succumb to this kind of institutionalist logic - and reject OA content only because it is not from their own institution."

If only the problem were that of an overflowing cup, with so much OA target content that it needs to be rejected!

Chris has the OA content problem completely upside-down! The problem is that not enough of each institution's own OA target content is being deposited, anywhere -- not that institutions are declining to host the output of other institutions. (It is only Chris's central-repository preoccupation that makes him imagine that the latter is the problem.)

What's missing is not repositories to deposit in, but mandates to deposit. The solution is for institutions and funders to mandate institutional deposit of all content, funded and unfunded, across all disciplines -- and then, if desired, to harvest that content into various central collections, by discipline, funder, language or nation, as desired. Institutions are the universal providers of all that content; they are also the natural locus for deposit mandates.

CA: "The CSIC has a sound methodology for ranking repositories, and it not their job to define exclusively what is an IR and what not. And in cyberspace it is much more interesting to compare repositories according to domains and services they offer..."

I take it that by the CSIC Chris means the RWWR.

And as far as I can tell, the only reason Chris finds the methodology sound is that it conflates institutional and multi-institutional repositories, which favors Chris's preference for multi-institutional repositories.

What is much more interesting and important in cyberspace than the locus of the distributed content is the presence of the content. Most (80%) of OA's target content is still missing from anywhere on the (free) web, and long overdue. Locus matters strategically for the concrete, practical goal of capturing that target content (and making it OA). Chris keeps systematically missing this point. If the content were all there already, none of this would matter in the slightest.

(And a good intuition pump to bear in mind is that the key to the success of Google and the like was not to try to get everyone to deposit their content directly in Google: What happened, and worked, was distributed, local deposit and hosting, followed by central harvesting. Not a bad principle to generalize to OA...)

CA: "Moreover, it would help if we could move beyond the often narrow understanding of what an institutional repository is and what not & acknowledge more clearly that a strategy of privileging institutional repositories as such has not helped."

Chris does not seem to have noticed the growing institutional/departmental repository mandate movement (initiated in 2002 by Southampton ECS, but greatly accelerated since the 16th mandate in 2008 by Harvard FAS, and now running well over 100 institutional/departmental mandates, including UCL, MIT and Stanford, as well as over 40 funder mandates).

It is not (and never has been) a matter of merely "privileging" institutional deposit, but mandating it.

CA: "The value & sustainability of IRs (individually, as isolated instances, & if not embedded in a national system) is rather limited for both scholarship and open access."

(1) Repository value is nil without content.

(2) With content, locus is irrelevant, as search is not local but global, via central harvesters.

(3) Sustainability is a red herring (especially with today's sparse OA content); institutional deposit loci and central harvesters are complementary, insofar as preservation is concerned.

(4) Nations can and should mandate OA deposit. Nations can and should harvest OA deposits centrally. But there is no earthly need (or prospect) of nations directly hosting all their institutional OA output centrally, any more than there is any earthly need for nations to host all their institutions centrally.

(5) If Chris is worried about limitations on OA scholarship, he should set his mind to thinking of how to induce the OA target content providers (institutional researchers) to deposit their content, to make it OA.

(6) IRs will take care of themselves.

CA: "Hence, it is very welcome that more determined efforts are underway at building viable networks of research repositories and integrate IRs in national systems (e.g. Ireland as latest instance)."

All true, but a non sequitur, insofar as the fundamental problem of filling those repositories with their target contents is concerned.

CA: "For a sustained argument, please see": Armbruster & Romary (2010) Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication (accepted for publication in IJDLS)
and Romary & Armbruster (2010) Beyond Institutional Repositories. IJDLS 1(1): 44-61

For a sustained critique and response, see:

Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues

Institutional vs. Central Repositories: 2 (of 2)

Institutional vs. Central Repositories: 1 (of 2)

Beyond Romary & Armbruster On Institutional Repositories

When Will the Research Community Take OA Matters Into Its Own Hands?

First Things First: OA Self-Archiving, Then Maybe OA Publishing

Well-Meaning Supporters of "OA + X" Inadvertently Opposing OA

Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O'Brien, A., Hardy, R., Rowland, F. and Brown, S. (2005) Developing a model for e-prints and open access journal content in UK further and higher education. Learned Publishing 18 (1). pp. 25-40.

I have quickly skimmed (but not read verbatim) the new A & R paper, and I see that all of my prior objections (to A & R's earlier paper) remain unanswered, indeed not even noted.

(1) The 4-way classification system -- subject, nation, "research" and institution -- continues to be arbitrary and rather incoherent.

(2) The three far more important and salient distinctions -- direct deposit repositories vs. harvested collections, OA target content vs. other kinds of content, mandated repositories vs. unmandated repositories -- are not treated (or not treated in enough depth to understand their salience).

(3) The all-important question of how best to capture OA's target content -- the most central question, before we even talk about repository types, services or sustainability -- is not given any serious consideration.

(4) The very specific question of locus of deposit, and its specific importance for deposit mandates (and hence for capturing the target content) is likewise not given any serious consideration.

(5) The "denominator fallacy" continues to pervade the paper, in its continued reference to absolute repository size, without taking into account the size or proportion of the repository's target contents that the repository is actually capturing. (For an institutional repository, the denominator is its total refereed journal article output; for HAL -- which A & R stunningly misclassify as the most successful of all repositories! -- it is the totality of France's refereed journal article output.)

In short, A & R's approach -- which takes so much of the current sparse and inchoate landscape for granted, and follows after it, instead of facing the real problem, which is to remedy that sparseness, and lead the way toward capturing the vast proportion of OA's target content (at least 80% of it) that is still not being captured (by any repository) -- is not, I believe, a realistic or productive one.

The reality is that most repositories -- of all the kinds A & R consider and don't consider -- are near-empty of their target content. Consequently, search, services and sustainability are not the problem: Content is.

Mandates generate the content, but A & R's treatment imagines that mandates, and their promise, amount mostly to funder mandates (and funder, i.e., "research", repositories).

This is (in my view) an enormous error: Not all scholarly and scientific research (perhaps not even the majority of it) is funded, but virtually all of it comes from institutions -- universities and research institutes.

In and of itself, that is strong reason to give institutional repositories and institutional mandates far more serious thought than A & R give them. Another reason is that once institutional deposit is mandated and OA contents are being systematically deposited in their institutional repositories, they can be harvested to any other collections we may desire -- subject-based, national, "research" or what-have-you. Nor are the various search and other services that are built atop this OA content meant to be provided at the institutional level (where A & R note their absence as if it were a defect): services are a harvester-level function, whereas content-provision is an institution-level function.

A & R's article is also missing the point of depositing the author's rather than the publisher's version (the author's version has far fewer restrictions and can be provided much earlier); nor does it take into account the power of institutional repositories to provide immediate "Almost OA" even in the case of publisher-embargoed content, via the semi-automatic "eprint request" button. A & R also make some incorrect assumptions about the difficulty and effort of deposit and the need for library assistance and proxy deposit.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 23:50 | Comments (0) | Trackbacks (0)

Open Access Archivangelism
