Critique of: Romary, L. & Armbruster, C. (2009) Beyond Institutional Repositories.
R&A: "The current system of so-called institutional repositories, even if it has been a sensible response at an earlier stage, may not answer the needs of the scholarly community, scientific communication and accompanied stakeholders in a sustainable way."
Almost all institutional repositories today are near-empty. Until and unless they are successfully filled with their target content, talk about their "answering needs" or being made "sustainable" is moot.
The primary target content of both the Open Access movement and the Institutional (and Central) Repository movement is refereed research: the 2.5 million articles per year published in the planet's 25,000 peer-reviewed journals. (That is why R&A speak, rather ambiguously, about "Publication Repositories.")
Institutions are the universal providers of all that refereed research output, funded and unfunded, in all scholarly and scientific disciplines, worldwide.
Institutions have a fundamental interest in hosting, inventorying, monitoring, managing, assessing, and showcasing their own research output, as well as in maximizing its uptake, usage and impact.
Yet not only is most of the research output of most institutions failing to be deposited in the institution's own repository: most of it is not being deposited in any other repository either. (Please keep this crucial fact in mind as you reflect on the critique below.)
R&A: "[H]aving a robust repository infrastructure is essential to academic work."
A repository, be its "infrastructure" as "robust" as you like, is of no use for academic work as long as it is near-empty.
R&A: "[C]urrent institutional solutions, even when networked in a country or across Europe, have largely failed to deliver."
Largely empty repositories, "networked" to largely empty repositories, remain doomed to deliver next to nothing.
R&A: "Consequently, a new path for a more robust infrastructure and larger repositories is explored to create superior services that support the academy."
Making largely empty repositories "larger" (by "networking" them) is as futile as "making their infrastructure more robust":
What repositories lack and need is their target content.
The reason most repositories are near-empty is that most researchers are not depositing in them.
And the reasons most researchers are not depositing are multiple (there are at least 34 of them), but they boil down to one basic reason, and researchers have already indicated, clearly, in international surveys, what that one basic reason is:
Deposit has not been mandated (by their institutions or their funders).
Ninety-five percent of researchers surveyed across all disciplines, worldwide, most of whom do not deposit, respond that they would deposit if deposit were mandated, 14% of them reluctantly, and 81% of them willingly. (Swan)
And outcome studies have shown that researchers do what they said they would do: When deposit is mandated, they do indeed deposit, in high proportions, within two years of adoption of the deposit mandate. (Sale)
Hence what institutions need in order to induce their researchers to deposit is not larger or more robust repositories, but deposit mandates.
The number of mandates is growing, but there are as yet only 90 of them worldwide.
Hence what is urgently needed to fill repositories so they can begin providing "superior services" for the academy is more mandates, not larger repositories or "more robust infrastructure."
R&A: "[F]uture organisation of publication repositories is advocated that is based upon macroscopic academic settings providing a critical mass of interest as well as organisational coherence."
The only "critical mass" that repositories need is their missing target OA content.
Researchers have an intrinsic interest in making their research output OA. Institutions have an interest in making their research OA. Funders have an interest in making their research output OA. And the tax-paying public has an interest in making the research they fund OA.
In contrast, subscription/license publishers do not have an intrinsic interest in making the research they publish OA except if they are paid for it (via Gold OA publication fees). Publishers view Green OA (via repository deposit) as putting their subscription and license revenues at risk. They haven't much choice but to endorse deposit by their authors, given the research benefits of OA, and particularly when it is mandated by their authors' institutions and funders; but publishers themselves certainly have no need or desire to do the depositing on their authors' behalf, for free.
The way to see this clearly is to realize that Green OA amounts to repository deposit by authors, for free, whereas Gold OA amounts to "repository deposit" by publishers, for a fee.
Most publishers are not depositing today because they are not being paid to do it.
Most authors are not depositing today because they are not being mandated to do it.
There is no solution in "amalgamating" these respective empty repositories (unmandated Green and unpaid Gold). The solution is either more mandates or more money.
As subscriptions/licenses are covering the costs of publication today, there is neither the need to pay for Gold OA pre-emptively today, nor the extra money to pay for it: The potential money is tied up in paying the subscription/license fees that are already covering the costs of publication.
Mandates do not depend on publishers but on institutions and funders; nor do mandates bind publishers: they bind only authors. It is hence incoherent to imagine macro-repositories fed by both authors and publishers. Nor is it necessary, since institutional (and funder) deposit mandates, along with institutional repositories, are jointly necessary and sufficient to achieve 100% OA.
R&A: "Such a macro-unit may be geographical (a coherent national scheme), institutional (a large research organisation or a consortium thereof) or thematic (a specific research field organising itself in the domain of publication repositories)."
"Macro" organisations -- whether institutional consortia, national consortia or disciplinary consortia -- do not resolve this fundamental contradiction between free access and any scheme to pay for access.
(In principle, McDonald's and Burger King could give free access to hamburgers if a global consortium of some sort were to agree to bankroll it all up-front; however, that would hardly be free access: it would simply be global acquiescence to a global oligopoly on the sale of a product.)
So forget about counting on publishers to deposit articles in OA repositories -- whether institutional or central -- unless they are paid up-front to do so. And paying them to do so via licenses is not "organisational coherence" but what biologists would call an "evolutionarily unstable strategy," doomed to collapse because of its own intrinsic instability.
It is the articles' authors who need to deposit, and it is that deposit that their institutions and funders need to mandate.
R&A: "The argument proceeds as follows: firstly, while institutional open access mandates have brought some content into open access, the important mandates are those of the funders"
This "argument" is demonstrably incorrect.
Not all or even most of research is funded, whereas all research originates from institutions. Hence institutional mandates cover all research, whereas funder mandates cover only funded research.
The NIH, RCUK and ERC funder mandates were indeed important because they set an example for other funders to follow (and many are indeed following); but that still only covers funded research. Funder mandates do not scale up to cover all research.
The Harvard, Stanford and MIT institutional mandates were hence far more important, because they set an example for other institutions to follow (and many are indeed following); and this does cover all research output, because institutions are the universal providers of all research output, whether funded or unfunded, across all disciplines.
R&A: "[Funder mandates] are best supported by a single infrastructure and large repositories, which incidentally enhances the value of the collection (while a transfer to institutional repositories would diminish the value)."
This is again profoundly incorrect. The only "value enhancement" that empty collections need is their missing content. (Nor are we talking about "transfer" yet, since the target contents are not being deposited. We are talking about mandating deposit.)
Funder mandates can be fulfilled just as readily by depositing in institutional repositories as in central ones. Repository size and locus of deposit are completely irrelevant. All OAI-compliant repositories are interoperable. The OAI-PMH allows central harvesting from distributed repositories. In addition, transfer protocols like SWORD allow direct, automatic repository-to-repository transfer of contents.
Hence there is no functional advantage whatsoever to direct central deposit, since central harvesting from institutional repositories achieves exactly the same functional result. Instead, direct central deposit mandates have the great disadvantage that they compete with institutional mandates instead of facilitating them.
The institutional repository is both the natural and the optimal locus of deposit, for both institutions and funders. That way funder mandates and institutional mandates collaborate and converge, covering all research output.
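To make the harvesting point concrete, here is a minimal sketch of how a central service might harvest from one institutional repository using the OAI-PMH ListRecords verb. The base URL, the sample response and the function names are hypothetical, chosen only for illustration; the verb, the metadataPrefix and resumptionToken arguments, and the XML namespace are those defined by OAI-PMH 2.0. No network call is made here: the sketch only builds the request URL and parses a response that is already in hand.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for one repository (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response from a hypothetical institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.edu:1</identifier></header></record>
    <record><header><identifier>oai:repo.example.edu:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://repo.example.edu/oai")
identifiers, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://repo.example.edu/oai", resumption_token=next_token)
```

A central collection would repeat this loop over every institutional repository it covers, following each resumption token until none is returned. The depositing itself happens only once, locally.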
Summary:
(1) Repository size and "infrastructure" do not generate content.
(2) Empty repositories are useless.
(3) The only way to fill them is to mandate deposit.
(4) Not all or most research is funded.
(5) But all research originates from institutions.
(6) Institutions' interests are served by hosting and managing their own research assets.
(7) Hence both institutional and funder mandates should converge on institutional deposit.
(8) Any central collections can then be harvested from the global distributed network of institutional repositories.
And now an important correction of a widespread misinterpretation of the relative success of institutional and central repositories in capturing their target content:
The Denominator Fallacy. With one prominent exception -- which has absolutely nothing to do with the fact that the exceptional repository in question, the physics Arxiv, happens to be central rather than institutional -- unmandated central repositories (and there are many) are no more successful in getting themselves filled with their target content than unmandated institutional repositories. The critical causal variable is the mandate, not the repository's centrality or size.
The way to arrive at a clear understanding of this fundamental fact is to note that the denominator -- i.e., the total target content relative to which we are trying to reckon, for a given repository, what proportion of it is being deposited -- is far bigger for a central disciplinary repository than for an institutional repository.
For an institutional repository, its denominator is the total number of refereed journal articles, across all disciplines, produced by that institution annually.
For a central disciplinary repository, its denominator is the total number of refereed journal articles, across all institutions worldwide, published in that discipline annually. (For a national repository, like HAL, its denominator is the total research output of all the nation's institutions, across all disciplines.)
So it is no wonder that central repositories are "larger" than institutional ones: Their total target content is much larger. But this difference in absolute size is not only irrelevant but deeply misleading. For the proportion of their total annual target content that unmandated central repositories are actually capturing is every bit as minuscule as the proportion that unmandated institutional repositories are capturing. And whereas the total size of a mandated institutional repository remains much smaller than that of an unmandated central repository, the reality is that the mandated institutional repositories are capturing (or near capturing) their total target outputs, whereas the unmandated central repositories are far from capturing theirs.
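The arithmetic behind the denominator point can be sketched in a few lines. All figures below are hypothetical, invented purely to illustrate the reasoning, and are not measurements of any actual repository; the function name is likewise an assumption of this sketch.

```python
def capture_rate(deposited: int, target: int) -> float:
    """Fraction of a repository's annual target content actually deposited."""
    if target <= 0:
        raise ValueError("target content must be positive")
    return deposited / target


# Hypothetical, illustrative figures only (not measured data).
repositories = {
    # One institution's annual refereed output, across all disciplines.
    "mandated institutional": {"deposited": 1_800, "target": 2_000},
    # One discipline's annual refereed output, across all institutions.
    "unmandated central": {"deposited": 15_000, "target": 100_000},
}

rates = {name: capture_rate(**counts) for name, counts in repositories.items()}

for name, rate in rates.items():
    deposits = repositories[name]["deposited"]
    print(f"{name}: {deposits} deposits, {rate:.0%} of annual target content")
```

With these made-up numbers the central repository holds far more deposits in absolute terms, yet captures a far smaller share of its own target content. Comparing raw sizes instead of proportions is exactly the fallacy.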
The reason Arxiv is a special case is not at all because it is a central repository but because the physicists who immediately began depositing in Arxiv way back in 1991, with no need whatsoever of a mandate to impel them to do so, had already long been doing much the same thing in paper (at the CERN and SLAC paper depositories), and necessarily centrally, because in the paper medium there is no way one can send one's paper to "everyone," nor get everyone to access or "harvest" each new paper from each author's own institutional depository (if there had been such a thing).
All of that is over now. And if physicists had made the transition from paper preprint deposit to online preprint deposit directly today rather than in 1991, in the OAI-PMH era of repository interoperability and harvesting, there is no doubt that they would have deposited in their own respective institutional repositories and CERN and SLAC and Arxiv would simply have harvested the metadata automatically from there (with the obvious computational alerting mechanisms set up for harvesting, export and import).
But that longstanding cultural practice of preprint deposit among physicists would be just as anomalous if physicists had begun it all by depositing institutionally rather than centrally, for no other (unmandated) central repository (or discipline) is capturing the high portion of its annual total target content that the physics Arxiv is capturing (in certain preprint-sharing subfields of physics) and has been capturing ever since 1991, in the absence of any deposit mandate.
So the centrality, size and success of Arxiv is completely irrelevant to the problem of how to fill all other unmandated repositories, whether central or institutional, large or small, in any other discipline, and regardless of the "robustness" of the repository's "infrastructure." Only the mandated repositories are successfully capturing their target content, and there is no longer any need to deposit directly in central repositories: In the OAI-compliant OA era, central "repositories" need only be collections, harvested from the distributed local repositories of the universal research providers: the institutions.
R&A: "Secondly, we compare and contrast a system based on central research publication repositories with the notion of a network of institutional repositories to illustrate that across central dimensions of any repository solution the institutional model is more cumbersome and less likely to achieve a high level of service."
The assumption is made here -- with absolutely no supporting evidence, and with all existing evidence (other than the single special case of Arxiv, discussed above) flatly contradicting it -- that researchers are more likely to deposit their refereed journal articles in big central repositories than in their own institutional repositories.
All evidence is that researchers are equally unlikely to deposit in either kind of repository unless deposit is mandated, in which case it makes no difference whether the repository is institutional or central -- except that if both funders and institutions mandate institutional deposit then their mandates converge and reinforce one another, whereas if funders mandate central deposit and institutions mandate institutional deposit then their mandates diverge and compete with one another.
(And of course the natural direction for harvesting is from local to central, not vice versa: We all deposit on our institutional websites and Google harvests from there; it would be absurd for everyone to deposit in Google and then back-harvest to their own institutional website. The same is true for any central OAI harvesting service.)
R&A: "Next, three key functions of publication repositories are reconsidered, namely a) the fast and wide dissemination of results; b) the preservation of the record; and c) digital curation for dissemination and preservation."
Again, these functions in no way distinguish central and institutional repositories (both can and do provide them) and have no bearing whatsoever on the real problem, which is the absence of the target content -- for which the remedy is to mandate deposit. Otherwise there's nothing to curate, preserve and disseminate.
R&A: "Fourth, repositories and their ecologies are explored with the overriding aim of enhancing content and enhancing usage."
You cannot enhance content if the content is not there. And you cannot enhance the usage of absent content. Hence it is not enhancements that are needed but deposit mandates to generate the nonexistent content for which all these enhancements are being contemplated...
R&A: "Fifth, a target scheme is sketched, including some examples."
The target scheme includes a suggestion that publishers should do the depositing, of their own proprietary version of the refereed article. This is perhaps the worst suggestion of all. Just when institutions are at last realizing that after decades of outsourcing it to publishers, they can now host and manage their own research output by mandating that their researchers deposit their final refereed drafts in their own institutional repositories, Romary & Armbruster instead suggest "consolidated" central "publication repositories" in which publishers do the depositing. (The question to contemplate is: If it requires a mandate to induce researchers to deposit, what will it require to induce publishers to deposit -- other than paying them to do it? And if so, who will pay how much for what, out of what money -- and why?)
Most of the rest of R&A's suggestions are superfluous, and fail completely to address the real problem: the absence of OA's target content. You can't go "beyond" institutional repositories until you first succeed in filling them.
Stevan Harnad
American Scientist Open Access Forum