Simeon Warner (Arxiv, Cornell) wrote in JISC-REPOSITORIES:
SW: "Lots of money is being spent on institutional repositories and, so far, the return on that investment is quite low."
Low compared to what? It is undeniable that most of the thousands of institutional repositories are languishing near empty. The only exceptions are the fewer than a hundred mandated ones.
But that's the point. What's needed is more mandates, not more "investment." Mandates are what will bring the return on the investment.
And there is another crucial point, constantly overlooked:
Most central repositories are languishing near-empty too! The only reason it looks otherwise is that usually a subject repository has more content than an institutional repository. But the reason for that is quite simple:
The annual worldwide output of an entire field is incomparably bigger than the annual output of any single institution. So when an institution contains no more than the usual low baseline for annual unmandated self-archiving (c. 15% of total annual research output) it has a much smaller absolute number of annual deposits than a central repository (even though that too contains only the very same low baseline 15% of the annual output in the field as a whole, across all institutions, worldwide). (This is the "denominator fallacy.")
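To make the denominator point concrete, here is a minimal arithmetic sketch. Only the 15% baseline comes from the text above; the two annual output figures are hypothetical, chosen purely for illustration.

```python
# Baseline rate of unmandated self-archiving cited in the post (c. 15%).
BASELINE_RATE = 0.15


def annual_deposits(annual_output: int, deposit_rate: float = BASELINE_RATE) -> int:
    """Absolute number of deposits for a given annual output and deposit rate."""
    return round(annual_output * deposit_rate)


# Hypothetical figures (assumptions, not data from the post).
institution_output = 2_000   # papers per year from one institution
field_output = 100_000       # papers per year in one field, worldwide

institution_deposits = annual_deposits(institution_output)
field_deposits = annual_deposits(field_output)

# The absolute counts differ by a factor of 50 ...
print(institution_deposits, field_deposits)

# ... yet the fraction of output deposited is identical in both cases.
print(institution_deposits / institution_output, field_deposits / field_output)
```

Comparing the raw counts makes the central repository look healthier; dividing each count by its own denominator shows that both sit at the same baseline.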
Yes, I know the physics Arxiv is an exception (with an incomparably higher unmandated central deposit rate for several of its subfields). But that's the point: Arxiv is, and has been, an exception for nearly 20 years now. No point continuing to hold our breath and hope that the longstanding spontaneous (unmandated) self-archiving practices of (some fields of) physics will be adopted by other fields. It's not happening, and 20 years is an awfully long time.
PubMedCentral (PMC) might -- and I say might, because no one has actually done the calculation -- possibly be doing better than the 15% default baseline, but that's because PMC deposit is mandatory (by NIH and other funders), not because PMC is central!
(Indeed, my whole point is that the NIH and kindred biomedical self-archiving mandates would get incomparably more bang for the buck if they mandated institutional deposit -- and then just harvested/imported to PMC -- rather than needlessly insisting on direct central (PMC) deposit. For if NIH mandated institutional deposit, it would help stir the Slumbering Giant -- the universal providers of all research, funded and unfunded, in all fields, namely, the world's universities and research institutes -- into mandating deposit for all the rest of their annual research output too.)
SW: "I am still optimistic that institutional repositories will become more useful but for that to happen there need to be useful worldwide (not just UK or European focused because that doesn't match research communities) disciplinary services and portals built on top of them. The Catch 22 here is that disciplinary services have exactly the same funding and sustainability issues that disciplinary repositories have."
What institutional repositories need is deposit mandates, so they can have content that is worth building services on top of. It's not the potential (or the funding) for services that's missing, it's the content (85%). And to get that content deposited, we need (convergent) institutional and funder deposit mandates.
SW: "My group manages both Cornell's eCommons institutional repository and the arXiv.org disciplinary repository. The effective cost per item [footnote 1] submitted is more than 10 times higher for the institutional repository than the disciplinary repository and the benefit/utility/visibility is lower. However, I know exactly who should and will fund eCommons (Cornell), and that nicely matches the vested interest (Cornell). The community benefit from arXiv.org is enormous and the effective cost per new item very low (<$7/item), but given 60k new items per year that is a significant cost and sustainability is a challenge."
The cost-per-item stats are funny-money.
Cornell's problem is not that it costs too much per item to deposit, it's that the deposits are not being done, because Cornell has no mandate. That makes the ratio of IR costs to IR items unsatisfying, of course, but you are missing the real cause!
Moreover, if all institutions had mandates, the (equally small) cost per deposited item would be distributed across the planet's 10K institutions, instead of concentrated on a few central repositories (most near-empty, just like Cornell's institutional one, plus a [very] few serendipitously overstocked central ones, like Arxiv).
SW: "I think the best example of a disciplinary service over institutional repositories is RePEc in economics. This predates OAI and our current conception of IRs but fits the model: institutions (typically economics departments [footnote 2]) host articles and expose metadata/data via a standard interface. The institutionally held content is genuinely useful to the economics community because of the disciplinary services."
All true. (And note that your "best example" is a central service over distributed institutional repositories, not a central repository in which authors deposit directly! Citeseer is another excellent example, in computer science, a field that has been self-archiving even longer than physics and economics.)
But here again, we have a community that has been self-archiving (spontaneously, and institutionally) unmandated for almost as long as Arxiv users. And again, this admirable practice has not generalized to other fields.
What physicists and economists (and computer scientists) seem to have in common is that they find the practice of publicly disseminating working papers -- unrefereed preprints -- useful and productive. That is splendid. I do too. But the majority of fields -- and hence of researchers -- do not find publicly disseminating their unrefereed drafts useful. And you certainly cannot mandate making authors' unrefereed drafts public; in some biomedical fields that might even be dangerous.
But you can mandate making refereed final drafts (published or accepted for publication) public: they are already being made public, since they're being published. So all you need to do is make it mandatory that they also be made freely accessible online (OA), so that not only subscribers can access and use them but all potential users can.
And that is what OA is about.
SW: "At the end of the day, researchers want and will use disciplinary services (look at usage stats for arXiv, ADS, SPIRES, RePEc, PMC, SSRN vs IRs). They probably don't care whether the items themselves are stored centrally or institutionally."
Correct, for users. But users do care whether the items are accessible at all. And that's what deposit mandates (and OA itself) are for.
And authors do care about whether they need to do multiple deposits; and institutions do care about whether they host their own research output.
So it does matter whether deposit is mandated institutionally or centrally, by both institutions and funders.
The difference is not in functionality, but in content. And you have no functionality if you have no content!
SW: "Some of Stevan's arguments miss key points:"
sh: "(1) Institutions are the universal providers of all research output -- funded and unfunded, across all subjects, all institutions, and all nations."
SW: "Not true, researchers are the universal providers of research output. They often work in teams that span multiple institutions and their first allegiance is often to their discipline rather than their institution."
That is (sometimes) true, but trivial. Researchers are answerable to their own institutions (employers) when it comes to the tallying of their research output for research performance assessment. (You may be more loyal to "Physics" than to Cornell U, but it is Cornell, not "Physics," that hires you, pays your salary, and evaluates your productivity; it is "for" Cornell that you "publish or perish" even if your heart belongs to "Physics.")
sh: "(3) OAI-compliant Repositories are all interoperable.
"(7) The metadata and/or full-text deposits of any OAI compliant repository can be harvested, exported or imported to any OAI compliant repository."
SW: "Interoperable to a point, and I say that as one of the creators of OAI-PMH. There is plenty of experience showing how hard it is to maintain large harvested collections and merge varying metadata (e.g. OAIster, NSDL). Institutional repositories are often managed with scant attention to maintaining interoperability, managers change the OAI-PMH base URL on a whim or do not monitor for errors. Full-text often has copyright/license issues preventing import into other repositories."
All extremely minor (and readily remediable) points, compared to the real problem of institutional repositories, which is not that they are errorful but that they are EMPTY. (No point even fixing the errors while content is so impoverished. And once content is rich enough, there's the requisite motivation to clean up errors and maximize interoperability -- and services.)
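For readers unfamiliar with what harvesting involves, here is a minimal sketch of an OAI-PMH ListRecords request and of pulling record identifiers out of a response. The base URL, the identifiers, and the trimmed sample XML are all hypothetical; a real harvester would also need to handle resumption tokens, protocol errors, and a base URL that moves (the maintenance problems SW describes).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(xml_text):
    """Return the identifier of every record header in a response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical repository endpoint (an assumption, not a real service).
BASE_URL = "https://repository.example.edu/oai"

# Hypothetical, heavily trimmed response; real ones also carry metadata.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.edu:1</identifier></header></record>
    <record><header><identifier>oai:repository.example.edu:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(record_identifiers(SAMPLE_RESPONSE))
```

The sketch also illustrates the argument above: the harvesting machinery is simple, but what it returns is only as rich as what has been deposited.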
sh: "(11) The solution is to fix the funder locus-of-deposit specs, not to switch to central locus of deposit."
SW: "The solution is to build disciplinary services (either on disciplinary repositories or over harvested content) that are sufficiently useful to motivate researchers to submit of their own free will."
The solution to what problem? The problem I am addressing (lo, these nigh on 20 years) is the absence of the target content over which the putative services are built.
Arxiv does not suffer from this problem -- and saints be praised for that -- but that doesn't help the rest of us!
Yes, all kinds of powerful new services would be more than welcome (and will come) -- but they are useless in the absence of the content on which they are meant to operate.
And it is not researchers as users that are the problem. It is researchers as authors -- hence content-providers, depositors -- that are the problem. The reason they are failing to deposit is not -- let me save you the trouble of waiting more years to find out that this is so -- because the user-services (or even the author-services) are not spiffy enough yet.
They are failing to deposit because their fingers are "paralyzed" (for at least 34 reasons):
Harnad, S. (2006) Opening Access by Overcoming Zeno's Paralysis. In: Jacobs, N. (Ed.) Open Access: Key Strategic, Technical and Economic Aspects. Chandos.
And the cure for that paralysis is deposit mandates: "keystroke mandates" from their institutions and funders.
And one of the (many) things holding up the universal adoption of those keystroke mandates is funders needlessly competing with institutions for their researchers' reluctant keystrokes by mandating central deposit, hence stoking instead of soothing paralyzed authors' (rightful) resistance to the prospect of having to do divergent multiple deposit at central sites instead of convergent one-time local deposit in their own institutional repository.
SW: "(footnote 1) I think effective cost per new item is a good measure of repository cost because almost all effort beyond relatively fixed costs of keeping the system going tends to be dealing with new items. I calculate as operating budget over some period divided by number of new items in that period."
But surely you also see that the cost per item deposited depends on the overall number of items deposited!
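A minimal sketch of that dependence, using SW's own definition (operating budget over a period divided by new items in that period). The arXiv figures are the ones quoted above (under $7 per item, 60k new items per year); the institutional budget, annual output, and mandated deposit rate are hypothetical. The budget is held fixed as a simplifying assumption, which overstates the effect somewhat, since SW notes that part of the effort scales with new items.

```python
def cost_per_item(operating_budget: float, new_items: int) -> float:
    """Effective cost per new item: budget for a period / new items in that period."""
    return operating_budget / new_items


# Implied by the quoted figures: under $7/item at 60k items per year
# puts the annual budget under roughly this upper bound.
arxiv_budget_upper_bound = 7 * 60_000

# Hypothetical institutional repository (all three figures are assumptions).
ir_budget = 50_000          # annual operating budget, held fixed
ir_annual_output = 2_000    # papers the institution produces per year
mandated_rate = 0.80        # assumed deposit rate under a mandate

unmandated_items = round(ir_annual_output * 0.15)   # the c. 15% baseline
mandated_items = round(ir_annual_output * mandated_rate)

print(arxiv_budget_upper_bound)
print(round(cost_per_item(ir_budget, unmandated_items), 2))
print(round(cost_per_item(ir_budget, mandated_items), 2))
```

With the same budget, the per-item figure falls as deposits rise, so a high cost per item in a near-empty repository says more about the missing deposits than about the cost of depositing.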
SW: "(footnote 2) I'm pleased to say that the section of arXiv that overlaps with RePEc -- Quantitative Finance (q-fin) -- is also included in RePEc (http://ideas.repec.org/s/arx/papers.html)."
Splendid. And I wish both Arxiv and RePEc all the best in taking their very useful place among (many) central collections and service-providers.
But let the one-time locus of deposit be where it belongs, and needs to be: in the researcher's own local institutional repository. And let that be the designated convergent locus of deposit for both institutional and funder mandates.
Amen
Stevan Harnad
American Scientist Open Access Forum