SUMMARY: Imre Simon asks: Why are Institutional Repositories (IRs) near empty unless mandated, whereas Central Repositories (CRs) like arXiv and CiteSeerX appear to be full without a mandate?
Here is the answer:
(1) Authors deposit papers directly in arXiv, whereas CiteSeerX (like Google Scholar) is harvested from authors' websites.
(2) The crucial factor is central vs. institutional locus-of-deposit. Search is always at the CR level.
(3) These CRs (arXiv for physics, CiteSeerX for computer science) are fuller than IRs because: (3a) An entire discipline is bigger than an institution. (3b) The global unmandated deposit rate is about 15% of OA's total target: all annual journal articles, across all disciplines and institutions. (3c) But a repository's deposit rate is the ratio of its deposits to the total output it is meant to cover, and that ratio is much higher for a few self-archiving disciplines than for a single multidisciplinary institution. Physics and Computer Science have been depositing, one centrally, one institutionally, unmandated, for years, but OA's problem is all the disciplines that are not.
(4) Locus-of-deposit and mandates are closely related issues.
(5) Deposit mandates can be either funder or institutional mandates.
(6) Funder mandates only cover funded research, and not all research is funded.
(7) But all research output is institutional.
(8) So if all institutions mandated OA, that would generate universal OA.
(9) So what is most needed is universal institutional OA mandates.
(10) Funder mandates would help far more if they could facilitate the deposit not only of the research they fund, but of all research.
(11) To do this, funder mandates need only change one small detail. This would lose none of their funded content, but could help gain the rest of the output of each of their fundees' institutions.
(12) Funders need to stipulate the fundee's own IR as the preferred locus-of-deposit for complying with the funder's deposit mandate.
(13) The fundees' deposits can be harvested to CRs from IRs.
(14) The issue of search and functionality at the harvester level is a red herring.
(15) The special features of the few disciplines that began spontaneously self-archiving long ago, unmandated, have nothing to do with the IR vs CR deposit-locus issue; hence unmandated CRs do not offer a viable alternative to universal IR mandates.
Imre Simon wrote (in the American Scientist Open Access Forum):
"It is an unquestionable reality that unmandated IR's [Institutional Repositories] remain all but empty.
"[In contrast] ArXiv, CiteSeerX, Repec and SSRN are the four examples of large thematic repositories [Central Repositories, CRs] I know of which are populated without a mandate.
"One wonders why?"
The answer is highly instructive. Let me try to map it out as 15 simple points, one following from the other:
(1) There is a profound difference between
(1a) arXiv (and perhaps also SSRN), which are Central Repositories [CRs] in which authors deposit papers directly, and
(1b) CiteSeerX (and partly also RePEc), which are harvested CRs, their papers and metadata being harvested from local repositories, usually at the author's host institution, where they have been directly deposited. Harvested CRs are like OAIster -- or, for that matter, Google Scholar!
(2) The difference is crucial, because central vs. institutional locus-of-deposit is what is really under discussion here; no one is disputing that navigation and search are done, and should be done, at the central level, irrespective of whether CR deposit is direct or CR contents are harvested.
(3) There are several reasons why these particular CRs (arXiv, RePEc, SSRN, and the biggest of all, CiteSeerX) are fuller than IRs:
(3a) An entire discipline is bigger than a single (multidisciplinary but local) institution.
(3b) These CRs contain only the deposits of those individual authors and disciplines that do deposit spontaneously, unmandated; these amount to about 15% of OA's total target output, and that is well known. The problem is the remaining 85% -- which will be pretty homogeneously represented in each individual multidisciplinary institution's IR (each one 85%-empty if unmandated).
(3c) But there is a systematic denominator bias here: the success of an IR in capturing its institutional research output needs to be reckoned as the ratio of its annual deposited papers to the total annual paper output for that institution, whereas the success of a CR must be reckoned as the ratio of its annual deposited papers to the total annual output (worldwide) for the discipline or disciplines the CR covers! For certain disciplines and subdisciplines -- such as High Energy Physics, Astrophysics, Economics and Computer Science -- this ratio will be quite high. But those are not OA's problem disciplines, because they are depositing already, whether centrally or locally, unmandated, and have been doing so for years. OA's problem is all the disciplines that are not doing so, for those are the main basis of the 85% emptiness of IRs.
(4) The reason all this matters -- and the reason it is so important not to conflate direct-deposit CRs with harvested CRs, nor to conflate deposit locus with search locus -- is that the locus-of-deposit issue is very deeply interrelated with the issue of mandates.
(5) Deposit mandates can be funder mandates or institutional mandates.
(6) Funder mandates only cover funded research, and not all (perhaps not even most) research output is funded; moreover, this would be true even if all funders already mandated OA.
(7) In contrast, (virtually) all research output (and hence all of OA's target content) is institutional. Institutions are the universal research providers.
(8) So if all institutions mandated OA, that would generate universal OA.
(9) Hence, since all of OA's target content is institutional output, and inasmuch as the 85% of research that is not being deposited spontaneously will be deposited once it is mandated, what is most needed is universal institutional OA mandates.
(10) Funder mandates already help, for their subset of OA's total target content, but they would help far more if they could facilitate the deposit not only of the research they fund, but of all research: in other words, if funder mandates could help induce institutions, too, to mandate OA, for all of their own research output, not just the subset mandated by the funder.
(11) In order to be able to do this, funder mandates need only standardize one implementational detail, one that does not lose any of their own target content, but has the potential to extend the reach of the funder mandate to touch the rest of the research output of each one of their fundees' institutions.
(12) Funders need to stipulate the fundee's own IR as the preferred locus-of-deposit for complying with the funder's deposit mandate, with an interim backup repository like DEPOT, which was created to host deposits until the depositor's institution sets up an IR of its own, to which the DEPOT deposits can then be automatically exported. (Currently, DEPOT has had only 66 deposits in its nearly 2 years of existence, and that is because most UK funders are either requiring CR deposit or leaving it open which repository their fundees choose to deposit in.)
(13) The contents can be harvested to CRs from IRs.
(14) The issue of search and functionality at the harvester level is a red herring. (CiteSeerX is a perfect example of the functionality of a CR that harvests from distributed IRs.)
(15) Nor do the special features of the few disciplines that took spontaneously to self-archiving long ago, without waiting for a mandate (computer science, the first, and physics and economics), have anything to do with either (a) the IR/CR issue, or (b) viable alternatives to mandates, because no one has so far demonstrated any alternatives (apart from waiting and waiting) that can generate the 85% of content missing from IRs, and from OA as a whole.
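
The denominator bias in point (3c) can be made concrete with a little arithmetic. The following is only an illustrative sketch: the `Repository` class, the repository names and every paper count are hypothetical, chosen so that the IR sits at the roughly 15% spontaneous baseline from (3b), and the 90% rate for the self-archiving discipline is an assumption, not a measured figure.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    deposits_per_year: int  # papers deposited in the repository in a year
    target_output: int      # total annual papers the repository is meant to cover

    def deposit_rate(self) -> float:
        """Fraction of the repository's own target output that is deposited."""
        return self.deposits_per_year / self.target_output


# All figures are hypothetical, for illustration only.
# An unmandated multidisciplinary IR at the ~15% spontaneous baseline of (3b):
ir = Repository("Example University IR", deposits_per_year=300, target_output=2000)

# A CR for one spontaneously self-archiving discipline (assumed 90% rate),
# reckoned against that discipline's worldwide annual output:
cr = Repository("Example discipline CR", deposits_per_year=45000, target_output=50000)

# The same discipline's slice of the university's output (assumed 100 papers),
# deposited at the same assumed rate, looks just as full when counted locally:
local_slice = Repository("That discipline within the IR", deposits_per_year=90, target_output=100)

for repo in (ir, cr, local_slice):
    print(f"{repo.name}: {repo.deposits_per_year} deposits, "
          f"{repo.deposit_rate():.0%} of target output")
```

Under these assumed numbers the CR is both larger and fuller than the IR, yet the discipline's authors deposit at the same rate wherever they deposit. The difference comes from which denominator is used, not from the locus of deposit.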
Stevan Harnad
American Scientist Open Access Forum