Here is the video of my presentation to the DRIVER Summit:
Institutional Versus Central Deposit:
Optimising DRIVER Policy for the OA Mandate and Metric Era
Also to be discussed at the DRIVER Summit is this statement by the EU Council (not to be confused with the European Research Council (ERC), which has mandated OA self-archiving!). The EU Council's Conclusions show the tell-tale signs of penetration by the publisher anti-OA lobby: familiar slogans, decisively rebutted many, many times before, crop up verbatim in the EU Council's language, though the Council does not appear to realize that it has allowed itself to become the mouthpiece of these special interests, which are not those of the research community:
Council of the European Union: Conclusions on scientific information in the digital age: access, dissemination and preservation
Here is my critique of this EU Council statement (all boldface quotes are from the Council's statement; the underscores have been added):
"the importance of scientific output resulting from publicly funded research being available on the Internet at no cost to the reader under economically viable circumstances, including delayed open access"
(1) 'At no cost to the reader' conflates site-licensing and Open Access (OA). This wording was no doubt urged by the publisher lobby. The focus should be on providing free online access webwide. That is OA, and that makes the objective clear and coherent.
(2) 'Delayed open access' refers to publisher embargoes on author self-archiving. If embargoes are to be accommodated, it should be made clear that they apply to the date at which access to the embargoed document is made OA, not to the date at which the document is deposited, which should be immediately upon acceptance for publication. The DRIVER network of Institutional Repositories (IRs) can then adopt the 'email eprint request' button that will allow individual users to request and receive individual copies of the document semi-automatically.
(3) What should be deposited in the author's own institutional IR immediately upon acceptance for publication is the author's peer-reviewed, accepted final draft ('postprint'), not the publisher's PDF (or XML). There are far more publisher embargoes on the PDF/XML than on the postprint, and the postprint is all that is needed for research use and progress. The postprint is a supplementary version of the official publication, provided for OA purposes; it is not the version with the primary digital preservation problem.
(4) Digital preservation should not be conflated with OA provision: There is a (separate) problem of the digital preservation of the publisher's PDF/XML, but this is not the same as the problem of providing OA to the author's postprint. The postprint, though it can and should be preserved, is not the canonical copy of the publication, so the two preservation tasks should not be conflated.
(5) Self-archiving research data is also a different matter from self-archiving research publications. Data-archiving is not subject to a publisher embargo, and it needs independent preservation, but data-access and data-preservation should not be conflated with OA provision.
(6) Deposit should be directly in each author's own IR. Distributed institutional depositing and storage should not be conflated with central harvesting and indexing: Deposit Institutionally, Harvest Centrally.
(7) Direct central deposit should be avoided except in cases where the author is institutionally unaffiliated or the author's institution does not yet have an IR. For those cases, there should be at least one provisional default repository such as DEPOT.
(8) Research (publications and data) should not be conflated with other forms of digital content. The problems of cultural heritage archiving, for example, are not the same as those of research publication archiving. Nor are the problems of archiving the same as the problem of access-provision (OA).
"ensure the long term preservation of scientific information -including publications and data"
This is an example of the complete conflation of OA-provision with digital preservation, including a conflation of authors' supplementary postprints with the publisher's original, as well as a conflation of research publications with research data.
DRIVER will not have a coherent programme unless it clearly and systematically de-conflates OA-provision from digital preservation, primary publications from authors' supplementary postprints, and publication-archiving from data-archiving, treating each of these separately, on its own respective terms.
"experiments on and wide deployment of scientific data infrastructures with cross-border, cross-institution and cross-discipline added-value for open access to and preservation of scientific information"
This again conflates OA provision with digital preservation and conflates publications with data. It also conflates both of these with IR interoperability, which is yet another matter. (And webwide OA is, by definition, cross-institution, cross-border and cross-discipline, so that is a non-issue.)
What is an issue, however, is institutional versus central depositing, and it is crucial that DRIVER have a clear, coherent policy (insofar as research archiving is concerned -- this does not necessarily apply to other forms of digital content):
Deposit Institutionally: Harvest/Index/Search Centrally.
The emphasis of DRIVER should accordingly be on ensuring that the distributed IRs have the requisite interoperability for whatever central harvesting, indexing, search and analysis are needed and desired.
"promoting, through these policies, access through the internet to the results of publicly financed research, at no cost to the reader, taking into consideration economically sustainable ways of doing this, including delayed open access"
Economic sustainability is again a red herring introduced by the publishing lobby into language that should only concern the research community and research access. The economic sustainability of publishing is not DRIVER's concern.
DRIVER's concern should be interoperable OA-provision (plus whatever cultural-heritage and other forms of archiving DRIVER wishes to provide the infrastructure for).
Nor are publisher access-embargoes DRIVER's concern: DRIVER should merely help ensure immediate deposit in IRs, and it should facilitate research usage needs through IR interoperability as well as the IRs' email eprint request button.
"2008 working towards the interoperability of national repositories of scientific information in order to facilitate accessibility and searchability of scientific information beyond national borders"
Insofar as research is concerned, it is not the interoperability of national repositories that is crucial but the interoperability of all OA IRs.
"2009 contributing to an effective overview of progress at European level, informing the Commission of results and experiences with alternative models for the dissemination of scientific information."
This is again a red herring (for both the EU and for DRIVER) introduced by the publishing lobby: Research archiving and OA-provision are neither a matter of alternative publishing models nor a matter of alternatives to the generic peer-reviewed publication model.
Publishing reform and peer review reform are not DRIVER matters. They can and will evolve too, but DRIVER should focus on the deposit of current published research as well as research data in IRs, and the interoperability of those IRs. That is the immediate problem. The rest is merely speculative for now.
"B. Invitation to the Commission to implement the measures announced in the Communication on "scientific information in the digital age: access, dissemination and preservation", and in particular to: 1. Experiment with open access to scientific publications resulting from projects funded by the EU Research Framework Programmes by: defining and implementing concrete experiments with open access to scientific publications resulting from Community funded research, including with open access."
This is a vague way of saying that the publishing lobby has persuaded the EU not to do the obvious, but to keep on 'experimenting' as if what needed to be done were not already evident, already tested, already demonstrated to work, and already being done, worldwide (including by RCUK, ERC, NIH, and over a dozen universities):
The EU should mandate that all EU-funded research articles (postprints) are deposited in the fundee's IR immediately upon acceptance for publication. Access can be set in compliance with embargoes, if desired. And data-archiving should be strongly encouraged.
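As a rough illustration of the rule just stated (deposit immediately, set access separately), the sketch below keeps the deposit date and the open-access date as two distinct fields. The names `Deposit`, `open_from` and `access_route` are hypothetical, invented for this example; they are not taken from any repository software.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """A postprint deposited in the author's institutional repository."""
    title: str
    accepted_on: date                 # date of acceptance for publication
    deposited_on: date                # should equal accepted_on under the mandate
    open_from: Optional[date] = None  # end of any embargo; None means OA at once

    def is_open(self, today: date) -> bool:
        # An embargo delays access only; it never delays the deposit itself.
        return self.open_from is None or today >= self.open_from


def access_route(deposit: Deposit, today: date) -> str:
    """Say how a would-be user can obtain the full text today."""
    if deposit.is_open(today):
        return "open-access download"
    # During an embargo, the user asks the author for an individual copy.
    return "email eprint request"


# Hypothetical example: deposited on acceptance, six-month access embargo.
paper = Deposit(
    title="Example postprint",
    accepted_on=date(2007, 11, 1),
    deposited_on=date(2007, 11, 1),
    open_from=date(2008, 5, 1),
)
during_embargo = access_route(paper, date(2007, 12, 1))
after_embargo = access_route(paper, date(2008, 5, 1))
```

The point of the design is that the embargo touches only `open_from`: the deposit exists, with its metadata visible, from the day of acceptance.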
DRIVER's concern should be with ensuring that the network of IRs has the requisite interoperability to make it maximally useful and useable for further research progress.
THE FEEDER AND THE DRIVER:
Deposit Institutionally, Harvest Centrally
Stevan Harnad
DRIVER is designing an infrastructure for European and Worldwide Open Access research output, stored in institutional and disciplinary repositories, now increasingly under institutional and research-funder mandates. It is critical for DRIVER to explicitly take into account in its design (as some research funders have not yet done, because they have not yet thought it through) that institutional and disciplinary (central) repositories (IRs and CRs), although they are fully interoperable and on a par in that respect, nevertheless play profoundly different roles.
Universities and research institutions are the FEEDERS -- the primary providers of research, funded and unfunded, in all disciplines -- for both kinds of repositories (IRs and CRs).
This difference in role and function must be concretely reflected in the design of the DRIVER infrastructure. The primary locus of deposit for all research output is the researcher's own institution's IR (except in the increasingly rare case of institutionally unaffiliated researchers). Thanks to OAI-interoperability, the metadata for those deposits, or even the full-text deposits themselves, can also be harvested by (or exported to) any number of CRs -- discipline-based CRs, funder-based CRs, theme-based CRs, national CRs, European CRs, global CRs.
Neither IRs nor CRs will fill without deposit mandates. This is a hard lesson that has been learned very late (NIH, for example, made the mistake of requesting rather than requiring deposit; the NIH policy failed, and three years of research impact were consequently lost), but the lesson has now at long last indeed been learned. So the number of institutional and funder mandates is now set to grow dramatically. Institutions of course always mandate deposit in their own IRs. Many funders have mandated deposit, indicating that deposit can be in either IRs or CRs. But a few funders still stipulate, dysfunctionally, that deposit must be in CRs.
This is a symptom of not having thought OA through. Funders are of course greatly to be commended for mandating OA, but their short-sightedness on the question of locus and means of deposit needs correction, and DRIVER can and should help with this, pre-emptively, rather than blindly following the unreflective and incoherent trends in the air today. Indeed DRIVER must take a coherent position, if it wants OA content to be provided and OA repositories to be filled, reliably and fully.
The model that DRIVER should adopt in designing its infrastructure is "Deposit Institutionally, Harvest Centrally." That is the way to scale up -- simply, swiftly, systematically and surely -- to 100% OA. I give the reasons in detail in my talk tomorrow, but for now, I just want to point out the principal points:
Institutions (i.e., universities and research institutes) are the providers -- the source -- of all research. Institutions have a direct interest in showcasing and managing their own research output, but they have been even more sluggish than funders in adopting mandates. If funders mandate central deposit, they neither cover all of OA output nor do they collaborate coherently with the providers (the institutions) to scale up systematically to providing OA to all of their institutional research output. The OAI protocol makes it possible to harvest content from all OAI-compliant repositories. That is the coherent, systematic pattern of content provision for which DRIVER should be designed, not an incoherent patchwork of arbitrary institutional and central depositing and repositories that will neither scale up to all of OA nor accelerate its attainment.
Not all research is funded; not all research fits into defined disciplines; disciplines are not all independent. Because disciplines overlap, discipline-based depositing would itself have to be overlapping and redundant. Depositing can be mandated once, but not multiply. The natural way to ensure that a paper is present in multiple loci (institutional, (multi-)disciplinary, national, etc.) is to deposit it at source – i.e., institutionally – and then harvest or import its metadata (or both its metadata and the paper itself) into whatever CRs we decide we need. That is what the OAI interoperability protocol itself was designed for.
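A minimal sketch of "deposit institutionally, harvest centrally", assuming each IR exposes an OAI-PMH endpoint that answers a ListRecords request with unqualified Dublin Core (oai_dc). The repository URLs, the canned sample responses and all function names are invented for illustration. A real harvester would fetch each URL over HTTP and follow resumption tokens, both of which are omitted here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request URL for one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract identifier and title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
        })
    return records


def sample_response(identifier, title):
    """Canned single-record response standing in for a live HTTP fetch."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def build_central_index(responses):
    """Merge records from many IRs into one index; nobody deposits here."""
    index = {}
    for repo_url, xml_text in responses.items():
        for rec in parse_records(xml_text):
            index[rec["identifier"]] = {"title": rec["title"], "source": repo_url}
    return index


# Hypothetical institutional repositories and their (canned) responses.
RESPONSES = {
    "https://ir-a.example.org/oai": sample_response(
        "oai:ir-a.example.org:1", "Paper deposited at institution A"),
    "https://ir-b.example.org/oai": sample_response(
        "oai:ir-b.example.org:7", "Paper deposited at institution B"),
}
CENTRAL_INDEX = build_central_index(RESPONSES)
```

Each paper is deposited once, at its source institution; the central index only records where it lives, so any number of such indexes (disciplinary, national, funder-based) can be built from the same deposits.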
And, not to put too fine a point on it, the very notion of Central Repositories already betrays something of a misunderstanding of the online medium: Is Google a central repository? Is it a repository at all? Do people deposit directly in Google?
OAIster, Citebase (and many other central OAI services like them) are an even better model: OAIster and Citebase were explicitly designed to be OAI service-providers -- functional overlays on the distributed OA content-providers. Do CRs -- disciplinary, interdisciplinary, national and international -- really need to be any more than that?
Stevan Harnad
American Scientist Open Access Forum