Institutional Repositories

Wednesday, April 16. 2008

Data exchange among disparate repositories

[re-posted from Peter Suber's Open Access News]

ECS developers win $5000 repository challenge, a press release from the University of Southampton School of Electronics and Computer Science (ECS), April 15, 2008.

Excerpt:
Developers from ECS, Southampton, and Oxford University won a $5000 challenge competition which took place at the OR08 Open Repositories international conference.

Dave Tarrant, Tim Brody (Southampton) and Ben O'Steen (Oxford) beat a large field of contenders, including finalists from the USA and Australia, by demonstrating that digital data can be moved easily between storage sites running different software while remaining accessible to users. This approach has important implications for data management and preservation on the Web....

[W]ith the growth of institutional repositories alongside subject-based repositories, and in cases where multiple authors of a paper belong to different institutions, it is important to be able to share and copy content between repositories.

Meanwhile the repository space has become characterised by many types of repository software - DSpace, EPrints and Fedora are the most widely used open source repository software - containing many different types of content, including texts, multimedia and interactive teaching materials. So although sharing content and making it widely available (interoperability) has always been a driver for repository development, actually moving content on a large scale between repositories and providing access from all sources is not easy.

The OR08 challenge, set by the Common Repository Interfaces Group (CRIG), had just one rule for the competition: the prototype created had to utilise two different 'repository' platforms....

This data transfer was achieved using an emerging framework known as Object Reuse and Exchange (ORE), a topic that attracted one of the highest attendances at OR08....

Comment [from Peter Suber]. Congratulations to Tarrant, Brody, and O'Steen. I look forward to the day when institutional repositories can harvest full-texts and metadata from disciplinary repositories and vice versa. That will greatly reduce the temperature on the question where researchers initially deposit their work (and where universities and funders require them to deposit their work), and greatly increase the security of deposits (on the LOCKSS principle). Thanks to ORE and the tools developed by the Southampton-Oxford team, this day is not far off.

Comment from Stevan Harnad: The demonstration of the bulk transferability of the contents of one OAI-compliant repository to another is indeed welcome. It shows that it does not really matter from the point of view of either accessibility or harvestability where a research output is deposited (as long as it's in an OAI-compliant repository). But where it is deposited still matters a great deal for the probability of research output being deposited at all, and especially for the probability of deposit mandates being adopted at all -- particularly deposit mandates on the part of institutions, who are the providers of all the research output, funded and unfunded, across all disciplines.

The importance of the new OR08 demonstration of the transferability of Institutional Repository (IR) contents is hence greatest for confirming that both institutional and funder mandates can and should require deposit in the author's institutional IR, from which central harvesters, indexers and search engines, as well as Central Repositories (CRs) like PubMed Central, can then harvest/import them. This convergent synergy would be best for the progress of OA.

(The fact that external deposits can also be back-harvested to the depositor's own institutional IR is also welcome and useful, but it certainly does not imply that depositing willy-nilly anywhere is as likely to scale up to systematic OA policies, generating universal OA, as depositing, systematically and convergently at the universal source: the researcher's own IR -- and then, where desired, harvesting/exporting externally therefrom.)

Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O’Brien, A., Hardy, R. and Rowland, F. (2005) Delivery, Management and Access Model for E-prints and Open Access Journals within Further and Higher Education. JISC Technical report.

Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O’Brien, A., Hardy, R., Rowland, F. and Brown, S. (2005) Developing a model for e-prints and open access journal content in UK further and higher education. Learned Publishing, 18 (1). pp. 25-40.

Posted by Stevan Harnad in Institutional Repositories at 14:36

Tuesday, March 18. 2008

Publisher Proxy Deposit Is A Potential Trojan Horse: I

Ann Okerson: "If your publishing organization is providing for your authors the service of deposit of their articles according to various mandates, particularly NIH (beginning on 4/7), could you kindly describe the nature or extent of these services?"
Paul Gherman: "At Vanderbilt, our Medical Library has been doing significant work contacting publishers to find out what their policy and procedures are. One discovery is that some of them intend to charge authors between $900 and $3,000 to submit articles to NIH. Some will allow for early posting, if the fee is paid."

I suggest not colluding with publishers offering to "Let us do the [mandated] deposit for you".

The reason is simple, if we take the moment to think it through:

(1) The OA movement's goal is to provide Open Access (OA) to 100% of the world's peer-reviewed research article output.

(2) The goal of [some] publishers is to preserve the status quo -- or the closest approximation to it -- for as long as possible, at all costs.

(3) The providers of both the research and the peer review, in all disciplines, are the employees of the universities (and research institutions) worldwide (and the fundees of the funded research).

(4) The only way to cover all of OA's target space is for all research output, from all universities worldwide, funded and unfunded, across all disciplines, to be made OA.

(5) OA self-archiving mandates from universities and funders (39 so far, worldwide) can ensure that all research is made OA.

(6) Some funders (e.g. NIH) -- but no universities (e.g. Harvard) -- have mandated direct university-external self-archiving (in PubMed Central) instead of direct university-internal self-archiving (and subsequent central harvesting). This was an unnecessary strategic mistake.

(7) University-external, subject-based self-archiving does not scale up to cover all of OA output space: it is divergent, divisive, arbitrary, incoherent and unnecessary.

(8) The way to scale up systematically to capture all of OA output is for both funders and universities to mandate that all research output, in all disciplines, from all universities worldwide, funded and unfunded, should be self-archived in each university's own Institutional Repository (IR). (The deposits, or their metadata, can then be externally harvested into whatever subject-based, disciplinary, or multidisciplinary central collections we may desire.)

(9) The universities are the research providers; the universities are the co-beneficiaries of showcasing, archiving, auditing, assessing and maximizing the visibility, usage and impact of (all) their own research output, funded and unfunded, across all disciplines.

(10) The universities are also in the natural and optimal position to monitor and reward their own employees' compliance with both university and funder self-archiving mandates.

(11) It would hence systematically undermine the scaling and convergence of OA self-archiving mandates onto university IRs to transfer responsibility for compliance to an external party -- the publisher as their employees' proxy self-archiver -- depositing in arbitrary and divergent external repositories.

(12) Universities and funders should universally mandate self-archiving directly in each author's own university's IR; they should say "no, thank you" to offers of proxy self-archiving on behalf of their employees from publishers. External collections can then be harvested, as desired, from the IRs that will then cover 100% of OA output.

[Similar considerations, but on a much lesser scale, militate against the strategy of universities out-sourcing the creation and management of their IRs and self-archiving policies to external contractors: accounting, archiving, record-keeping and asset management should surely be kept under direct local control by universities. There's nothing so complicated or daunting about self-archiving and IRs as to require resorting to an external service. (More tentatively, I am also sceptical that library proxy self-archiving rather than direct author self-archiving is a wise choice in the long run -- though it is definitely a useful option as a start-up supplement, if coupled with a mandate, and has been successfully implemented in several cases, including QUT and CERN.)]

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 14:18

Wednesday, March 12. 2008

The Special Case of Law Reviews

Law Library Director and Assistant Professor of Law at the University of New Mexico School of Law, Carol Parker, has published an article in the New Mexico Law Review, Vol. 37, No. 2, Summer 2007 (just blogged by Peter Suber in Open Access News), entitled "Institutional Repositories and the Principle of Open Access: Changing the Way We Think About Legal Scholarship."

Though a bit out of date now in some of its statistics, because things are moving so fast, this article gives a very good overview of OA and concludes that, no, Law Reviews are not a special case: Those articles, too, and their authors and institutions, would benefit from being self-archived in each author's Institutional Repository to make them OA.

Professor Parker conjectures that most potential users worldwide already have affordable subscription access to all the law journal articles they need via Westlaw and Lexis, so the advantage of OA in Law might be just one of speed and convenience, not a remedy for access-denial.

(This might be the case, but I wonder if anyone actually has quantitative evidence, canvassing users across institutions worldwide for research accessibility, and comparing Law with other disciplines?)

In any case, if you haven't already seen it, Professor Parker's article is highly recommended. In light of the recent NIH, ERC, and Harvard self-archiving mandates, and the ongoing deliberations about further mandates worldwide, it is especially timely.

Prior Topic Thread on American Scientist Open Access Forum:
"The Special Case of Law Reviews" (thread began 2003)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 01:46

Wednesday, February 13. 2008

Harvard Adopts 38th Green Open Access Self-Archiving Mandate

Absent any new information (or amendments) to the contrary, Harvard University's Faculty of Arts and Sciences on Tuesday February 12 adopted the world's 38th Green Open Access Self-Archiving Mandate -- the 16th of the institutional or departmental mandates.

An OA mandate from Harvard is especially significant, timely and welcome for the worldwide Open Access movement, as Harvard will of course be widely emulated, and many other universities are now proposing to adopt OA mandates.

The objective of the Harvard (Faculty of Arts and Sciences) mandate is to provide Open Access (OA) to its own scholarly article output. This objective is accomplished by making those articles freely accessible on the web by depositing them in a Harvard OA Institutional Repository.

The means of attaining this objective is to mandate OA, which Harvard has now done. But Harvard has gone further, and mandated copyright retention as well. Copyright retention is highly desirable and welcome, but it is not necessary in order to provide OA, and mandating copyright retention has also necessitated the adoption of an opt-out clause because of potential author resistance to perceived or actual constraints on their choice of which journal to publish in.

What follows below is a recommendation for a few small but crucial changes in the wording of the mandate. They are designed to prevent the copyright-retention requirement from compromising the deposit requirement (thereby causing the Harvard OA Mandate to fail, as the original NIH policy failed, until its flaws were corrected three years later).

First, here is the draft Harvard OA mandate as it now stands. [passages that are flagged for modification are in brackets]:

Text of Motion on behalf of the Provost’s Committee on Scholarly Publishing:

The Faculty of Arts and Sciences of Harvard University is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy:

[COPYRIGHT RETENTION POLICY] Each Faculty member [grants] to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles. In legal terms, the permission granted by each Faculty member is a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit.

[OPT-OUT CLAUSE] The [policy] will apply to all scholarly articles written while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. The Dean or the Dean’s designate will waive application of the policy for a particular article upon written request by a Faculty member explaining the need.

[DEPOSIT MANDATE] To assist the University in [distributing the articles], each Faculty member [will provide] an electronic copy of the [final version] of the article at no charge to the appropriate representative of the Provost’s Office in an appropriate format (such as PDF) specified by the Provost’s Office. [The Provost’s Office may make the article available to the public in an open-access repository.]

The Office of the Dean will be responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and recommending changes to the Faculty from time to time. The policy will be reviewed after three years and a report presented to the Faculty.

Now here are the small but crucial changes that will immunize the deposit requirement against any opt-outs from the copyright-retention requirement. Note the re-ordering of the clauses, and the addition of the underscored passages. (Other universities may also omit the two indented clauses preceded by asterisks (**) if they wish):

Proposed revision:

The Faculty of Arts and Sciences of Harvard University is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy:

[DEPOSIT MANDATE] To assist the University in providing Open Access to all scholarly articles published by its Faculty members, each Faculty member is required to provide, immediately upon acceptance for publication, an electronic copy of the final peer-reviewed draft of each article at no charge to the appropriate representative of the Provost’s Office in an appropriate format (such as PDF) specified by the Provost’s Office. This can be done either by depositing it directly in Harvard's Institutional Repository or by emailing it to the Provost’s Office to be deposited on the author's behalf.

**[COPYRIGHT RETENTION POLICY] Each Faculty member is also encouraged to grant to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles. In legal terms, the permission granted by each Faculty member is a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit.

**[COPYRIGHT-RETENTION POLICY OPT-OUT CLAUSE] The copyright retention and licence-granting policy will apply to all scholarly articles written while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. The Dean or the Dean’s designate will waive application of the policy for a particular article upon written request by a Faculty member explaining the need.

The Office of the Dean will be responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and recommending changes to the Faculty from time to time. The policy will be reviewed after three years and a report presented to the Faculty.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 12:05

Friday, January 25. 2008

1st DRIVER Summit Report

Highlights from the 1st DRIVER Summit Report:

"On 16 and 17 January 2008, DRIVER II successfully carried out its first Summit in Goettingen, Germany. Approximately 100 invited representatives from the European Community, including representatives of the European Commission, over 20 spokespersons of European repository initiatives as well as experts in different repository related fields from Europe, the U.S., Canada and South Africa came together to discuss their experiences and concrete actions with respect to the further building of cross-country repository infrastructures...

"...Director of the National Science Library, Academy of Sciences in Beijing, China, Professor Xiaolin Zhang [wrote]: 'As China is beginning to initiate institutional repositories in its universities and research institutes, the knowledge accumulated and the infrastructure developed by DRIVER and its members would be instrumental to Chinese institutes and future Chinese IR networks. Therefore, on behalf of the National Science Library of Chinese Academy of Sciences, I would like to express the interest to collaborate with DRIVER in ways fit, and to become a member in a future confederation of digital repositories.' In advance of the Summit, representatives from India and South Africa had also expressed their strong interest in becoming a part of the upcoming global repository network...

"...Deirdre Furlong [Policy Officer, EC] expressed the interest of the EC to prepare an Open Access Mandate for all EC-funded research publications and to learn how existing repository infrastructures, like DRIVER, can support this mandate...."

From Norbert Lossau's Summary:

"The conditions to populate repositories with content and to implement a coherent European and global digital repository based eInfrastructure are more favourable than ever before. The Council's Conclusions on Scientific Information, the European Research Council Open Access mandate and the current preparation of an Open Access mandate for all EC funded research publications can draw from the existing infrastructure efforts which must be accelerated in the coming months...

"...Although funded by the EC, DRIVER II must be open to international efforts and collaborate on a global scale. This point was also clearly underlined by the EC representatives as part of the Commission's eInfrastructure strategy...

"...The EC-funded DRIVER II project is leading the way as the largest initiative of its kind in helping to enhance repository development worldwide. Its main objective is to build a virtual, European scale network of existing institutional repositories using technology that will manage the physically distributed repositories as one large scale virtual content source."

Posted by Stevan Harnad in Institutional Repositories at 18:46

Thursday, January 17. 2008

Don't Conflate 3rd Party Fair Use and Course Packs with 1st Party Open Access Provision

Please don't mix up the problem of University Course Packs with Open Access Provision.

Universities have long-standing frustrations about what 3rd-party buy-in content they can and cannot include in Course Packs for student use.

That's an old story, and universities should continue to strive to get the best deal they can with AAP publishers for that, as Hofstra, Marquette, and Syracuse Universities have been trying to do.

But on no account should this unending saga be conflated with Open Access (OA) provision (by those same three universities). Universities provide OA to their own research output, by self-archiving it in their own OA Institutional Repositories (IRs). That is 1st-party OA content.

The connection between the two is this: In their efforts to strike a deal with AAP publishers for ("fair") use of 3rd-party content (originating, say, from Hofstra University), Marquette and Syracuse need not worry about course-pack content that consists of links to Hofstra's OA articles, deposited in Hofstra's IR. Nor vice versa. All they need do is link to it. There is no need to download, print or photocopy it for course packs. The link is enough, and the students can do the rest (if they lack affection for trees).

And as Green OA self-archiving of 1st party content in authors' OA IRs grows, the frustrations of jostling for 3rd party content will simply shrink and disappear.

So, in keeping with the Golden Rule, universities should be mandating Green OA self-archiving, alongside whatever deals they may be cutting for fair use in course packs.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 19:14

Wednesday, January 16. 2008

Critique of EU Council's Conclusions (again heavily influenced by the publisher anti-OA lobby)

Here is the video of my presentation to the DRIVER Summit:

Institutional Versus Central Deposit:
Optimising DRIVER Policy for the OA Mandate and Metric Era

Also to be discussed at the DRIVER Summit is this statement by the EU Council (not to be confused with the European Research Council (ERC), which has mandated OA self-archiving!). The EU Council's Conclusions show the tell-tale signs of penetration by the publisher anti-OA lobby; familiar slogans, decisively rebutted many, many times before, crop up verbatim in the EU Council's language, though the Council does not appear to realize that it has allowed itself to become the mouthpiece of these special interests, which are not those of the research community:

Council of the European Union: Conclusions on scientific information in the digital age: access, dissemination and preservation

Here is my critique of this EU Council statement (all boldface quotes are from the Council's statement; the underscores have been added):

"the importance of scientific output resulting from publicly funded research being available on the Internet at no cost to the reader under economically viable circumstances, including delayed open access"

(1) 'At no cost to the reader' conflates site-licensing and Open Access (OA). This wording was no doubt urged by the publisher lobby. The focus should be on providing free online access webwide. That is OA, and that makes the objective clear and coherent.

(2) 'Delayed open access' refers to publisher embargoes on author self-archiving. If embargoes are to be accommodated, it should be made clear that they apply to the date at which the access to the embargoed document is made OA, not to the date at which the document is deposited, which should be immediately upon acceptance for publication. The DRIVER network of Institutional Repositories (IRs) can then adopt the 'email eprint request' button that will allow individual users to request and receive individual copies of the document semi-automatically.

(3) What should be deposited in the author's own institutional IR immediately upon acceptance for publication is the author's peer-reviewed, accepted final draft ('postprint'), not the publisher's PDF (or XML). There are far more publisher embargoes on the PDF/XML than on the postprint, and the postprint is all that is needed for research use and progress. The postprint is a supplementary version of the official publication, provided for OA purposes; it is not the version with the primary digital preservation problem.

(4) Digital preservation should not be conflated with OA provision: There is a (separate) problem of the digital preservation of the publisher's PDF/XML, but this is not the same as the problem of providing OA to the author's postprint. The postprint, though it can and should be preserved, is not the canonical copy of the publication, so the two preservation tasks should not be conflated.

(5) Self-archiving research data is also a different matter from self-archiving research publications. Data-archiving is not subject to a publisher embargo, and it needs independent preservation, but data-access and data-preservation should not be conflated with OA provision.

(6) Deposit should be directly in each author's own IR: Distributed institutional depositing and storage should not be conflated with central harvesting and indexing: Deposit Institutionally, Harvest Centrally.

(7) Direct central deposit should be avoided except in cases where the author is institutionally unaffiliated or the author's institution does not yet have an IR. For those cases, there should be at least one provisional default repository such as DEPOT.

(8) Research (publications and data) should not be conflated with other forms of digital content. The problems of cultural heritage archiving, for example, are not the same as those of research publication archiving. Nor are the problems of archiving the same as the problem of access-provision (OA).
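
Point (2) above separates the date of deposit from the date on which access is opened. The rule can be sketched roughly in Python as follows. The `Deposit` record and the `access_mode` function are hypothetical names introduced only for illustration; they are not taken from any repository software, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """A postprint held in an institutional repository (hypothetical record)."""
    title: str
    accepted_on: date                    # date of acceptance for publication
    deposited_on: date                   # equals accepted_on under immediate deposit
    embargo_ends: Optional[date] = None  # None means no publisher embargo


def access_mode(deposit: Deposit, today: date) -> str:
    """Return how a user can obtain the full text on a given day."""
    if deposit.embargo_ends is None or today >= deposit.embargo_ends:
        return "open-access"
    # During an embargo the deposit already exists; only access is delayed,
    # and individual copies go out through the email eprint request button.
    return "request-copy"


# Illustrative values only: deposited on acceptance, six-month embargo.
paper = Deposit(
    title="Example article",
    accepted_on=date(2008, 1, 16),
    deposited_on=date(2008, 1, 16),
    embargo_ends=date(2008, 7, 16),
)
```

In this sketch the embargo never affects `deposited_on`; it only changes what `access_mode` returns until the embargo date has passed.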

"ensure the long term preservation of scientific information -including publications and data"

This is an example of the complete conflation of OA-provision with digital preservation, including a conflation of authors' supplementary postprints with the publisher's original, as well as a conflation of research publications with research data.

DRIVER will not have a coherent programme unless it clearly and systematically de-conflates OA-provision from digital preservation, primary publications from authors' supplementary postprints, and publication-archiving from data-archiving, treating each of these separately, on its own respective terms.

"experiments on and wide deployment of scientific data infrastructures with cross-border, cross-institution and cross-discipline added-value for open access to and preservation of scientific information"

This again conflates OA provision with digital preservation and conflates publications with data. It also conflates both of these with IR interoperability, which is yet another matter. (And webwide OA is, by definition, cross-institution, cross-border and cross-discipline, so that is a non-issue.)

What is an issue, however, is institutional versus central depositing, and it is crucial that DRIVER have a clear, coherent policy (insofar as research archiving is concerned -- this does not necessarily apply to other forms of digital content): Deposit Institutionally: Harvest/Index/Search Centrally.

The emphasis of DRIVER should accordingly be on ensuring that the distributed IRs have the requisite interoperability for whatever central harvesting, indexing, search and analysis are needed and desired.

"promoting, through these policies, access through the internet to the results of publicly financed research, at no cost to the reader, taking into consideration economically sustainable ways of doing this, including delayed open access"

Economic sustainability is again a red herring introduced by the publishing lobby into language that should only concern the research community and research access. The economic sustainability of publishing is not DRIVER's concern.

DRIVER's concern should be interoperable OA-provision (plus whatever cultural-heritage and other forms of archiving DRIVER wishes to provide the infrastructure for).

Nor are publisher access-embargoes DRIVER's concern: DRIVER should merely help ensure immediate deposit in IRs, and it should facilitate research usage needs through IR interoperability as well as the IRs' email eprint request button.

"2008 working towards the interoperability of national repositories of scientific information in order to facilitate accessibility and searchability of scientific information beyond national borders"

Insofar as research is concerned, it is not the interoperability of national repositories that is crucial but the interoperability of all OA IRs.

"2009 contributing to an effective overview of progress at European level, informing the Commission of results and experiences with alternative models for the dissemination of scientific information."

This is again a red herring (for both the EU and for DRIVER) introduced by the publishing lobby: Research archiving and OA-provision are neither a matter of alternative publishing models nor a matter of alternatives to the generic peer-reviewed publication model. Publishing reform and peer review reform are not DRIVER matters. They can and will evolve too, but DRIVER should focus on the deposit of current published research as well as research data in IRs, and the interoperability of those IRs. That is the immediate problem. The rest is merely speculative for now.

"B. Invitation to the Commission to implement the measures announced in the Communication on "scientific information in the digital age: access, dissemination and preservation", and in particular to: 1. Experiment with open access to scientific publications resulting from projects funded by the EU Research Framework Programmes by: defining and implementing concrete experiments with open access to scientific publications resulting from Community funded research, including with open access."

This is a vague way of saying that the publishing lobby has persuaded the EU not to do the obvious, but to keep on 'experimenting' as if what needed to be done were not already evident, already tested, already demonstrated to work, and already being done, worldwide (including by RCUK, ERC, NIH, and over a dozen universities):

The EU should mandate that all EU-funded research articles (postprints) are deposited in the fundee's IR immediately upon acceptance for publication. Access can be set in compliance with embargoes, if desired. And data-archiving should be strongly encouraged. DRIVER's concern should be with ensuring that the network of IRs has the requisite interoperability to make it maximally useful and useable for further research progress.

      THE FEEDER AND THE DRIVER:
       Deposit Institutionally, Harvest Centrally

      Stevan Harnad

DRIVER is designing an infrastructure for European and Worldwide Open Access research output, stored in institutional and disciplinary repositories, now increasingly under institutional and research-funder mandates. It is critical for DRIVER to explicitly take into account in its design (as some research funders have not yet done, because they have not yet thought it through) that institutional and disciplinary (central) repositories (IRs and CRs), although they are fully interoperable and on a par in that respect, nevertheless play profoundly different roles.

Universities and research institutions are the FEEDERS -- the primary providers of research, funded and unfunded, in all disciplines -- for both kinds of repositories (IRs and CRs).

This difference in role and function must be concretely reflected in the design of the DRIVER infrastructure. The primary locus of deposit for all research output is the researcher's own institution's IR (except in the increasingly rare case of institutionally unaffiliated researchers). Thanks to OAI-interoperability, the metadata for those deposits, or even the full-text deposits themselves, can also be harvested by (or exported to) any number of CRs -- discipline-based CRs, funder-based CRs, theme-based CRs, national CRs, European CRs, global CRs.
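To make the harvesting step concrete, here is a minimal sketch of how a central service might request and read metadata from one IR over OAI-PMH. The `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` argument and the namespaces come from the OAI-PMH 2.0 specification; the repository address `ir.example.edu`, the function names and the sample response are hypothetical, and the sketch assumes the IR exposes the deposit's URL in `dc:identifier`, which is a common convention rather than a guarantee.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and IR link from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            # Assumption: the IR puts the deposit's URL in dc:identifier.
            "ir_url": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    return records


# Hypothetical response from a hypothetical IR, trimmed to one record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ir.example.edu:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A postprint</dc:title>
          <dc:identifier>https://ir.example.edu/42</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://ir.example.edu/oai"))
    print(parse_records(SAMPLE))
```

The same two functions would serve any number of CRs: each one issues the same request to the same IRs and keeps whichever records it wants, with the deposit itself made only once.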

Neither IRs nor CRs will fill without deposit mandates. This is a hard lesson, that has been learned very late (NIH, for example, made the mistake of requesting rather than requiring deposit, the NIH policy failed, and three years of research impact was consequently lost); but the lesson has now at long last indeed been learned. So the number of institutional and funder mandates is now set to grow dramatically. Institutions of course always mandate deposit in their own IRs. Many funders have mandated deposit, indicating that deposit can be in either IRs or CRs. But a few funders still stipulate, dysfunctionally, that deposit must be in CRs.

This is a symptom of not having thought OA through. Funders are of course greatly to be commended for mandating OA, but their short-sightedness on the question of locus and means of deposit needs correction, and DRIVER can and should help with this, pre-emptively, rather than blindly following the unreflective and incoherent trends in the air today. Indeed DRIVER must take a coherent position, if it wants OA content to be provided and OA repositories to be filled, reliably and fully.

The model that DRIVER should adopt in designing its infrastructure is "Deposit Institutionally, Harvest Centrally." That is the way to scale up -- simply, swiftly, systematically and surely -- to 100% OA. I give the reasons in detail in my talk tomorrow, but for now, I just want to point out the principal points:

Institutions (i.e., universities and research institutes) are the providers -- the source -- of all research. Institutions have a direct interest in showcasing and managing their own research output, but they have been even more sluggish than funders in adopting mandates. If funders mandate central deposit, they neither cover all of OA output nor do they collaborate coherently with the providers (the institutions) to scale up systematically to providing OA to all of their institutional research output. The OAI protocol makes it possible to harvest content from all OAI-compliant repositories. That is the coherent, systematic pattern of content provision for which DRIVER should be designed, not an incoherent patchwork of arbitrary institutional and central depositing and repositories that will neither scale up to all of OA nor accelerate its attainment.

Not all research is funded; not all research fits into defined disciplines; disciplines are not all independent. Because disciplines overlap and are redundant, discipline-based depositing would have to be overlapping and redundant too. Depositing can be mandated once, but not multiply. The natural way to ensure that a paper is present in multiple loci (institutional, (multi)-disciplinary, national, etc.) is to deposit it at source -- i.e., institutionally -- and then harvest or import its metadata (or both its metadata and the paper itself) into whatever CRs we decide we need. That is what the OAI interoperability protocol itself was designed for.
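A rough sketch of the "deposit once, appear in many loci" idea, under stated assumptions: the `Deposit` record, the collection names and the selection rules below are all hypothetical, invented purely for illustration. The point it tries to show is that each central collection can be defined as a selection rule over IR metadata, so overlapping disciplines cost the author nothing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Deposit:
    """Metadata for one paper, deposited once in its author's own IR."""
    ir_url: str
    institution: str
    disciplines: FrozenSet[str]
    funder: Optional[str] = None  # not all research is funded


# Hypothetical central collections, each defined only by a rule over IR metadata.
CENTRAL_COLLECTIONS: Dict[str, Callable[[Deposit], bool]] = {
    "physics": lambda d: "physics" in d.disciplines,
    "biomedicine": lambda d: "biomedicine" in d.disciplines,
    "funder:NIH": lambda d: d.funder == "NIH",
    "all": lambda d: True,
}


def harvest(deposits: List[Deposit]) -> Dict[str, List[str]]:
    """Place each single deposit in every central collection whose rule it matches."""
    collections: Dict[str, List[str]] = {name: [] for name in CENTRAL_COLLECTIONS}
    for d in deposits:
        for name, rule in CENTRAL_COLLECTIONS.items():
            if rule(d):
                # The collection holds a pointer back to the IR, not a second deposit.
                collections[name].append(d.ir_url)
    return collections


if __name__ == "__main__":
    deposits = [
        Deposit("https://ir.a.example/1", "A", frozenset({"physics", "biomedicine"}), "NIH"),
        Deposit("https://ir.b.example/7", "B", frozenset({"linguistics"})),
    ]
    for name, urls in harvest(deposits).items():
        print(name, urls)
```

In this toy run the first paper lands in three overlapping collections from a single deposit, and the unfunded paper still appears in the general collection even though no funder or discipline rule picks it up.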

And, not to put too fine a point on it, the very notion of Central Repositories already betrays something of a misunderstanding of the online medium: Is Google a central repository? Is it a repository at all? Do people deposit directly in Google?

OAIster, Citebase (and many other central OAI services like them) are an even better model: OAIster and Citebase were explicitly designed to be OAI service-providers -- functional overlays on the distributed OA content-providers. Do CRs -- disciplinary, interdisciplinary, national and international -- really need to be any more than that?

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 01:24

First DRIVER Summit: Towards a Confederation of Digital Repositories

   First DRIVER Summit:
   Towards a Confederation of Digital Repositories
   16-17 January 2008 Goettingen, Germany
DRIVER responds to the vision that all relevant scientific content should be easily accessible through internet-based infrastructures. Achieving this vision reaches beyond technology - it is also the organisational dimension that allows a stable and trusted network of content providers. DRIVER is working with repository federations in Europe, and reaches out to further international communities (for example in the US, China, India, and Africa) in order to determine the practical requirements for a confederation of digital repositories.

   Wednesday, January 16, 2008
   13:00-13:15   Opening Address (Norbert Lossau, DRIVER)
   13:15-14:00   Opening Keynote - EC Research Infrastructure & Digital Repository Roadmap
(Mario Campolargo, EC)
   14:00-14:30   Organising Infrastructures: Experiences from DSpace, Open Content Alliance and other community efforts
(Michele Kimpton, DSpace Foundation)
Chair: Sylvia van Peteghem, DRIVER
   15:00-15:30   DRIV(ER)ing Research Infrastructures
(Yannis Ioannidis, DRIVER)
   15:30-16:00   Growing Repositories
(Bill Hubbard, DRIVER)
   16:00-16:30   Repository Landscape - DRIVER Studies
(Kasja Weenink, DRIVER)
Chair: Sijbolt Noorda, EUA
   17:00-18:30   Panel Discussion "Open Access and Repository Infrastructures": Responding to the "Council Conclusions on Scientific Information"

   Thursday, January 17, 2008
Chair: Rüdiger Klein, ESF (tbc)
   09:00-09:30   Institutional versus Central Deposit: Optimising DRIVER Policy for the OA Mandate and Metric Era
(Stevan Harnad, Université du Québec à Montréal & University of Southampton )
   09:30-10:00   Challenges in DRIVER-II
(Donatella Castelli, DRIVER)
Chair: Carlos Morais-Pires, EC
   10:00-10:30   Disciplinary Repository Projects
EuroVO-AIDA: Astronomical virtual observatories, repositories and data access (Sebastien Deriere, Thomas Boch)
METAFOR -- Climate Modelling (Michael Lautenschläger)
   11:00-12:30   Disciplinary Repository Projects
Genesi-DR -- Earth Science (Luigi Fusco, Donatella Castelli)
D4Science -- Environmental Monitoring (Donatella Castelli)
eCrystals -- Crystallography (Simon Coles, Liz Lyon)
OLAC: The Open Language Archives Community (Gary Simons)
Language Resource Management and Discovery in CLARIN (Peter Wittenburg)
Nereus/Neeo -- Economists (Vanessa Proudman)
Discussion
Chair: Wolfram Horstmann, DRIVER
   13:30-14:45   National & International Repository Networks
EIFL -- Open Access Program (Susan Veldsman, Rima Kupryte)
Open Access Repositories in Finland (Rita Voigt)
OpenAccess.se (Jan Hagerlid)
NORA - The Norwegian Experience (Jan Erik Frantsvåg)
eLABa - Lithuanian Academic e-Library (Vilius Kuciukas)
Local Integration, National Federation: TARA, TCD-RSS, IReL-Open, Expertise Ireland (Niamh Brennan)
OAI Spain (Alicia Lopez-Medina)
   14:45-15:50   Discussion: Building a Joint Digital Repository Infrastructure
(Moderator: Carlos Morais-Pires, EC; Norbert Lossau, DRIVER)
   15:50-16:00   Wrap-up (Norbert Lossau, DRIVER)

Posted by Stevan Harnad in Institutional Repositories at 00:29

Sunday, December 23. 2007

Deposit Institutionally, Harvest Centrally

University of Michigan’s digital repository now available through PubMed:

"Researchers who find articles by University of Michigan (UM) authors in PubMed can now directly -- and for free -- link to the full text using Deep Blue, UM’s digital repository, via PubMed’s LinkOut feature. Deep Blue is an online archive that preserves and provides access to UM intellectual and creative work. It is the first institutional repository to provide such links."

Congratulations to the University of Michigan and PubMed for adding this excellent and timely feature (to both PubMed and Michigan's Institutional Repository [IR], Deep Blue)! But why stop there?

The implications are obvious: Central Repositories [CRs] (like PubMed Central and arXiv and CogPrints) should not be deposited in directly, because that merely complicates and competes with a systematic worldwide policy of depositing all institutional research output in each institution's own, OAI-compliant IR. Institutions are the primary research providers. They have the greatest stake in ensuring that all their own research output is maximally visible, accessible, and usable, thereby maximizing the institution's research impact. Institutions are also the best placed to showcase, monitor and reward the self-archiving of their own research output.

All institutions should mandate that all their research article output must be deposited in their own IR. Research funders (like NIH) should also mandate that all the research article output from the research they fund must be deposited in the fundee's own institution's IR.

Then CRs like PubMed Central as well as indexers like PubMed (or Thomson ISI or Scopus or Google Scholar) can either link to or harvest from the network of interoperable, OAI-compliant IRs.

In this natural way -- "deposit institutionally, harvest centrally" -- all of research output can be systematically made OA. Depositing willy-nilly in IRs or CRs instead will only create confusion and resistance on the part of researchers, who will understandably only wish to do the keystrokes once.

IR software can also help with automatic exports to other OAI-compliant sites where desired, as well as with version control.

Now that the NIH OA self-archiving mandate is imminent, it is all the more important to reformulate it in a way that will scale systematically to all research output worldwide, in all disciplines, rather than leaving it as one non-scaling special case for NIH-funded biomedical research.

And remember that the Web era means distributed content provision and central harvesting, Google-style. It is not, as in paper days, that all the content needs to go in one central physical space.

Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O’Brien, A., Hardy, R., Rowland, F. and Brown, S. (2005) Developing a model for e-prints and open access journal content in UK further and higher education. Learned Publishing 18 (1). pp. 25-40.
Abstract: A study carried out for the UK Joint Information Systems Committee examined models for the provision of access to material in institutional and subject-based archives and in open access journals. Their relative merits were considered, addressing not only technical concerns but also how e-print provision (by authors) can be achieved – an essential factor for an effective e-print delivery service (for users). A "harvesting" model is recommended, where the metadata of articles deposited in distributed archives are harvested, stored and enhanced by a national service. This model has major advantages over the alternatives of a national centralized service or a completely decentralized one. Options for the implementation of a service based on the harvesting model are presented.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 13:21

Tuesday, October 16. 2007

How Green Open Access Supports Text- and Data-Mining

SUMMARY: Data-mining robots like SciBorg can harvest Green OA full-texts, self-archived in their authors' Institutional Repositories (IRs) and “repurpose” them for better functionality. The postprint is the author’s own refereed, revised final draft. Green journal publishers endorse author posting of postprints in their own IR, free for all. The author can certainly revise that draft further, making additional corrections, updates and enhancements, including marking it up in XML and adding comments. Those corrections need not be done by the author's own hands: They could be done by a graduate student, a collaborator, a secretary, or a hired hand. The author could also have SciBorg “repurpose” his postprint -- under one trivial condition, easily fulfilled, which is that the locus of the enhanced postprint, the URL from which users must download it, remains the author’s own IR, not a 3rd-party website. It is not only unnecessary but would be highly inimical to the progress of Green OA mandates to insist instead that the Green publisher’s endorsement to self-archive the postprint in the author’s IR is "not enough" for full-blooded OA — that the author must also successfully negotiate with the publisher the retention of the right to assign to 3rd-party harvesters like SciBorg the right to publish a “derivative work” derived from the author’s postprint.

In "Why Green Open Access does not support text- and data-mining", Peter Murray-Rust wrote:

PM-R: "...the first thing to do is to gather a corpus of documents... any other scientist should be able to have access to it. It therefore has to be freely distributable..."

Agreed. So far this is just bog-standard OA. If the original documents are self-archived as Green OA postprints in their authors' Institutional Repositories (IRs), your SciBorg robot can harvest them and data-mine them, and make the results freely accessible (but linking back to the postprint in the author's IR whenever the full-text needs to be downloaded).

PM-R: "[At SciBorg] we are interested in machines understanding science..."

Fine. Let your SciBorg machines harvest the Green OA full-texts and "repurpose" them as they see fit.

PM-R: "almost all articles are copyrighted and non-distributable. Publisher Copyright is a major barrier... you can’t just go out and compile a wordlist or whatever as you may infringe copyright or invisible publisher contracts (we found that out the hard way)..."

You can't do that if you are harvesting the publisher's proprietary text, but you can certainly do that if you are harvesting the author's Green OA postprints.

PM-R: "PDFs are so awful... we have to repurpose them by converting to HTML, XML and so on..."

Fine.

PM-R: "Now the corpus is annotated. Expert humans go through line by line...It is this annotated corpus which is of most use to the scientific community..."

Fine.

PM-R: "So suppose I find 50 articles in 50 different repositories, all of which claim to be Green Open Access. I now download them, aggregate them and [SciBorg] repurpose[s] them. What is the likelihood that some publisher will complain? I would guess very high..."

Complain about what, and to whom? A Green publisher has endorsed the author's posting of his own Green OA postprint in his own IR, free for all. The postprint is the author's own refereed, revised final draft. Now follow me: Having endorsed the posting of that draft, does anyone imagine that the publisher would have any grounds for objection if the author revised the draft further, making additional corrections and enhancements? Of course not. It's exactly the same thing: the author's Green OA postprint.

So what if the author decides to mark it up as XML and add comments? Any grounds for objections? Again, no. Corrections, updates and enhancements of the author's postprint are in complete conformity with posting his postprint.

Suppose the author did not do those corrections with his own hands, but had a colleague, graduate student, a secretary, or a hired hand do them for him, and then posted the corrected postprint? Still perfectly fine.

Now suppose the author had your SciBorg "repurpose" his postprint: Any difference? None -- except a trivial condition, easily fulfilled, which is that the locus of the enhanced postprint, the URL from which users can download it, should again be the author's IR, not a 3rd-party website (which the publisher could then legitimately regard as a rival publisher -- especially if it was selling access to the "repurposed" text).

So the solution is quite obvious and quite trivial: It is fine for the SciBorg harvester to be the locus of the data-mining and enhancement of each Green OA postprint. It can also be the means by which users search and navigate the corpus. But SciBorg must not be the locus from which the user accesses the full-text: The "repurposed" full-text must be parked in the author's own IR, and retrieved from there whenever a user wants to read and download it (rather than just to search and surf the entire corpus via SciBorg).

Not only does this all sound silly: it really is silly. In the online age, it makes no functional difference at all where a document is actually physically located, especially if the document is OA! But we are still at the confused interface between the paper age and the OA era. So we have to be prepared to go through a few silly rituals, to forestall any needless fits of apoplexy, which would otherwise mean further dysfunctional delay (for OA).

So the ritual is this: It would be highly inimical to the progress of Green OA mandates to insist that the publisher's endorsement to self-archive the postprint in the author's IR is "not enough" -- that the author must also successfully negotiate with the publisher the retention of the right to assign to 3rd-party harvesters like SciBorg the right to publish a "derivative work" derived from the author's postprint. That would definitely be the tail wagging the dog, insofar as OA is concerned, and it would put authors off providing Green OA (and hence put their institutions off mandating it) for a long time to come.

Instead, when SciBorg harvests a document from a Green OA IR, SciBorg must make an arrangement with the author that the resultant "repurposed" draft will be deposited by the author in the author's own IR as an update of the postprint. Then, whenever a user of SciBorg wishes to retrieve the "repurposed" draft, the downloading site must always be the author's IR: no direct retrieval from the SciBorg site.
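The arrangement can be sketched as a small overlay index, offered here only as an illustration: the class name `OverlayIndex`, its methods and the example URLs are hypothetical and are not a description of how SciBorg is actually built. The sketch keeps the mined annotations on the harvester's side for searching, while every download resolves to the author's IR.

```python
class OverlayIndex:
    """Hypothetical harvester-side index: search happens here, downloads do not."""

    def __init__(self):
        # record id -> (URL of the postprint in the author's IR, annotation terms)
        self._entries = {}

    def add(self, record_id, ir_url, annotations):
        """Register a harvested postprint and the terms mined from it."""
        self._entries[record_id] = (ir_url, {a.lower() for a in annotations})

    def search(self, term):
        """Return the ids of records whose mined annotations include the term."""
        term = term.lower()
        return sorted(
            rid for rid, (_, terms) in self._entries.items() if term in terms
        )

    def download_url(self, record_id):
        """Always hand the user the IR URL; no full text is served from the index."""
        return self._entries[record_id][0]


if __name__ == "__main__":
    index = OverlayIndex()
    index.add("p1", "https://ir.a.example/eprint/1", ["Benzene", "NMR"])
    index.add("p2", "https://ir.b.example/eprint/9", ["benzene"])
    for rid in index.search("benzene"):
        print(rid, "->", index.download_url(rid))
```

Nothing in the index stores a copy of the full text, so the only locus a user can retrieve it from is the IR URL returned by `download_url`.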

This ritual is ridiculous, and of course it is functionally unnecessary, but it is pseudo-juridically necessary, during this imbecilic interregnum, to keep all parties (publishers, lawyers, IP specialists, institutions, authors) calm and happy -- or at least mutely resigned -- about the transition to the optimal and inevitable that is currently taking place. Once it's over, and we have 100% Green OA, all this papyrophrenic horseplay can be well-deservedly dropped for the nonsense it is.

Please, Peter, be prepared to adapt SciBorg to the exigencies of this all-important (and all too slow-footed) transitional phase, rather than trying to force-fit the status quo to SciBorg, at the cost of still more delays to OA.

PM-R: "Only a rights statement actually on each document would allow us to create a corpus for NLP without fear of being asked to take it down..."

No. Green OA authors with standard copyright agreements are not in a position to license republication rights to SciBorg or any other 3rd party. Let us be happy that they have provided Green OA at all, and let SciBorg be the one to adapt to it for now, rather than vice versa.

Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 16:16
