Wednesday, April 16. 2008
Data exchange among disparate repositories
[re-posted from Peter Suber's Open Access News]
ECS developers win $5000 repository challenge, a press release from the University of Southampton School of Electronics and Computer Science (ECS), April 15, 2008.

Comment from Stevan Harnad: The demonstration of the bulk transferability of the contents of one OAI-compliant repository to another is indeed welcome. It shows that, from the point of view of either accessibility or harvestability, it does not really matter where a research output is deposited (as long as it is in an OAI-compliant repository). But where it is deposited still matters a great deal for the probability that research output will be deposited at all, and especially for the probability that deposit mandates will be adopted at all -- particularly deposit mandates on the part of institutions, which are the providers of all research output, funded and unfunded, across all disciplines.

The importance of the new OR08 demonstration of the transferability of Institutional Repository (IR) contents is hence greatest for confirming that both institutional and funder mandates can and should require deposit in the author's own institutional IR, from which central harvesters, indexers and search engines, as well as Central Repositories (CRs) like PubMed Central, can then harvest or import the deposits. This convergent synergy would be best for the progress of OA.

(The fact that external deposits can also be back-harvested to the depositor's own institutional IR is also welcome and useful, but it certainly does not imply that depositing willy-nilly anywhere is as likely to scale up to systematic OA policies, generating universal OA, as depositing systematically and convergently at the universal source, the researcher's own IR, and then, where desired, harvesting or exporting externally from there.)

Tuesday, March 18. 2008
Publisher Proxy Deposit Is A Potential Trojan Horse: I

I suggest not colluding with publishers offering to "Let us do the [mandated] deposit for you".

Ann Okerson: "If your publishing organization is providing for your authors the service of deposit of their articles according to various mandates, particularly NIH (beginning on 4/7), could you kindly describe the nature or extent of these services?"

Paul Gherman: "At Vanderbilt, our Medical Library has been doing significant work contacting publishers to find out what their policy and procedures are. One discovery is that some of them intend to charge authors between $900 and $3,000 to submit articles to NIH. Some will allow for early posting, if the fee is paid."

The reason is simple, if we take a moment to think it through: (1) The OA movement's goal is to provide Open Access (OA) to 100% of the world's peer-reviewed research article output.

[Similar considerations, but on a much lesser scale, militate against the strategy of universities out-sourcing the creation and management of their IRs and self-archiving policies to external contractors: accounting, archiving, record-keeping and asset management should surely be kept under direct local control by universities. There's nothing so complicated or daunting about self-archiving and IRs as to require resorting to an external service. (More tentatively, I am also sceptical that library proxy self-archiving rather than direct author self-archiving is a wise choice in the long run -- though it is definitely a useful option as a start-up supplement, if coupled with a mandate, and has been successfully implemented in several cases, including QUT and CERN.)]

Stevan Harnad
American Scientist Open Access Forum

Wednesday, March 12. 2008
The Special Case of Law Reviews
Law Library Director and Assistant Professor of Law at the University of New Mexico School of Law, Carol Parker, has published an article in the New Mexico Law Review, Vol. 37, No. 2, Summer 2007 (just blogged by Peter Suber in Open Access News), entitled "Institutional Repositories and the Principle of Open Access: Changing the Way We Think About Legal Scholarship."
Though a bit out of date now in some of its statistics, because things are moving so fast, this article gives a very good overview of OA and concludes that, no, Law Reviews are not a special case: those articles too, and their authors and institutions, would benefit from being self-archived in each author's Institutional Repository to make them OA.

Professor Parker conjectures that most potential users worldwide already have affordable subscription access to all the law journal articles they need via Westlaw and Lexis, so the advantage of OA in Law might be just one of speed and convenience, not a remedy for access-denial. (This might be the case, but I wonder whether anyone actually has quantitative evidence, canvassing users across institutions worldwide about research accessibility and comparing Law with other disciplines.)

In any case, if you haven't already seen it, Professor Parker's article is highly recommended, and in light of the recent NIH, ERC, and Harvard self-archiving mandates and ongoing deliberations about further mandates worldwide, it is especially timely.

Wednesday, February 13. 2008
Harvard Adopts 38th Green Open Access Self-Archiving Mandate
Absent any new information (or amendments) to the contrary, Harvard University's Faculty of Arts and Sciences on Tuesday February 12 adopted the world's 38th Green Open Access Self-Archiving Mandate -- the 16th of the institutional or departmental mandates.
An OA mandate from Harvard is especially significant, timely and welcome for the worldwide Open Access movement, as Harvard will of course be widely emulated, and many other universities are now proposing to adopt OA mandates.

The objective of the Harvard (Faculty of Arts and Sciences) mandate is to provide Open Access (OA) to its own scholarly article output. This objective is accomplished by making those articles freely accessible on the web by depositing them in a Harvard OA Institutional Repository. The means of attaining this objective is to mandate OA, which Harvard has now done.

But Harvard has gone further, and mandated copyright retention as well. Copyright retention is highly desirable and welcome, but it is not necessary in order to provide OA, and mandating copyright retention has also necessitated the adoption of an opt-out clause because of potential author resistance to perceived or actual constraints on their choice of which journal to publish in.

What follows below is a recommendation for a few small but crucial changes in the wording of the mandate. They are designed to prevent the copyright-retention requirement from compromising the deposit requirement (which would otherwise cause the Harvard OA Mandate to fail, as the original NIH policy failed until its flaws were corrected three years later).

First, here is the draft Harvard OA mandate as it now stands [passages flagged for modification are in brackets]:

Text of Motion on behalf of the Provost’s Committee on Scholarly Publishing:

Now here are the small but crucial changes that will immunize the deposit requirement against any opt-outs from the copyright-retention requirement. Note the re-ordering of the clauses, and the addition of the underscored passages. (Other universities may also omit the two indented clauses preceded by a double asterisk (**) if they wish):

Proposed revision:

Friday, January 25. 2008
1st DRIVER Summit Report
Highlights from the 1st DRIVER Summit Report:

"On 16 and 17 January 2008, DRIVER II successfully carried out its first Summit in Goettingen, Germany. Approximately 100 invited representatives from the European Community, including representatives of the European Commission, over 20 spokespersons of European repository initiatives as well as experts in different repository related fields from Europe, the U.S., Canada and South Africa came together to discuss their experiences and concrete actions with respect to the further building of cross-country repository infrastructures..."

From Norbert Lossau's Summary: "The conditions to populate repositories with content and to implement a coherent European and global digital repository based eInfrastructure are more favourable than ever before. The Council's Conclusions on Scientific Information, the European Research Council Open Access mandate and the current preparation of an Open Access mandate for all EC funded research publications can draw from the existing infrastructure efforts which must be accelerated in the coming months..."

Thursday, January 17. 2008
Don't Conflate 3rd Party Fair Use and Course Packs with 1st Party Open Access Provision
Please don't mix up the problem of University Course Packs with Open Access Provision.
Universities have long-standing frustrations about what 3rd-party buy-in content they can and cannot include in Course Packs for student use. That's an old story, and universities should continue to strive to get the best deal they can with AAP publishers for that, as Hofstra, Marquette, and Syracuse Universities have been trying to do. But on no account should this unending saga be conflated with Open Access (OA) provision (by those same three universities).

Universities provide OA to their own research output by self-archiving it in their own OA Institutional Repositories (IRs). That is 1st-party OA content.

The connection between the two is this: In their efforts to strike a deal with AAP publishers over the ("fair") use of 3rd-party content, Marquette and Syracuse need not worry about 3rd-party content originating, say, from Hofstra University: what they include in their course packs can simply be links to Hofstra's OA articles, deposited in Hofstra's IR. Nor vice versa. All they need do is link to it. There is no need to download, print or photocopy it for course packs. The link is enough, and the students can do the rest (if they lack affection for trees).

And as Green OA self-archiving of 1st-party content in authors' OA IRs grows, the frustrations of jostling for 3rd-party content will simply shrink and disappear. So, in keeping with the Golden Rule, universities should be mandating Green OA self-archiving, alongside whatever deals they may be cutting for fair use in course packs.

Wednesday, January 16. 2008
Critique of EU Council's Conclusions (again heavily influenced by the publisher anti-OA lobby)
Here is the video of my presentation to the DRIVER Summit:
Institutional Versus Central Deposit: Optimising DRIVER Policy for the OA Mandate and Metric Era

Also to be discussed at the DRIVER Summit is this statement by the EU Council (not to be confused with the European Research Council (ERC), which has mandated OA self-archiving!). The EU Council's Conclusions show the tell-tale signs of penetration by the publisher anti-OA lobby: familiar slogans, decisively rebutted many, many times before, crop up verbatim in the EU Council's language, though the Council does not appear to realize that it has allowed itself to become the mouthpiece of these special interests, which are not those of the research community:

Council of the European Union: Conclusions on scientific information in the digital age: access, dissemination and preservation

Here is my critique of this EU Council statement (all boldface quotes are from the Council's statement; the underscores have been added):

"the importance of scientific output resulting from publicly funded research being available on the Internet at no cost to the reader under economically viable circumstances, including delayed open access"

(1) 'At no cost to the reader' conflates site-licensing and Open Access (OA). This wording was no doubt urged by the publisher lobby. The focus should be on providing free online access webwide. That is OA, and that makes the objective clear and coherent.

(2) 'Delayed open access' refers to publisher embargoes on author self-archiving. If embargoes are to be accommodated, it should be made clear that they apply to the date at which the embargoed document is made OA, not to the date at which it is deposited, which should be immediately upon acceptance for publication. The DRIVER network of Institutional Repositories (IRs) can then adopt the 'email eprint request' button, which allows individual users to request and receive individual copies of the embargoed document semi-automatically.
(3) What should be deposited in the author's own institutional IR immediately upon acceptance for publication is the author's peer-reviewed, accepted final draft ('postprint'), not the publisher's PDF (or XML). There are far more publisher embargoes on the PDF/XML than on the postprint, and the postprint is all that is needed for research use and progress. The postprint is a supplementary version of the official publication, provided for OA purposes; it is not the version with the primary digital preservation problem.

(4) Digital preservation should not be conflated with OA provision: There is a (separate) problem of the digital preservation of the publisher's PDF/XML, but this is not the same as the problem of providing OA to the author's postprint. The postprint, though it can and should be preserved, is not the canonical copy of the publication, so the two preservation tasks should not be conflated.

(5) Self-archiving research data is also a different matter from self-archiving research publications. Data-archiving is not subject to a publisher embargo, and it needs independent preservation, but data-access and data-preservation should not be conflated with OA provision.

(6) Deposit should be directly in each author's own IR: Distributed institutional depositing and storage should not be conflated with central harvesting and indexing: Deposit Institutionally, Harvest Centrally.

(7) Direct central deposit should be avoided except in cases where the author is institutionally unaffiliated or the author's institution does not yet have an IR. For those cases, there should be at least one provisional default repository such as DEPOT.

(8) Research (publications and data) should not be conflated with other forms of digital content. The problems of cultural heritage archiving, for example, are not the same as those of research publication archiving. Nor are the problems of archiving the same as the problem of access-provision (OA).
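Points (2), (6) and (7) above amount to a few simple rules an IR could enforce: deposit at acceptance, in the author's own IR (or a provisional default repository if there is none), with any embargo affecting only the date access opens, and an email-request fallback meanwhile. The Python sketch below is a minimal illustration under stated assumptions: all class and function names, the day-based embargo (real embargoes are usually stated in months), and the example hostnames are hypothetical, and "DEPOT" is used only as a label for the default repository.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical label for the provisional default repository of point (7).
DEFAULT_REPOSITORY = "DEPOT"


@dataclass
class Author:
    name: str
    institution_ir: Optional[str]  # None if unaffiliated or the institution has no IR yet


@dataclass
class Deposit:
    author: Author
    repository: str
    deposited_on: date  # set at acceptance for publication; never postponed by an embargo
    embargo_days: int = 0  # assumption: embargo expressed in days for simplicity

    @property
    def open_on(self) -> date:
        # Point (2): the embargo governs only when access becomes OA, not when deposit happens.
        return self.deposited_on + timedelta(days=self.embargo_days)


def choose_repository(author: Author) -> str:
    """Points (6)-(7): the author's own IR, else the provisional default repository."""
    return author.institution_ir or DEFAULT_REPOSITORY


def deposit_on_acceptance(author: Author, accepted_on: date, embargo_days: int = 0) -> Deposit:
    """Deposit immediately upon acceptance, in the locus chosen above."""
    return Deposit(author, choose_repository(author), accepted_on, embargo_days)


def access_status(dep: Deposit, today: date) -> str:
    """'open' once any embargo has elapsed; until then, 'request-copy'."""
    return "open" if today >= dep.open_on else "request-copy"


def request_copy(dep: Deposit, requester_email: str, today: date) -> str:
    """Hypothetical model of the 'email eprint request' button.

    Assumed behaviour: during the embargo the request is emailed to the author,
    who can approve it with a click; this sketch only reports the routing.
    """
    if access_status(dep, today) == "open":
        return "download directly"
    return f"request from {requester_email} forwarded to {dep.author.name}"


if __name__ == "__main__":
    author = Author("A. Author", "ir.example.edu")
    dep = deposit_on_acceptance(author, date(2008, 1, 16), embargo_days=180)
    print(dep.repository, dep.deposited_on, dep.open_on)
    print(access_status(dep, date(2008, 3, 1)))
    print(request_copy(dep, "reader@example.org", date(2008, 3, 1)))
```

The design choice worth noting is that the embargo is a property of access, computed from the deposit date, so no code path ever delays the deposit itself.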
"ensure the long term preservation of scientific information -including publications and data"This is an example of the complete conflation of OA-provision with digital preservation, including a conflation of authors' supplementary postprints with the publisher's original, as well as a conflation of research publications with research data. DRIVER will not have a coherent programme unless it clearly and systematically de-conflates OA-provision from digital preservation, primary publications from authors' supplementary postprints, and publication-archiving from data-archiving, treating each of these separately, on its own respective terms. "experiments on and wide deployment of scientific data infrastructures with cross-border, cross-institution and cross-discipline added-value for open access to and preservation of scientific information"This again conflates OA provision with digital preservation and conflates publications with data. It also conflates both of these with IR interoperability, which is yet another matter. (And webwide OA is, by definition, cross-institution, cross-border and cross-discipline, so that is a non-issue.) What is an issue, however, is institutional versus central depositing, and it is crucial that DRIVER have a clear, coherent policy (insofar as research archiving is concerned -- this does not necessarily apply to other forms of digital content): Deposit Institutionally: Harvest/Index/Search Centrally. The emphasis of DRIVER should accordingly be on ensuring that the distributed IRs have the requisite interoperability for whatever central harvesting, indexing, search and analysis are needed and desired. 
"promoting, through these policies, access through the internet to the results of publicly financed research, at no cost to the reader, taking into consideration economically sustainable ways of doing this, including delayed open access"Economic sustainability is again a red herring introduced by the publishing lobby into language that should only concern the research community and research access. The economic sustainability of publishing is not DRIVER's concern. DRIVER's concern should be interoperable OA-provision (plus whatever cultural-heritage and other forms of archiving DRIVER wishes to provide the infrastructure for). Nor are publisher access-embargoes DRIVER's concern: DRIVER should merely help ensure immediate deposit in IRs, and it should facilitate research usage needs through IR interoperability as well as the IRs' email eprint request button. "2008 working towards the interoperability of national repositories of scientific information in order to facilitate accessibility and searchability of scientific information beyond national borders"Insofar as research is concerned, it is not the interoperability of national repositories that is crucial but the interoperability of all OA IRs. "2009 contributing to an effective overview of progress at European level, informing the Commission of results and experiences with alternative models for the dissemination of scientific information."This is again a red herring (for both the EU and for DRIVER) introduced by the publishing lobby: Research archiving and OA-provision are neither a matter of alternative publishing models nor a matter of alternatives to the generic peer-reviewed publication model. Publishing reform and peer review reform are not DRIVER matters. They can and will evolve too, but DRIVER should focus on the deposit of current published research as well as research data in IRs, and the interoperability of those IRs. That is the immediate problem. The rest is merely speculative for now. "B. 
Invitation to the Commission to implement the measures announced in the Communication on "scientific information in the digital age: access, dissemination and preservation", and in particular to: 1. Experiment with open access to scientific publications resulting from projects funded by the EU Research Framework Programmes by: defining and implementing concrete experiments with open access to scientific publications resulting from Community funded research, including with open access."

This is a vague way of saying that the publishing lobby has persuaded the EU not to do the obvious, but to keep on 'experimenting' as if what needed to be done were not already evident, already tested, already demonstrated to work, and already being done, worldwide (including by RCUK, ERC, NIH, and over a dozen universities): The EU should mandate that all EU-funded research articles (postprints) be deposited in the fundee's IR immediately upon acceptance for publication. Access can be set in compliance with embargoes, if desired. And data-archiving should be strongly encouraged. DRIVER's concern should be with ensuring that the network of IRs has the requisite interoperability to make it maximally useful and usable for further research progress.

THE FEEDER AND THE DRIVER: Deposit Institutionally, Harvest Centrally
Stevan Harnad

DRIVER is designing an infrastructure for European and worldwide Open Access research output, stored in institutional and disciplinary repositories, now increasingly under institutional and research-funder mandates. It is critical for DRIVER to take explicitly into account in its design (as some research funders have not yet done, because they have not yet thought it through) that institutional and disciplinary (central) repositories (IRs and CRs), although they are fully interoperable and on a par in that respect, nevertheless play profoundly different roles.
Universities and research institutions are the FEEDERS -- the primary providers of research, funded and unfunded, in all disciplines -- for both kinds of repositories (IRs and CRs). This difference in role and function must be concretely reflected in the design of the DRIVER infrastructure.

The primary locus of deposit for all research output is the researcher's own institution's IR (except in the increasingly rare case of institutionally unaffiliated researchers). Thanks to OAI interoperability, the metadata for those deposits, or even the full-text deposits themselves, can also be harvested by (or exported to) any number of CRs: discipline-based, funder-based, theme-based, national, European and global CRs.

Neither IRs nor CRs will fill without deposit mandates. This is a hard lesson that has been learned very late (NIH, for example, made the mistake of requesting rather than requiring deposit; the NIH policy failed, and three years of research impact were consequently lost), but the lesson has now at long last indeed been learned. So the number of institutional and funder mandates is now set to grow dramatically.

Institutions of course always mandate deposit in their own IRs. Many funders have mandated deposit, indicating that deposit can be in either IRs or CRs. But a few funders still stipulate, dysfunctionally, that deposit must be in CRs. This is a symptom of not having thought OA through. Funders are of course greatly to be commended for mandating OA, but their short-sightedness on the question of locus and means of deposit needs correction, and DRIVER can and should help with this, pre-emptively, rather than blindly following the unreflective and incoherent trends in the air today. Indeed DRIVER must take a coherent position, if it wants OA content to be provided and OA repositories to be filled, reliably and fully. The model that DRIVER should adopt in designing its infrastructure is "Deposit Institutionally, Harvest Centrally."
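The "Harvest Centrally" half of the model is what the OAI-PMH protocol standardizes: a central service sends ListRecords requests to each IR and follows resumptionToken pagination, while the deposits themselves stay in the IRs. A minimal Python sketch using only the standard library, under stated assumptions: the `fetch` callable is injected (in practice it could wrap `urllib.request.urlopen`), only the Dublin Core title and identifier are extracted, and the function names are hypothetical. The request arguments (`verb=ListRecords`, `metadataPrefix=oai_dc`, `resumptionToken`) follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

Fetch = Callable[[str], str]  # takes a request URL, returns the XML response body


def list_records_url(base_url: str, token: str = "") -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url: str, fetch: Fetch) -> Iterator[Dict[str, str]]:
    """Yield one metadata dict per live record in a single repository."""
    token = ""
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for rec in root.iterfind(".//oai:record", NS):
            header = rec.find("oai:header", NS)
            if header is None or header.get("status") == "deleted":
                continue  # deleted records carry no metadata
            yield {
                "source": base_url,
                "oai_id": header.findtext("oai:identifier", default="", namespaces=NS),
                "title": rec.findtext(".//dc:title", default="", namespaces=NS),
                "link": rec.findtext(".//dc:identifier", default="", namespaces=NS),
            }
        # An empty or missing resumptionToken marks the last page.
        token = (root.findtext(".//oai:resumptionToken", default="", namespaces=NS) or "").strip()
        if not token:
            return


def harvest_all(base_urls: List[str], fetch: Fetch) -> List[Dict[str, str]]:
    """Central harvest over many distributed IRs; the deposits stay where they are."""
    return [rec for url in base_urls for rec in harvest(url, fetch)]
```

Injecting `fetch` keeps the harvesting logic independent of any particular HTTP client and makes it easy to exercise against canned responses.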
"Deposit Institutionally, Harvest Centrally" is the way to scale up -- simply, swiftly, systematically and surely -- to 100% OA. I give the reasons in detail in my talk tomorrow, but for now I just want to point out the principal points:

Institutions (i.e., universities and research institutes) are the providers -- the source -- of all research. Institutions have a direct interest in showcasing and managing their own research output, but they have been even more sluggish than funders in adopting mandates.

If funders mandate central deposit, they neither cover all of OA output nor collaborate coherently with the providers (the institutions) to scale up systematically to providing OA to all of their institutional research output.

The OAI protocol makes it possible to harvest content from all OAI-compliant repositories. That is the coherent, systematic pattern of content provision for which DRIVER should be designed, not an incoherent patchwork of arbitrary institutional and central depositing and repositories that will neither scale up to all of OA nor accelerate its attainment.

Not all research is funded; not all research fits into defined disciplines; disciplines are not all independent. Because disciplines overlap, discipline-based depositing would itself have to be overlapping and redundant. Depositing can be mandated once, but not multiply. The natural way to ensure that a paper is present in multiple loci (institutional, (multi)disciplinary, national, etc.) is to deposit it at source, i.e., institutionally, and then harvest or import its metadata (or both its metadata and the paper itself) into whatever CRs we decide we need. That is what the OAI interoperability protocol itself was designed for.

And, not to put too fine a point on it, the very notion of Central Repositories already betrays something of a misunderstanding of the online medium: Is Google a central repository? Is it a repository at all? Do people deposit directly in Google?
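The claim that one deposit at source can populate any number of CRs can be illustrated by treating each CR as nothing more than a selection rule over metadata already harvested from IRs. The Python sketch below is a minimal illustration; the `Record` fields, the CR names and their selection rules are hypothetical, chosen only to show a single institutional deposit appearing in a funder-based and a discipline-based CR without ever being deposited twice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    """Metadata harvested from the one IR where the author deposited (fields are hypothetical)."""
    oai_id: str
    ir_url: str
    subjects: tuple = ()
    funder: str = ""
    country: str = ""


@dataclass
class CentralRepository:
    """Hypothetical CR defined purely by a selection rule over harvested metadata."""
    name: str
    wants: Callable[[Record], bool]
    holdings: Dict[str, Record] = field(default_factory=dict)

    def harvest(self, records: Iterable[Record]) -> None:
        for rec in records:
            if self.wants(rec):
                # Keyed by OAI identifier, so repeated harvests never duplicate a paper.
                self.holdings[rec.oai_id] = rec


def distribute(records: List[Record], crs: List[CentralRepository]) -> Dict[str, List[str]]:
    """Let every CR harvest from the same institutional deposits; report who holds what."""
    for cr in crs:
        cr.harvest(records)
    return {cr.name: sorted(cr.holdings) for cr in crs}


if __name__ == "__main__":
    paper = Record("oai:ir.example.edu:7", "https://ir.example.edu/7",
                   subjects=("biomedicine",), funder="NIH", country="US")
    crs = [
        CentralRepository("funder CR", lambda r: r.funder == "NIH"),
        CentralRepository("discipline CR", lambda r: "biomedicine" in r.subjects),
        CentralRepository("national CR (UK)", lambda r: r.country == "UK"),
    ]
    print(distribute([paper], crs))
```

Each CR keeps a pointer (`ir_url`) back to the institutional copy, so adding a new CR means adding a rule, not asking authors for another deposit.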
OAIster, Citebase (and many other central OAI services like them) are an even better model than Google: OAIster and Citebase were explicitly designed to be OAI service providers -- functional overlays on the distributed OA content providers. Do CRs -- disciplinary, interdisciplinary, national and international -- really need to be any more than that?

First DRIVER Summit: Towards a Confederation of Digital Repositories
16-17 January 2008, Goettingen, Germany

DRIVER responds to the vision that all relevant scientific content should be easily accessible through internet-based infrastructures. Achieving this vision reaches beyond technology: it is also the organisational dimension that allows a stable and trusted network of content providers. DRIVER is working with repository federations in Europe, and reaches out to further international communities (for example in the US, China, India, and Africa) in order to determine the practical requirements for a confederation of digital repositories.

Wednesday, January 16, 2008
13:00-13:15 Opening Address (Norbert Lossau, DRIVER)
13:15-14:00 Opening Keynote - EC Research Infrastructure & Digital Repository Roadmap (Mario Campolargo, EC)
14:00-14:30 Organising Infrastructures: Experiences from D-Space, Open Content Alliance and other community efforts (Michele Kimpton, D-Space Foundation)
Chair: Sylvia van Peteghem, DRIVER
15:00-15:30 DRIV(ER)ing Research Infrastructures (Yannis Ioannidis, DRIVER)
15:30-16:00 Growing Repositories (Bill Hubbard, DRIVER)
16:00-16:30 Repository Landscape - DRIVER Studies (Kasja Weenink, DRIVER)
Chair: Sijbolt Noorda, EUA
17:00-18:30 Panel Discussion "Open Access and Repository Infrastructures": Responding to the "Council Conclusions on Scientific Information"

Thursday, January 17, 2008
Chair: Rüdiger Klein, ESF (tbc)
09:00-09:30 Institutional versus Central Deposit: Optimising DRIVER Policy for the OA Mandate and Metric Era (Stevan Harnad, Université du Québec à Montréal & University of Southampton)
09:30-10:00 Challenges in DRIVER-II (Donatella Castelli, DRIVER)
Chair: Carlos Morais-Pires, EC
10:00-10:30 Disciplinary Repository Projects
  EuroVO-AIDA: Astronomical virtual observatories, repositories and data access (Sebastien Deriere, Thomas Boch)
  METAFOR -- Climate Modelling (Michael Lautenschläger)
11:00-12:30 Disciplinary Repository Projects
  Genesi-DR -- Earth Science (Luigi Fusco, Donatella Castelli)
  D4Science -- Environmental Monitoring (Donatella Castelli)
  eCrystals -- Crystallography (Simon Coles, Liz Lyon)
  OLAC: The Open Language Archives Community (Gary Simons)
  Language Resource Management and Discovery in CLARIN (Peter Wittenburg)
  Nereus/Neeo -- Economists (Vanessa Proudman)
  Discussion
Chair: Wolfram Horstmann, DRIVER
13:30-14:45 National & International Repository Networks
  EIFL -- Open Access Program (Susan Veldsman, Rima Kupryte)
  Open Access Repositories in Finland (Rita Voigt)
  OpenAccess.se (Jan Hagerlid)
  NORA - The Norwegian Experience (Jan Erik Frantsvåg)
  eLABa - Lithuanian Academic e-Library (Vilius Kuciukas)
  Local Integration, National Federation: TARA, TCD-RSS, IReL-Open, Expertise Ireland (Niamh Brennan)
  OAI Spain (Alicia Lopez-Medina)
14:45-15:50 Discussion: Building a Joint Digital Repository Infrastructure (Moderator: Carlos Morais-Pires, EC; Norbert Lossau, DRIVER)
15:50-16:00 Wrap-up (Norbert Lossau, DRIVER)

Sunday, December 23. 2007
Deposit Institutionally, Harvest Centrally
University of Michigan’s digital repository now available through PubMed:

"Researchers who find articles by University of Michigan (UM) authors in PubMed can now directly -- and for free -- link to the full text using Deep Blue, UM’s digital repository, via PubMed’s LinkOut feature. Deep Blue is an online archive that preserves and provides access to UM intellectual and creative work. It is the first institutional repository to provide such links."

Congratulations to the University of Michigan and PubMed for adding this excellent and timely feature (to both PubMed and Michigan's Institutional Repository [IR], Deep Blue)! But why stop there?

The implications are obvious: Central Repositories [CRs] (like PubMed Central and Arxiv and CogPrints) should not be deposited in directly, because that merely complicates and competes with a systematic worldwide policy of depositing all institutional research output in each institution's own OAI-compliant IR.

Institutions are the primary research providers. They have the greatest stake in ensuring that all their own research output is maximally visible, accessible, and usable, thereby maximizing the institution's research impact. Institutions are also best placed to showcase, monitor and reward the self-archiving of their own research output.

All institutions should mandate that all their research article output be deposited in their own IR. Research funders (like NIH) should also mandate that all the research article output from the research they fund be deposited in the fundee's own institution's IR. Then CRs like PubMed Central, as well as indexers like PubMed (or Thomson ISI or Scopus or Google Scholar), can either link to or harvest from the network of interoperable, OAI-compliant IRs.

In this natural way -- "deposit institutionally, harvest centrally" -- all research output can be systematically made OA. Depositing willy-nilly in IRs or CRs instead will only create confusion and resistance on the part of researchers, who will understandably wish to do the keystrokes only once.
IR software can also help with automatic exports to other OAI-compliant sites where desired, as well as with version control.

Now that the NIH OA self-archiving mandate is imminent, it is all the more important to reformulate it in a way that will scale systematically to all research output worldwide, in all disciplines, rather than leaving it as one non-scaling special case for NIH-funded biomedical research. And remember that the Web era means distributed content provision and central harvesting, Google-style. It is not, as in paper days, that all the content needs to go in one central physical space.

Tuesday, October 16. 2007
How Green Open Access Supports Text- and Data-Mining

In "Why Green Open Access does not support text- and data-mining", Peter Murray-Rust wrote:

PM-R: "...the first thing to do is to gather a corpus of documents... any other scientist should be able to have access to it. It therefore has to be freely distributable..."

Agreed. So far this is just bog-standard OA. If the original documents are self-archived as Green OA postprints in their authors' Institutional Repositories (IRs), your SciBorg robot can harvest them and data-mine them, and make the results freely accessible (but linking back to the postprint in the author's IR whenever the full text needs to be downloaded).

PM-R: "[At SciBorg] we are interested in machines understanding science..."

Fine. Let your SciBorg machines harvest the Green OA full texts and "repurpose" them as they see fit.

PM-R: "almost all articles are copyrighted and non-distributable. Publisher Copyright is a major barrier... you can’t just go out and compile a wordlist or whatever as you may infringe copyright or invisible publisher contracts (we found that out the hard way)..."

You can't do that if you are harvesting the publisher's proprietary text, but you can certainly do it if you are harvesting the author's Green OA postprints.

PM-R: "PDFs are so awful... we have to repurpose them by converting to HTML, XML and so on..."

Fine.

PM-R: "Now the corpus is annotated. Expert humans go through line by line... It is this annotated corpus which is of most use to the scientific community..."

Fine.

PM-R: "So suppose I find 50 articles in 50 different repositories, all of which claim to be Green Open Access. I now download them, aggregate them and [SciBorg] repurpose[s] them. What is the likelihood that some publisher will complain? I would guess very high..."

Complain about what, and to whom? A Green publisher has endorsed the author's posting of his own Green OA postprint in his own IR, free for all. The postprint is the author's own refereed, revised final draft.

Now follow me: Having endorsed the posting of that draft, does anyone imagine that the publisher would have any grounds for objection if the author revised the draft further, making additional corrections and enhancements? Of course not. It's exactly the same thing: the author's Green OA postprint. So what if the author decides to mark it up as XML and add comments? Any grounds for objection? Again, no. Corrections, updates and enhancements of the author's postprint are in complete conformity with posting his postprint.

Suppose the author did not make those corrections with his own hands, but had a colleague, a graduate student, a secretary, or a hired hand make them for him, and then posted the corrected postprint? Still perfectly fine. Now suppose the author had your SciBorg "repurpose" his postprint: Any difference?
None -- except a trivial condition, easily fulfilled, which is that the locus of the enhanced postprint, the URL from which users can download it, should again be the author's IR, not a 3rd-party website (which the publisher could then legitimately regard as a rival publisher -- especially if it was selling access to the "repurposed" text).

So the solution is quite obvious and quite trivial: It is fine for the SciBorg harvester to be the locus of the data-mining and enhancement of each Green OA postprint. It can also be the means by which users search and navigate the corpus. But SciBorg must not be the locus from which the user accesses the full-text: The "repurposed" full-text must be parked in the author's own IR, and retrieved from there whenever a user wants to read and download it (rather than just to search and surf the entire corpus via SciBorg).

Not only does this all sound silly: it really is silly. In the online age, it makes no functional difference at all where a document is actually physically located, especially if the document is OA! But we are still at the confused interface between the paper age and the OA era. So we have to be prepared to go through a few silly rituals, to forestall any needless fits of apoplexy, which would otherwise mean further dysfunctional delay (for OA).

So the ritual is this: It would be highly inimical to the progress of Green OA mandates to insist that the publisher's endorsement to self-archive the postprint in the author's IR is "not enough" -- that the author must also successfully negotiate with the publisher the retention of the right to assign to 3rd-party harvesters like SciBorg the right to publish a "derivative work" derived from the author's postprint. That would definitely be the tail wagging the dog, insofar as OA is concerned, and it would put authors off providing Green OA (and hence put their institutions off mandating it) for a long time to come.
Instead, when SciBorg harvests a document from a Green OA IR, SciBorg must make an arrangement with the author that the resultant "repurposed" draft will be deposited by the author in the author's own IR as an update of the postprint. Then, whenever a user of SciBorg wishes to retrieve the "repurposed" draft, the downloading site must always be the author's IR: no direct retrieval from the SciBorg site.

This ritual is ridiculous, and of course it is functionally unnecessary, but it is pseudo-juridically necessary, during this imbecilic interregnum, to keep all parties (publishers, lawyers, IP specialists, institutions, authors) calm and happy -- or at least mutely resigned -- about the transition to the optimal and inevitable that is currently taking place. Once it's over, and we have 100% Green OA, all this papyrophrenic horseplay can be well-deservedly dropped for the nonsense it is.

Please, Peter, be prepared to adapt SciBorg to the exigencies of this all-important (and all too slow-footed) transitional phase, rather than trying to force-fit the status quo to SciBorg, at the cost of still more delays to OA.

PM-R: "Only a rights statement actually on each document would allow us to create a corpus for NLP without fear of being asked to take it down..."

No. Green OA authors with standard copyright agreements are not in a position to license republication rights to SciBorg or any other 3rd party. Let us be happy that they have provided Green OA at all, and let SciBorg be the one to adapt to it for now, rather than vice versa.

Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).

Stevan Harnad
American Scientist Open Access Forum
The American Scientist Open Access Forum has been chronicling and often directing the course of progress in providing Open Access to Universities' Peer-Reviewed Research Articles since its inception in the US in 1998 by the American Scientist, published by the Sigma Xi Society. The Forum is largely for policy-makers at universities, research institutions and research funding agencies worldwide who are interested in institutional Open Access provision policy. (It is not a general discussion group for serials, pricing or publishing issues: it is specifically focussed on institutional Open Access policy.)