Friday, October 8. 2010
Cornell, Arxiv and Institutional vs. Central Repositories
On Thu, 7 Oct 2010, Joseph Esposito wrote (in liblicense):
JE:It varies by field. In HEP and Astro, most published journal articles are also self-archived in Arxiv. To understand the meaning of this, however, it is important to note that extremely few papers that are self-archived in Arxiv are not (eventually) published in journals: Arxiv is an access-provider -- to published and pre-publication research papers. Arxiv is not a publisher: Arxiv neither peer-reviews its contents, nor does it certify that they have been peer-reviewed; the publisher does that. Hence, like all open access repositories, Arxiv is a supplement to publication, not a substitute for it. JE:No one wants Arxiv to disappear, but I'll bet that within a decade or sooner Arxiv will just be another automated central harvester of distributed local deposits from authors' own institutional repositories (IRs), not a central locus of direct, institution-external deposit. In the age of IRs, it is no longer necessary -- nor does it make sense -- for authors to self-archive institution-externally. It is also a needless central expense to manage deposit centrally. It makes much more sense to deposit institutionally and harvest centrally. JE:Once all universities have IRs and IR self-archiving mandates, there will be no need to fund repositories for institution-external deposit. Harvesting is cheap. And each university's IR will be a standard part of its online infrastructure. JE:The IR cost per paper deposited will be closer to 50c than $50, once all universities are hosting their own output, and mandating that it be deposited. JE:Guess again! Once the burden of hosting, access-provision and archiving is offloaded onto each author's institution, the only service that journals will need to provide is peer review, and hence journals will be charging institutions a lot less than they are charging now. (Print editions as well as online editions and their costs will be gone too.) 
On Fri, Oct 8, 2010 at 12:57 AM, Simeon Warner wrote (in jisc-repositories): Simeon, I can only repeat the premise under which that prediction is made:SH:SW: Cornell has not mandated deposit, and it is far from hosting all of its annual output. Ditto for all but about 100 universities so far worldwide. (Not to mention that Cornell and many other universities may not have picked the optimal free IR software solution either ;>) ...) SW:Yes, you have significant scale. But, for Arxiv, it is Cornell, a federal grant, plus funds from some universities that are paying for all the deposits, from all universities, in that one central repository. To repeat: The sensible solution (and probably the only practical, affordable, sustainable one) is for Arxiv -- and any other central archives like it in other fields -- to harvest their respective content automatically from Institutional Repositories that host their own research output. (Institutions, after all, are the universal providers of all that content.) The annual cost per paper deposited will be far less for an Institutional Repository -- hosting only its own research output -- once the institutions are indeed hosting all of their own annual research output -- and not just a small fragment of it, as now. Most institutions today have IRs that are still near-empty rather than at full capacity (as far as OA's target content is concerned). (The cost/benefit of universities hosting their own grey literature output and other kinds of content they generate is another matter, but not to be reckoned into this comparison with Arxiv regarding per-article cost. IRs can archive lots of kinds of things, including departmental reports or family photo albums, if desired...) And Cornell, of course, has the double burden of hosting a near-empty, unmandated IR for its own refereed research output, plus the (partial) expense of hosting Arxiv for the rest of the world! 
See: Why Cornell's Institutional Repository Is Near-Empty More: http://bit.ly/MoreOnCornellPolicy SW:There are many valid reasons for institutions creating and supporting their IRs -- but only if they mandate that they be filled with their target content. Among those many valid reasons are economic ones: ABSTRACT: Among the many important implications of Houghton et al’s (2009) timely and illuminating JISC analysis of the costs and benefits of providing free online access (“Open Access,” OA) to peer-reviewed scholarly and scientific journal articles one stands out as particularly compelling: It would yield a forty-fold benefit/cost ratio if the world’s peer-reviewed research were all self-archived by its authors so as to make it OA. There are many assumptions and estimates underlying Houghton et al’s modelling and analyses, but they are for the most part very reasonable and even conservative. This makes their strongest practical implication particularly striking: The 40-fold benefit/cost ratio of providing Green OA is an order of magnitude greater than all the other potential combinations of alternatives to the status quo analyzed and compared by Houghton et al. This outcome is all the more significant in light of the fact that self-archiving already rests entirely in the hands of the research community (researchers, their institutions and their funders), whereas OA publishing depends on the publishing community. Perhaps most remarkable is the fact that this outcome emerged from studies that approached the problem primarily from the standpoint of the economics of publication rather than the economics of research. SW:Arxiv is a repository for articles that have been or will be refereed and published by journals. There is an "author pays" model for paying for that refereeing and publishing through author/institution publication fees (for OA journals, and a subscription model for non-OA journals, which are still the vast majority). 
-- But there is not, never was, and never need be an "author pays" model merely to pay for the deposit of the author's draft of those same articles. Arxiv is a repository, providing access, not a publisher of refereed research. It is the many different journals in which Arxiv's depositors publish that are still the ones doing the refereeing and the publishing (i.e., implementing the peer review process and certifying the outcome, if successful, as having met that journal's established quality standards). And journals need to recover the costs of providing that essential service, either via journal subscription tolls or via "author pays" (i.e., article publication fees).

On this you are entirely right, Simeon (though I think the term "overlay journals" is a misdescription of what may eventually come to pass, once all refereed, published articles are being self-archived in their authors' IRs).

SH: SW: (And Cornell is aiding and abetting the very trend you mention, by agreeing pre-emptively to subsidize "author pays" costs for (some of) Cornell authors' articles while failing to mandate self-archiving of all of Cornell authors' articles, cost-free!) See: http://bit.ly/PreemptiveCOPEandSCOAP3

Harnad, S. (2009) The PostGutenberg Open Access Journal. In: Cope, B. & Phillips, A. (Eds.) The Future of the Academic Journal. Chandos.

SW: Many of the necessary tools are not needed at the individual IR level, because search takes place at the harvester level. What IRs lack is not tools, but content. Once we have OA's target content (refereed journal articles), developing the tools is a piece of cake.

SW: We can cross that bridge when we get to it -- if Google Scholar does not cross it for us -- once the target content is indeed being deposited in the IRs, globally -- because deposit has been universally mandated at long last.

Stevan Harnad
American Scientist Open Access Forum

Thursday, September 16. 2010
Estimating Japan's Annual Rate of Journal Article Self-Archiving

Congratulations to Japan's JAIRO for harvesting the 700,000 full-texts (out of one million total) self-archived in Japan's 158 Institutional Repositories since 2007. To understand what this figure means, however, the fundamental question is whether or not it represents an increase over the worldwide baseline average for spontaneous (i.e. unmandated) self-archiving, which varies between 5% and 25% of the total annual output of the primary target content of the Open Access movement: the 2.5 million articles per year published in the planet's 25,000 peer-reviewed journals across all disciplines and languages.

Of JAIRO's 700K full-text total, about 110K (15.5%) consisted of journal articles, based on JAIRO's statistical data. From the growth chart (if I have interpreted it correctly), about 70% of 50,000 articles (i.e., 35,000 full-texts) were deposited in 2009. If we can assume that those deposits were all articles published within that same year (or the preceding one), then the question is: What percentage of Japan's (or of those 158 institutions') annual portion of the 2.5 million articles published yearly worldwide do these 35,000 full-texts represent? Does it exceed the worldwide unmandated baseline of 5-25%?

The reason I raise this question is that absolute figures -- even absolute growth rates across years -- are not meaningful in themselves. They are only meaningful if expressed as the percentage of total annual output. For a single institutional repository, this means the percentage of that institution's annual output of refereed journal articles. For Japan's 158 institutional repositories, it means the percentage of the total annual output of those 158 institutions.
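As a rough illustration of what expressing deposits as a percentage of annual output looks like, here is a minimal Python sketch. The function name is hypothetical, and the figure of 1,000 articles per institution per year is only a placeholder assumption, not a measured value; the 35,000 deposits and 158 institutions are the figures cited in the post.

```python
def deposit_rate_percent(deposits, institutions, articles_per_institution):
    """Return deposits as a percentage of the estimated annual article output.

    articles_per_institution is an assumed average, not a measured value.
    """
    estimated_output = institutions * articles_per_institution
    if estimated_output <= 0:
        raise ValueError("estimated annual output must be positive")
    return 100.0 * deposits / estimated_output


# Figures from the post: 35,000 full-texts deposited across 158 repositories.
# The 1,000 articles per institution per year is a placeholder assumption.
rate = deposit_rate_percent(35_000, 158, 1_000)
print(f"{rate:.1f}% of estimated annual output")  # prints "22.2% of estimated annual output"
```

A larger assumed output per institution lowers the resulting percentage, which is why a conservative (low) output assumption yields an upper bound on the deposit rate.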
On the conservative assumption that research-active universities publish at least 1000 refereed journal articles per year, the estimate would be that those 35K articles represent at most about 22% of those institutions' annual refereed journal article output, which falls within the global 5-25% unmandated baseline. The reason I stress this point is that it is important that we do not content ourselves with absolute self-archiving totals and growth rates that look sizeable considered in isolation. The figure to beat is the unmandated baseline of 5-25%, and the only institutions that consistently beat it are those that mandate self-archiving. Their deposit rates jump to 60% and approach 100% within a few years. There are already 170 self-archiving mandates worldwide registered in ROARMAP -- 96 institutional, 24 departmental and 46 funder mandates -- but alas none yet from Japan. If there are any, it would be very helpful if they would be registered in ROARMAP. Also, although Japan has at least 158 repositories, only 77 of them are registered in ROAR: It would be very helpful if the rest were registered in ROAR too... Björk B-C, Welling P, Laakso M, Majlender P, Hedlund T, et al. (2010) Open Access to the Scientific Journal Literature: Situation 2009. PLOS ONE 5(6): e11273. Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLOS ONE (in press) Harnad, S, (2008) Estimating Annual Growth in OA Repository Content. Open Access Archivangelism. August 9 2008 Sale, Arthur (2006) Researchers and institutional repositories, in Jacobs, Neil, Eds. Open Access: Key Strategic, Technical and Economic Aspects, chapter 9, pages 87-100. Chandos Publishing (Oxford) Limited. Sale, A. (2006) The Impact of Mandatory Policies on ETD Acquisition. D-Lib Magazine April 2006, 12(4). Sale, A. 
(2006) Comparison of content policies for institutional repositories in Australia. First Monday, 11(4), April 2006.
Sale, A. (2006) The acquisition of open access research articles. First Monday, 11(9), October 2006.
Sale, A. (2007) The Patchwork Mandate. D-Lib Magazine 13(1/2), January/February.

Stevan Harnad
American Scientist Open Access Forum

Saturday, August 28. 2010
Testing Jan Velterop's Hunch About Green and Gold Open Access

Comparing Green and Gold
Yassine Gargouri & Stevan Harnad
Cognition/Communication Laboratory
Cognitive Sciences Institute
Université du Québec à Montréal

Jan Velterop has posted his hunch that, of the articles published annually today that are OA, most will prove to be Gold OA journal articles, once one separates from the articles that are classified as self-archived Green OA those of them that also happen to be published in Gold OA journals:

JV: “Is anyone… aware of credible research that shows how many articles (in the last 5 years, say), outside physics and the Arxiv preprint servers, have been made available with OA exclusively via 'green' archiving in repositories, and how many were made available with OA directly ('gold') by the publishers (author-side paid or not)?”

The results turn out to go strongly contrary to Velterop’s hunch. Our ongoing project is comparing citation counts for mandated Green OA articles with those for non-mandated Green OA articles, all published in journals indexed by the Thomson Reuters ISI database (science and social-science/humanities). (We use only the ISI-indexed sample because the citation counts for our comparisons between OA and non-OA are all derived from ISI.) The four mandated institutions were Southampton University (ECS), Minho, Queensland University of Technology and CERN.
Out of our total set of 11,801 mandated, self-archived OA articles, we first set aside all those (279) articles that had been published in Gold OA journals (i.e., the journals in the DOAJ-indexed subset of ISI-indexed journals) because we were primarily interested in testing the OA citation advantage, which is based on comparing the citation counts of OA articles versus non-OA articles published in the same journal and year. (This can only be done in non-OA journals, because OA journals have no non-OA articles.) This left only the Green OA articles published in non-Gold journals. We then extracted, as control articles for each article in this purely Green OA subset, 10 keyword-matched articles published in the same journal and year. The total number of articles in this control sample for the years 2002-2008 was 41,755. (Our preprint for PloS, Gargouri et al. 2010, covers a somewhat smaller, earlier period: 2002-2006, with 20,982 control articles.) Next we used a robot to check what percentage of these unmandated control articles was OA (freely accessible on the web). Of our total set of 11,801 mandated, self-archived articles, 279 articles (2.4%) had been published in the 63 Gold OA journals (2.6%) among the 2,391 ISI-indexed journals in which the authors from our four mandated institutions had published in 2002-2008. Both these estimates of percent Gold OA are about half as big as the total 5% proportion for Gold OA journals among all ISI-indexed journals (active in the past 10 years). To be conservative, we can use the higher figure of 5% as a first estimate of the Gold OA contribution to total OA among all ISI-indexed journals. 
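The two Gold OA proportions quoted above can be checked with a short Python sketch. The helper name `percent` is hypothetical; the counts are the ones given in the text.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the text for the four mandated institutions, 2002-2008.
gold_article_share = percent(279, 11_801)  # Gold OA articles among mandated articles
gold_journal_share = percent(63, 2_391)    # Gold OA journals among ISI-indexed journals used

print(f"Gold OA articles: {gold_article_share:.1f}%")  # prints "Gold OA articles: 2.4%"
print(f"Gold OA journals: {gold_journal_share:.1f}%")  # prints "Gold OA journals: 2.6%"
```

Both values come out at roughly half of the 5% figure that the text adopts as its conservative estimate.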
Now, in our sample, we find that out of the total number of articles published in ISI-indexed journals by authors from our four mandated institutions between 2002 and 2008 (11,801 articles), about 65.6% of them (7,736 articles) had indeed been made Green OA through self-archiving by their authors, as mandated (7,457 or 63.2% Green only, and 279 or 2.4% both Green and Gold). In contrast, for our 42,395 keyword-matched, non-mandated control articles, the percentage OA was 23.4% (21.9% Green and 1.5% Gold).

Björk et al’s (2010) corresponding finding [Table 3] for their ISI sample (1282 articles for 2008 alone, calculated in 2009) was 20.6% total OA (14% Green plus 6.6% Gold). (For an extended sample that also included non-ISI journals it was 11.9% Green plus 8.5% Gold.)

The variance is probably due to different discipline blends in the samples (see Björk et al's Figure 4, where Gold exceeds Green in bio-medicine), but whichever overall results one chooses -- whether our 21.9% Green and 1.5% Gold or Björk et al’s 14% Green and 6.6% Gold (or even their extended 11.9% Green and 8.5% Gold) -- the figures fail to bear out Velterop’s hunch that: “publishers (the 'gold' road) have actually done more to bring OA about than repositories, even where mandated (the 'green' road).”

Moreover (and this is really the most important point of all), Velterop's hunch is the wrongest of all precisely where OA is mandated, for there the percent Green is over 60%, and headed toward 100%. That is the real power of Green OA mandates.

Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLOS ONE 5(10)
Björk B-C, Welling P, Laakso M, Majlender P, Hedlund T, et al. (2010) Open Access to the Scientific Journal Literature: Situation 2009. PLOS ONE 5(6): e11273.

Friday, August 13. 2010
Authors' Drafts, Publishers' Versions-of-Record, Digital Preservation, Open Access and Institutional Repositories
Commentary on Richard Poynder's
"Preserving the Scholarly Record: Interview with digital preservation specialist Neil Beagrie" The trouble with universities (or nations) treating digital preservation (which is a genuine problem, and a genuine responsibility) as a single generic problem -- covering all the university's (or nation's) "digital output," whether published or unpublished, OA or non-OA -- is not only that adding an additional preservation cost and burden where it is not yet needed (by conflating Green OA self-archiving mandates with "preservation mandates" and their funding demands) makes it even harder to get a Green OA self-archiving mandate adopted at all. But taking an indiscriminate, scattershot approach to the preservation problem also disserves the digital preservation agenda itself. As usual, what is needed is to sort out and understand the actual contingencies, and then to implement the priorities, clearly and explicitly, in the requisite causal order. The priorities here are to focus university (or national) preservation efforts and funds on what needs to be preserved today. And -- as far as universities' own institutional repositories (IRs) are concerned -- that does not include the publisher's official version-of-record for that university's (or nation's) journal article output. Preserving those versions-of-record is a matter to be worked out among deposit libraries and the publishers and institutional subscribers of the journals in question. Each university's own IR is for providing OA to its own authors' final, refereed drafts of those articles, in order to make them accessible to those users worldwide who do not have subscription access to the version-of-record. The author's draft does indeed need preservation too, but that's not the same preservation problem as the problem of preserving the published version-of-record (nor is it the same document!). 
Perhaps one day universal Green OA mandates will cause journal subscriptions to become unsustainable, because the worldwide users of journal articles will be fully satisfied with just the author's final drafts rather than needing the publisher's version-of-record, and hence journal subscriptions will be cancelled. If and when we ever reach that point, the version-of-record will no longer be produced by the publisher, because the authors' drafts will effectively become the version-of-record. Journal publishers will then convert to Gold OA publishing, with what remains of the cost of publication paid for by institutions, per individual article published, out of their windfall subscription cancellation savings. (Some of those savings can then also be devoted to digital preservation of the institutional version-of-record.)

But conflating the (nonexistent) need to pay for this hypothetical future contingency today (when we still have next to no OA or OA mandates, and subscriptions are still going strong) with either universities' (or nations') digital preservation agenda or their OA IR agenda is not only incoherent but counterproductive. Let's keep the agendas distinct: IRs can archive many different kinds of content. Let's work to preserve all IR content, of course, but let's not mistake that IR preservation function for journal article preservation or OA. For journal articles, worry about preserving the version-of-record -- and that has nothing to do with what is being deposited in IRs today. Nor should the need to mandate depositing the author's version be in any way hamstrung with extra expenses that concern the publisher's version-of-record, or the university's IR, or OA. (Exactly the same thing is true, mutatis mutandis, at the national preservation level, insofar as journal articles are concerned: A journal's contents do not all come from one institution, nor from one nation.)
And, while we're at it, let's also keep university (or national) funding of Gold OA publishing costs distinct from the Green OA mandating agenda too. First things first. Needlessly over-reaching (for Gold OA funds or preservation funds) simply delays getting what is already fully within universities' (and nations') grasps -- which is the newfound (but mostly unused) potential to provide OA to the authors' drafts of all their refereed journal articles by requiring them to be deposited in their OA IRs (not by reforming journal publishing, nor by solving the digital preservation problem).

Stevan Harnad
American Scientist Open Access Forum

Tuesday, July 27. 2010
The Mandate of Open Access Institutional Repository Managers

In a UKSG Serials News posting, "Are we nearly there yet? On the road to open access", Graham Stone [GS], Repository Manager, University of Huddersfield and Chair, UK Council of Research Repositories (UKCoRR) wrote:

GS: "Not too long ago, I took a phone call from an academic colleague from the Health Sciences regarding the submission of an article to Biomed Central. [The colleague] phoned me as I am the 'Repository guy' and [the colleague was] learning to play the 'Repository game', that is getting their work out there on open access and increasing their citations. [The colleague was] very impressed that so many people downloaded their last paper within days of it appearing in the Repository."

This upbeat-sounding paragraph is unfortunately a series of (familiar) misunderstandings and non-sequiturs about Open Access (OA) and Institutional Repositories (IRs):

(1) Biomed Central (BMC) is a gold OA (pay-to-publish) journal publisher.

(2) Publishing in a BMC journal has nothing to do with depositing an article in "the Repository." Which Repository -- Huddersfield's?
You don't need to publish in a pay-to-publish gold OA journal in order to deposit in a green OA Institutional Repository (IR) like Huddersfield's, nor in order to benefit from the increased downloads and citations that OA makes possible. All you do is publish in whatever journal you publish in, and deposit the final refereed draft in your OA IR as soon as it is accepted for publication. Or was the deposit in PubMed Central (PMC, not BMC)? Likewise no payment required (but what does deposit in that institution-external repository have to do with U. Huddersfield's IR, or its IR manager?).

(3) There is no "Repository game". There is just the research and publication game. (Providing OA maximizes research access, usage and impact, and OA can be provided in two ways. I. "Gold OA": by publishing in an OA journal (of which the major ones require payment to publish); or II. "Green OA": by publishing in any journal at all -- whether subscription-based or OA -- and also depositing the final draft in your OA IR: no payment required. The "game" is merely ensuring that all potential users have online access to your published articles, not just those whose institutions can afford to subscribe to the journal in which it happened to be published.)

GS: "It struck me as very interesting that to [this colleague], the next stage of the 'game' was to consider switching from green to gold open access - providing someone would pay of course!"

The colleague sounds like a researcher who has just deposited an article for the first time in an OA repository (perhaps PMC, though it should have been Huddersfield's IR), and not a researcher who has just paid BMC for gold OA publication (otherwise the colleague would know who was paying!). Something has definitely been garbled here...

GS: "This is not the first time that this topic has come up in conversation in the past few weeks. At the recent LIBER conference at Aarhus University in Denmark discussion over dinner turned to open access.
One comment from a colleague was that green open access could not be successful in the long run as this was a compromise, and 'compromises never work'."

How is providing OA to one's published article by depositing it in one's IR a "compromise"? A compromise of what, with what, for whom? Depositing an article in an IR consists of a few minutes' worth of keystrokes that maximize the access, usage and impact of one's article.

But perhaps the LIBER discussion was not among (1) researchers, discussing the problem of how to "get their work out there on open access and increase their citations" rather than continue to allow access to it to be restricted only to those researchers whose institutions can afford to pay for subscription access to the journal in which it happens to be published...

Perhaps the LIBER discussion was instead among (2) librarians, discussing the problem of how to afford to pay for subscription access?

Or perhaps the LIBER discussion was among (3) publishers, discussing the problem of how to guarantee current subscription revenue streams in a growing climate of demand for open access on the part of researchers, their institutions, their funders, and the tax-paying public that funds the research?

To repeat: In what sense is green OA self-archiving a "compromise"? A compromise of what, with what, for whom? Is a university repository manager a representative of the immediate interests of the university's researchers (and their institutions, funders, and the tax-paying public that funds the research), or of the interests of publishers and their present and future business models?

If librarians are to fulfill the role of repository managers, they need to re-think what they are doing, and why, and what it is that researchers and research need in the OA era. An OA IR is not a buy-in collection of journal subscriptions: It is a give-away provision of access to an institution's published journal articles.
An OA IR manager is not a serials librarian, nor someone appointed to direct or second-guess the future course of serials publishing. An OA IR manager is someone appointed to make sure the university's OA IR is filled with its primary target content: the university's published journal article output.

"UKCoRR has a vision of the work of repository management as a professionally recognised and supported role within UK research institutions." -- What is that "professionally recognised and supported role" if it is not filling their institution's repository with its intended content?

GS: "The road to open access is covered in gold and this is the way forward."

The way forward for whom? And according to whom? And in the interests of what? Researchers can be mandated to provide green OA for their published work. (Without mandates, only about 20% of articles are self-archived.) And funds -- if any are available -- can be provided to pay for gold OA. But publishers cannot be mandated to provide gold OA. And the funds to pay for gold OA cannot be mandated while they are still tied up in paying for subscriptions (and while the asking price for gold OA is designed to preserve publishers' current revenue streams and modus operandi, come what may).

The road to green OA is wide open, and traversing it is entirely in the hands of researchers (and their institutions and funders). The road to gold OA is not wide open; it costs money, and it is in the hands of publishers, not researchers. And the potential money to pay for gold OA is currently tied up in institutions' subscription fees, which are being paid to publishers, by institutions' libraries. So how is the road to OA covered with gold, and how is it the way forward? And what has this to do with the research repository manager's "professionally recognised and supported role within UK research institutions"?
GS: "A few days earlier, Kurt de Belder from Leiden University in the Netherlands had laid out his vision of the future, which assumed that open access would be via the gold route and if Repositories existed, they would only contain grey literature."

Kurt de Belder is the director of Leiden University's library (and an advisor to several publishers). Does his golden vision (like the green vision) include a practical means (like the green vision's mandates) of getting us from here to there? Or is it all just a golden wish, waiting passively (apart from any spare money being spent on pre-emptive gold OA payments) for publishers to convert to gold and release everyone's subscription money (for incoming journals) to pay their asking price for gold OA (for outgoing articles)?

And while the institution's library keeps waiting for this to happen directly, of its own accord, is the access, usage and impact of the institution's research output to continue to be denied to all but subscribing institutions, as it is today, while institutions' IRs (which already exist, by the way) are devoted instead to "grey literature" (whatever that means) instead of to refereed research (green OA)?

And meanwhile, visions aside, those who have their eyes wide open cannot help but notice that IRs (which already do exist, remember) do contain green content (20%) rather than just grey content, and that green deposit mandates can and do drive up the percentage green from the baseline 20% to 60%, and approaching 100% within a few years. What's missing, and needed (for those with eyes wide open to see), is more green OA mandates from institutions and funders -- not armchair or dinner-table visions of the future of publishing, evoked in the thrall of pre-emptive gold fever (with no critical reflection on or answerability to practical means and ends).
That, perhaps (rather than gold fever), would come closer to a substantive "vision of the work of repository management as a professionally recognised and supported role within UK research institutions."

GS: "Personally, and not as Chair of UKCoRR (UK Council of Research Repositories), I must admit that I am starting to agree with the gold only route, although I'm not sure I should."

If the Chair of the UK's Council of Research Repositories is starting to agree (whether personally or ex officio) with the gold-only route, then perhaps it is time for the Chair to think of resigning, and allowing UKCoRR's direction to be set by those who understand the needs of research and researchers, the power of green OA IRs, and the urgent need for Green OA mandates. Surely there is a "UK Council of Publishing Business Models" that could be joined instead, by those who have become afflicted with gold fever, forgetting about research and researchers' urgent immediate need for OA, and IRs' mission to provide it.

GS: "I have been espousing the virtues of green open access for nearly five years. At Huddersfield we have 26% full text in the Repository despite not yet having a mandate and our full text downloads are really taking off - 46,000 in the last 12 months."

If that 26% is 26% of Huddersfield's current yearly research output, then that deposit rate is somewhat above the global spontaneous (i.e., unmandated) baseline deposit rate of about 20%, but it is a far cry from what the deposit rate would be if Huddersfield were to adopt a mandate. A repository manager espousing the interests of Huddersfield's researchers should be espousing the virtues of green OA mandates to Huddersfield's researchers and administration, not just the virtues of providing green OA spontaneously (although that is, of course, welcome too).
Well over five years' consistent experience (and surveys) worldwide have shown that most researchers will not deposit spontaneously but they will deposit (willingly) if deposit is mandated. In the past few years, it is not spontaneous deposit rates that have been picking up, but the rate of adoption of deposit mandates, and the resulting green OA. This is not the time for repository managers to succumb to gold fever (which leads next to nowhere, and is not even part of their remit), resigning their IRs to warehousing "grey literature." GS: "However, for some time I have had my doubts as to whether the championing of green open access was actually taking us down the right road. I could see that gold open access was a good business model." If we all commit to deposit, we don't need green OA self-archiving mandates. But we don't all commit to deposit, even though it costs nothing. Only about 20% commit unmandated (26% at Huddersfield, perhaps because the IR manager has for five years espoused the virtues of spontaneous deposit so persuasively). But even fewer commit to gold OA, because it costs money, because most of the top journals don't offer it, and because the money to pay for it is still tied up in paying for subscriptions. And there are no mandates to require researchers to pay for gold OA, nor to release the subscription money, nor to dictate publishers' business model or modus operandi, nor to set their asking price. Besides, none of that is within an OA IR manager's remit. It has nothing to do with "the work of repository management as a professionally recognised and supported role within UK research institutions." An OA IR manager is supposed to get his IR filled with OA's target content, and that target content is supposed to be, first and foremost, peer-reviewed journal articles, most of which are today still being published in subscription journals. 
What needs to be championed by IR managers (and a fortiori, by the Chair of the UK Council of Research Repositories), and championed for their researchers and their institutions, are the virtues of green OA mandates that will fill their IRs -- not the virtues of "good business models," championed for publishers, by librarians. (You don't need to be a "professional and supported" IR manager to go down that road.) And those who are indeed committed to championing green OA mandates worldwide are beginning to win them. GS: "The trouble to me is that the [gold OA] model only really works if we all commit. Otherwise, you end up paying twice, once for the open access article and once for the journal subscription. I just didn't see how we arrived at this brave new world of gold open access journals, no serials budgets and stuff in the cloud."Yes, that's indeed the size of it: "The [gold OA] model only really works if we all commit. Otherwise, you end up paying twice, once for the open access article and once for the journal subscription." Trying to go directly from the status quo to gold OA is quite simply self-contradictory, like an Escher drawing of an impossible shape: Institutional subscription access tolls are paid per incoming journal; individual OA publication fees are paid per outgoing article. The money to pay for gold OA fees is tied up in subscription tolls. But institutions cannot cancel their journal subscriptions unless the journals' contents are accessible to their users otherwise. Institutions are not necessarily even subscribing annually, for their users, to the same journals in which their researchers are occasionally publishing. Catch 22. 
(And, as Graham notes, anyone foolish and gullible enough to believe hybrid gold publishers (the ones who charge both subscription tolls + optional gold OA fees) when they say they will reduce subscription tolls proportionately as gold OA fee revenues increase is forgetting that this requires institutions to find the money to pay the gold asking price first, while it is still being spent on the subscriptions! A good "business model" indeed…) (By the way, the somewhat uneven distribution of wealth on the planet can also be fixed "if we all commit." That's not just gold fever, it's the Golden Rule -- but alas far too few in our gene pool are committed to practising it...) GS: "But maybe I can see how we get to gold open access now? With researchers taking ownership of the 'game' by realising that gold open access is the only way to ensure access for all and increased citations, maybe we are on the right road after all?" Researchers "taking ownership of the 'game'"? By "realising that gold OA is the only way"? The self-contradiction on the road to there from here is resolved by "realisation"? By researchers? (The same researchers for whom the only thing they need to do to provide OA is a few keystrokes? And they're not even "committed" enough to do those keystrokes, unless they are first mandated by their institutions or funders?) What does this vision envision that researchers are to do with this newfound golden realisation of theirs? The same thing 34,000 of them did (unsuccessfully) back in 2000? Sign a petition to boycott their journals if they don't go OA? And if researchers were really that committed to "ensuring access for all and increased citations," wouldn't it be simpler than making empty threats against all their publishers just to petition their one and only institution to mandate deposit? 
Better still, if their realisation about "the only way" were that profound, wouldn't researchers just go ahead and do the keystrokes to deposit of their own accord, unmandated, in order to "ensure access for all, and increased citations"? And would it not be a remarkable coincidence if it turned out that the most pressing thing on researchers' minds was not, in fact, the access and impact of their work (which they can already maximize with a few green keystrokes), but a "good business model" for their publishers and their long-suffering librarians? A remarkable coincidence that what researchers had been yearning for all along turned out (upon "realisation") to be exactly the same thing their librarians had been yearning for -- which was not the filling of their OA IRs but relief from the serials crisis? GS: "And maybe, instead of the superfast highway to gold open access that some envisage, are we travelling down the leafy lane of green open access with gold just around the next corner? A bit round the houses, but yes we are certainly getting there." The super-fast highway to gold OA? Amidst all this "realisation," I don't recall hearing the game plan for solving the problem of the toll booths posted along the ubiquitous subscription highways -- the ones that are currently gobbling up institutions' serial budgets (i.e., the funds that would be used instead to pay for gold OA)... But it is true that green OA, once it becomes universal, may eventually get us to gold OA too -- if universal availability of green eventually causes universal cancellations, forcing journals to cut costs, downsize, and convert to gold OA, thereby releasing the windfall subscription savings to pay the reduced cost of gold OA (peer review alone, with the print and online editions gone, and all access-provision and archiving offloaded onto the worldwide network of OA IRs). But that's not around the next corner, when we're still at 20% green OA. 
And we are certainly getting ahead of ourselves, if we don't provide the universal green OA first -- for that's what any eventual subscription cancellation windfall is dependent upon. The cancellations can't be done pre-emptively. Certainly not by a single institution, or IR manager -- not even the Chair of the UK Council of Research Repositories. That would require universal institutional subscription cancellations, and all at once (not one institution or country at a time -- otherwise the researchers of that institution or country, instead of gaining open access, lose subscription access altogether). My recommendation to OA IR managers who envision "the work of repository management as a professionally recognised and supported role within UK research institutions" would be to focus on their own mandate, which is to fill their own institution's IRs, not to dream about business models that are as good as gold. And the way to get their OA IRs filled is already known: It is by getting their institutions to mandate green OA. (No one connected in any way with OA IRs has a more "professionally recognised and supported role within [their] research institutions" than Southampton's Les Carr and Harvard's Stuart Shieber, the architects of their respective institutions' green OA mandates (Southampton's being the first and Harvard's the most famous).) It's not too late for Huddersfield -- or Nottingham, or the rest of the 17,000 universities that have not yet adopted a mandate. That's all. And that's enough. Mandate green OA for your institution and the rest will take care of itself, in its own time. But meanwhile your institution's researchers will "ensure access for all, and increased citations." That, after all -- not "a good business model" -- is the purpose of OA, and hence the mandate of OA IR managers. 
See "Waiting for Gold"
On 2010-07-30, at 2:50 AM, Charles Oppenheim [CO] wrote in JISC-Repositories:
CO: "Mr Stone's (and other repository managers') Job Specifications may say something like "your job is to ensure that articles produced by staff in this University are made OA, whether by means of the Institutional Repository or by any other means deemed appropriate." So, whilst not disagreeing with the argument that the priority should be green repositories, repository managers should not ignore alternative approaches that also produce increased downloads and citations and promote the institution's reputation. Even if their job specification is tied to their IR, it would be an unprofessional Repository Manager who was not interested in the pros and cons of alternative methods for achieving OA. Being professional means taking a holistic view of things! I see nothing incompatible therefore between Mr Stone's remarks and being chairman of UKCoRR."
But GS had written:
GS: "I have been espousing the virtues of green open access for nearly five years… However, for some time I have had my doubts as to whether the championing of green open access was actually taking us down the right road… Kurt de Belder... assumed that open access would be via the gold route and if Repositories existed, they would only contain grey literature… I must admit that I am starting to agree with the gold only route…"
And CO has replied:
CO: "...priority should be given to green repositories..."
If the university repository manager's "job is to ensure that articles produced by staff in this University are made OA, whether by means of the Institutional Repository or by any other means deemed appropriate," it is not clear why the job is called "repository manager." (It sounds like something more like "publication advisor" -- and if that advice is to take the gold only route, then it sounds like an anti-repository manager!) 
Rather than twist simple and obvious job descriptions into complicated ideological knots, might it not be more sensible to look carefully at the concrete, practical reasons why repository managers' "priority should be [filling] green repositories" rather than "the gold only route"? After all, GS himself wrote that the "trouble to me is that the [gold OA] model only really works if we all commit. Otherwise, you end up paying twice." But GS never went on to explain how to surmount this impasse (whereas my posting [above] explains quite explicitly why you could not -- unless universal green OA came first). Yet this impasse did not seem to deter Huddersfield's green repository manager and UKCoRR's chairman from announcing that he was "starting to agree with the gold only route" because he "could see that gold open access was a good business model." CO: "And before Stevan explodes at this posting, let me say (yet again) that I am a strong supporter of the green approach to OA. But I am not blind to the existence, and in some cases success, of alternative OA approaches." Indisputably there are not one but two ways to provide OA. (We -- CO and 8 other co-authors -- defined the two ways ourselves in a Nature Web Focus six years ago: Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S., Gingras, Y., Oppenheim, C., Stamerjohanns, H., & Hilf, E. (2004) The green and the gold roads to Open Access. Nature Web Focus.) But from the capability of providing OA to some of the planet's annual 2.5 million refereed journal articles in two different ways, green and gold, it does not follow that each of the ways is capable of scaling up to providing OA to all (or even much or most) of the planet's annual 2.5 million refereed journal articles. This is where the sticky Escherian details (about annual percentage green and gold OA, ongoing subscription needs and commitments, double payment, and especially the power of green mandates) come in. 
Surely the practical and professional mandate of the newly minted job title "repository manager" is not just a matter of abstract principles but of concrete, practical reality.
Stevan Harnad
American Scientist Open Access Forum
Thursday, July 8. 2010
On Comparing Institutional Apples With Multi-Institutional Fruit: The Denominator Fallacy Again
Chris Armbruster [CA] wrote in the American Scientist Open Access Forum:
CA: "'Institution' is indeed not a very precise concept, but the repository ranking will not be improved if one were to spend much time trying to decide which repository is institutional and which is not" If there is any rationale for separately ranking and comparing -- as the Ranking Web of World Repositories (RWWR) does -- both the top 800 repositories and the top 800 institutional repositories (and there is indeed an important rationale for doing so), then that rationale is that the institutions are indeed institutional and not multi-institutional. The purpose is to rank their relative size (and hence their success in capturing their target content), and there is no point in comparing the size of the category "apple" with the size of the category "fruit." This is the "denominator fallacy." The pros and cons of Chris Armbruster's advocacy of central (multi-institutional) repositories over institutional repositories have already been multiply discussed over the years in this Forum and elsewhere. The argument for institutional repositories is that (1) institutions are the providers of all of OA's target content, (2) they have a stake in managing their own output, and (most important of all) (3) they are in a position to mandate the deposit of their own output. The argument for multi-institutional (central) repositories is that they look (superficially) as if they were bigger, hence more "successful" in attracting OA's target content. (Hence Chris's preference for keeping the two kinds of repositories and their sizes conflated in the RWWR rankings.) They also look (superficially) more manageable and sustainable. 
The argument against multi-institutional (central) repositories is (a) that multi-institutional entities (notably, funders) cannot mandate the deposit of all institutional research output (because not all research is funded), (b) that central deposit mandates compete with instead of reinforcing institutional mandates (eliciting resistance from authors facing the prospect of having to do double-deposits), and (most relevantly here) (c) that the size and success of a repository can only be evaluated and compared in relation to the size of that repository's total target output: And although there are differences among institutions in the size of their own total output (which can and should be weighted to normalize it and make it comparable), the differences in size between institutions and multi-institutions is the difference in size between the number of apples and the number of fruit. (The denominator fallacy.) Multi-institutional (central) repositories' content would have to be weighted by the output of all their actual and potential target institutions and the total target content of each, in order to make multi-institutional rankings comparable to those of individual institutions. RWWR is not doing that kind of weighting -- nor would it be easy to determine those weightings for each kind of multi-institutional repository, though it may eventually be possible to estimate in principle. If it were done, however, there would hardly be any need for two rankings (for repositories vs. institutional repositories). What would be clear from a proper denominator-weighted ranking of institutional and multi-institutional repositories is that, contrary to what Chris has argued, it is not at all true that the multi-institutional repositories are bigger or more successful in collecting their respective total target contents. 
Rather, it makes much more sense for both institutions and funders to mandate that researchers deposit in their own institutional repository -- from which multi-institutional collections could then be automatically harvested. (It would then be redundant to try to compare their relative success, as one would clearly be a derivative of the other.) For management and sustainability, local institutional deposit and central harvesting is the complementary -- and optimal -- solution. But first the primary content-provision problem has to be solved, otherwise there is next to nothing to manage and sustain! CA: "how about also deleting No 10 because it is only a departmental repository?" A departmental repository, in contrast, is sub-institutional rather than multi-institutional. Hence, unless there is to be a separate RWWR ranking of the top 800 departmental repositories, there is no harm in listing the departmental repositories among the institutional repositories -- except if the university has both an institutional and a departmental repository, and the contents of the departmental repository are also a proper subset of the contents of the institutional repository, hence double-counted. This is not the case in the instance of ["institutional"] repository #10, University of Southampton School of Electronics and Computer Science, whose contents are not part of institutional repository #27, University of Southampton. Rather than resulting in an inflated ranking for Southampton, this actually results in a lower ranking. The joint RWWR ranking of the integrated institutional repository would be higher for Southampton. (That said, with a properly weighted denominator, separately tagged departmental repositories would be useful at this time, to compare the relative success of institution-wide mandates vs. departmental/school/faculty mandates -- i.e., Arthur Sale's "patchwork mandate" strategy.) 
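To make the denominator point concrete, here is a minimal sketch of ranking repositories by the proportion of their own target content that they capture, rather than by absolute size. The repository names and all the figures are hypothetical, chosen only for illustration; they are not RWWR data, and RWWR does not compute this.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    deposits: int       # refereed articles deposited per year (hypothetical)
    target_output: int  # total annual refereed output of the institution(s)
                        # the repository is meant to cover (hypothetical)

    @property
    def capture_rate(self) -> float:
        # The denominator is the repository's own total target content.
        return self.deposits / self.target_output


# Hypothetical figures, for illustration only.
repos = [
    Repository("Central (multi-institutional) repository", 20_000, 200_000),
    Repository("Institutional repository A (mandated)", 1_800, 3_000),
    Repository("Institutional repository B (unmandated)", 600, 3_000),
]

# Ranking by raw size puts the "fruit" ahead of every "apple" ...
by_size = sorted(repos, key=lambda r: r.deposits, reverse=True)

# ... whereas ranking by capture rate compares like with like.
by_capture = sorted(repos, key=lambda r: r.capture_rate, reverse=True)

for r in by_capture:
    print(f"{r.name}: {r.capture_rate:.0%} of target content")
```

With these made-up numbers the central repository is by far the largest, yet it captures the smallest share of its target content.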
CA: "Also, it is a bad idea to define repositories as institutional only if they restrict themselves to the output of a single institution. We already have too many repository managers who succumb to this kind of institutionalist logic - and reject OA content only because it is not from their own institution."If only the problem were that of an overflowing cup, with so much OA target content that it needs to be rejected! Chris has the OA content problem completely upside-down! The problem is that not enough of each institution's own OA target content is being deposited, anywhere -- not that institutions are declining to host the output of other institutions. (It is only Chris's central-repository preoccupation that makes him imagine that the latter is the problem.) What's missing is not repositories to deposit in, but mandates to deposit. The solution is for institutions and funders to mandate institutional deposit of all content, funded and unfunded, across all disciplines -- and then, if desired, to harvest that content into various central collections, by discipline, funder, language or nation, as desired. Institutions are the universal providers of all that content; they are also the natural locus for deposit mandates. CA: "The CSIC has a sound methodology for ranking repositories, and it not their job to define exclusively what is an IR and what not. And in cyberspace it is much more interesting to compare repositories according to domains and services they offer…"I take it that by the CSIC Chris means the RWWR. And as far as I can tell, the only reason Chris finds the methodology sound is that it conflates institutional and multi-institutional repositories, which favors Chris's preference for multi-institutional repositories. What is much more interesting and important in cyberspace than the locus of the distributed content is the presence of the content. Most (80%) of OA's target content is still missing from anywhere on the (free) web, and long overdue. 
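The "deposit institutionally, harvest centrally" arrangement can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the repository names, record fields and the `harvest` helper are all hypothetical, and a real harvester would fetch metadata over the network instead of reading a dictionary.

```python
from collections import defaultdict

# Hypothetical local holdings: each institution hosts only its own output.
local_repositories = {
    "University A IR": [
        {"title": "Paper on quarks", "subject": "physics"},
        {"title": "Paper on citations", "subject": "information science"},
    ],
    "University B IR": [
        {"title": "Paper on neutrinos", "subject": "physics"},
    ],
}


def harvest(repositories):
    """Build a central, subject-keyed index that points back to each local IR.

    The content stays where it was deposited; only metadata is collected.
    """
    index = defaultdict(list)
    for repo_name, records in repositories.items():
        for record in records:
            index[record["subject"]].append(
                {"title": record["title"], "hosted_at": repo_name}
            )
    return index


central_index = harvest(local_repositories)

# A user searches by discipline at the harvester level, not IR by IR.
physics_hits = [hit["title"] for hit in central_index["physics"]]
print(physics_hits)
```

The same index could just as well be keyed by funder, language or nation; the locus of deposit does not change.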
Locus matters strategically for the concrete, practical goal of capturing that target content (and making it OA). Chris keeps systematically missing this point. If the content were all there already, none of this would matter in the slightest. (And a good intuition pump to bear in mind is that the key to the success of Google and the like was not to try to get everyone to deposit their content directly in Google: What happened, and worked, was distributed, local deposit and hosting, followed by central harvesting. Not a bad principle to generalize to OA...) CA: "Moreover, it would help if we could move beyond the often narrow understanding of what an institutional repository is and what not & acknowledge more clearly that a strategy of privileging institutional repositories as such has not helped."Chris does not seem to have noticed the growing institutional/departmental repository mandate movement (initiated in 2002 by Southampton ECS, but greatly accelerated since the 16th mandate in 2008 by Harvard FAS, and now running well over 100 institutional/departmental mandates, including UCL, MIT and Stanford, as well as over 40 funder mandates). It is not (and never has been) a matter of merely "privileging" institutional deposit, but mandating it. CA: "The value & sustainability of IRs (individually, as isolated instances, & if not embedded in a national system) is rather limited for both scholarship and open access."(1) Repository value is nil without content. (2) With content, locus is irrelevant, as search is not local but global, via central harvesters. (3) Sustainability is a red herring (especially with today's sparse OA content); institutional deposit loci and central harvesters are complementary, insofar as preservation is concerned. (4) Nations can and should mandate OA deposit. Nations can and should harvest OA deposits centrally. 
But there is no earthly need (or prospect) of nations directly hosting all their institutional OA output centrally, any more than there is any earthly need for nations to host all their institutions centrally. (5) If Chris is worried about limitations on OA scholarship, he should set his mind to thinking of how to induce the OA target content providers (institutional researchers) to deposit their content, to make it OA. (6) IRs will take care of themselves. CA: "Hence, it is very welcome that more determined efforts are underway at building viable networks of research repositories and integrate IRs in national systems (e.g. Ireland as latest instance)." All true, but a non sequitur, insofar as the fundamental problem of filling those repositories with their target contents is concerned.
CA: "For a sustained argument, please see": Armbruster & Romary (2010) Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication (accepted for publication in IJDLS)
For a sustained critique and response, see: Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues
I have quickly skimmed (but not read verbatim) the new A & R paper, and I see that all of my prior objections (to A & R's earlier paper) remain unanswered, indeed not even noted. (1) The 4-way classification system -- subject, nation, "research" and institution -- continues to be arbitrary and rather incoherent. (2) The three far more important and salient distinctions -- direct deposit repositories vs harvested collections, OA target content vs other kinds of content, mandated repositories vs. 
unmandated repositories -- are not treated (or not treated in enough depth to understand their salience). (3) The all-important question of how best to capture OA's target content -- the most central question, before we even talk about repository types, services or sustainability -- is not given any serious consideration. (4) The very specific question of locus of deposit, and its specific importance for deposit mandates (and hence for capturing the target content) is likewise not given any serious consideration. (5) The "denominator fallacy" continues to pervade the paper, in the continued reference to absolute repository size, without taking into account the size or proportion of the repository's target contents that the repository is actually capturing. (For an institutional repository, the denominator is its total refereed journal article output; for HAL -- which A & R stunningly misclassify as the most successful of all repositories! -- it is the totality of France's refereed journal article output.) In short, A & R's approach -- which takes so much of the current sparse and inchoate landscape for granted, and follows after it, instead of facing the real problem, which is to remedy that sparseness, and lead the way toward capturing the vast proportion of OA's target content (at least 80% of it) that is still not being captured (by any repository) -- is not, I believe, a realistic or productive one. The reality is that most repositories -- of all the kinds A & R consider and don't consider -- are near-empty of their target content. Consequently, search, services and sustainability are not the problem: Content is. Mandates generate the content, but A & R's treatment imagines that mandates, and their promise, amount mostly to funder mandates (and funder -- i.e., "research" -- repositories). 
This is (in my view) an enormous error: Not all scholarly and scientific research (perhaps not even the majority of it) is funded, but virtually all of it comes from institutions -- universities and research institutes. In and of itself, that is strong reason to give institutional repositories and institutional mandates far more serious thought than A & R give them. Another reason is that once institutional deposit is mandated and OA contents are being systematically deposited in their institutional repositories, they can be harvested to any other collections we may desire -- subject-based, national, "research" or what-have-you. Nor are the various search and other services that are built atop this OA content meant to be provided at the institutional level (where A & R note their absence as if it were a defect): services are a harvester-level function, whereas content-provision is an institution-level function. A & R's article is also missing the point of depositing the author's rather than the publisher's version (the author's version has far fewer restrictions and can be provided much earlier); nor does it take into account the power of institutional repositories to provide immediate "Almost OA" even in the case of publisher-embargoed content, via the semi-automatic "eprint request" button. A & R also make some incorrect assumptions about the difficulty and effort of deposit and the need for library assistance and proxy deposit.
Stevan Harnad
American Scientist Open Access Forum
Thursday, February 18. 2010
Open Access Mandates and the "Fair Dealing" Button
Sale, A., Couture, M., Rodrigues, E., Carr, L. and Harnad, S. (2010) Open Access Mandates and the "Fair Dealing" Button. In: Dynamic Fair Dealing: Creating Canadian Culture Online (Rosemary J. Coombe & Darren Wershler, Eds.)
ABSTRACT: We describe the "Fair Dealing Button," a feature designed for authors who have deposited their papers in an Open Access Institutional Repository but have deposited them as "Closed Access" (meaning only the metadata are visible and retrievable, not the full eprint) rather than Open Access. The Button allows individual users to request and authors to provide a single eprint via semi-automated email. The purpose of the Button is to tide over research usage needs during any publisher embargo on Open Access and, more importantly, to make it possible for institutions to adopt the "Immediate-Deposit/Optional-Access" Mandate, without exceptions or opt-outs, instead of a mandate that allows delayed deposit or deposit waivers, depending on publisher permissions or embargoes (or no mandate at all). This is only "Almost-Open Access," but in facilitating exception-free immediate-deposit mandates it will accelerate the advent of universal Open Access.
Sunday, January 31. 2010
Annual Costs Per Deposit of Hosting Refereed Research Output Centrally Versus Institutionally
SANDY THATCHER: "it's the peer review that is the most expensive part of the whole process, and arXiv is not in the business of peer reviewing."
DAVID PROSSER: "Is that true, Sandy? Can we have a reference please? Tenopir and King back in 2004 suggested that 'manuscript receipt processing, disposition decision-making, identifying reviewers or referees and review processing' constituted 26% of the direct costs of producing an article (which they estimated at $1700 on average). Of course, costs may have shifted in the years since then. Which is why a reference would be welcome."
What Sandy Thatcher said is perfectly correct: (1) The cost of providing peer review (c. 
$500 per article -- though more efficient online procedures could lower that) is indeed the most expensive part of the process of providing a peer-reviewed article for free (OA) by depositing it in a central repository like Arxiv (or in the author's own Institutional Repository, IR). (2) And Arxiv does not provide the peer review. (Nor does any other repository.) (3) Low as it is, $7 per article just for deposit and archiving is probably an overestimate, because Arxiv needs to do far too much work to process and store all the world's institutions' physics deposits centrally: It would cost even less per article for an Institutional Repository (IR) that archives only its own annual research output (and knows all its own researchers, hence need not do the extra generic precautionary controls). (Be careful not to jig the estimate by factoring in the costs of online infrastructure that the institution already has, regardless of whether it has an IR: just the one-time IR set-up cost, the extra server and disk-space, etc., plus the cost per deposit and annual maintenance of the IR only.) It would be useful to have IRs' estimates of their annual cost per article deposited -- but only from mature mandated IRs that are already well on the way to capturing 100% of their annual institutional output of refereed journal articles. (Obviously the IR price per article will be somewhat higher for IRs that are still capturing only 15% or less of their annual refereed research output, as most IRs today still are, because they have not yet mandated deposit.) Another useful comparison would be the cost -- in money and time -- of doing the unnecessary IR "quality controls" and preprocessing that many IRs think, superstitiously and superfluously, that they need to do. (In this case, estimates from all the immature, near-empty IRs are relevant too.) 
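The dependence of an IR's cost per deposit on its capture rate can be illustrated with a small calculation. This is a rough sketch only: the function and the IR budget figures below are hypothetical assumptions, not measured costs; only the 26% and $1700 figures come from the Tenopir and King estimate quoted above.

```python
def cost_per_deposit(annual_ir_cost, marginal_cost, annual_output, capture_rate):
    """Annual IR-specific cost per deposited article.

    annual_ir_cost: yearly cost attributable to the IR alone (amortised
                    set-up, extra server and disk space, maintenance),
                    excluding infrastructure the institution has anyway.
    marginal_cost:  handling cost of one extra deposit.
    annual_output:  the institution's yearly refereed article output.
    capture_rate:   fraction of that output actually deposited (0 to 1).
    """
    deposits = annual_output * capture_rate
    if deposits <= 0:
        raise ValueError("no deposits: cost per deposit is undefined")
    return annual_ir_cost / deposits + marginal_cost


# Hypothetical IR: $3000/year IR-specific cost, $0.50 per deposit,
# 3000 refereed articles published per year.
unmandated = cost_per_deposit(3000, 0.5, 3000, 0.15)  # capturing only 15%
mandated = cost_per_deposit(3000, 0.5, 3000, 1.0)     # capturing 100%

# Peer-review share of direct costs from the quoted estimate:
# 26% of $1700 (integer arithmetic to avoid rounding noise).
peer_review_share = 1700 * 26 // 100

print(f"unmandated IR: ${unmandated:.2f} per deposit")
print(f"mandated IR:   ${mandated:.2f} per deposit")
print(f"peer review (26% of $1700): ${peer_review_share}")
```

The same fixed cost spread over a near-empty IR yields a much higher figure per article, which is why estimates from mature mandated IRs are the informative ones.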
At Southampton ECS, the first mandated IR of all (since 2002), we realized within the first year of the mandate that the "quality control" (for the content and metadata of the deposit) was based on a completely unnecessary and dysfunctional misanalogy with library collections and cataloguing, that all it did was create needless work and backlogs for the "quality-controllers" and needless resistance and counterproductive resentment from depositing authors who, having taken the trouble to deposit their refereed final drafts, as mandated, were then denied the immediate satisfaction of seeing their deposits go immediately online and start getting downloaded: instead, they had to go into a quality-control queue, sometimes for days or weeks, as the volume of mandated deposits to "process" grew. We quickly jettisoned the gratuitous process and have seen the IR's deposits growing happily ever since. Leave any "quality control" for your institutional authors' peer-reviewed final drafts in the background. If something is wrong, users will let the author know; if users don't squawk (or there are no users!), the slip-up probably isn't even worth correcting. Focus on solving the real problem, which is not "quality control" but capturing the IR's target content: the institution's full annual output of refereed research. And remember that -- whilst journals still exist and subscriptions are still paying for their quality control -- your IR is not hosting the all-important version-of-record, but merely an OA supplement. A word to the wise...
Stevan Harnad
American Scientist Open Access Forum
Saturday, January 30. 2010
Arxiv Arcana
Nat Gustafson-Sundell wrote:
NGS: "I don't expect local repositories to ever offer quality control."

Of course not. They are merely offering a locus for authors to provide free access to their preprint drafts before submitting them to journals for peer review, and to their final drafts (postprints) after they have been peer-reviewed and accepted for publication by a journal. Individual institutions cannot peer-review their own research output (that would be in-house vanity-publishing). And global repositories like arxiv or pubmedcentral or citeseerx or google scholar cannot assume the peer-review functions of the thousands and thousands of journals that are actually doing the peer review today. That would add billions to their costs (making each into one monstrous (generic?) megajournal: near impossible, practically, if it weren't also totally unnecessary -- and irrelevant to OA and its costs).

NGS: "Also, users have said again and again that they prefer discovery by subject, which will be possible for semantic docs in local repositories or better indexes (probably built through better collaborations), but not now."

Search should of course be central and subject-tagged, over a harvested central collection from the distributed local IRs, not local, IR by IR. (My point was that central deposit is no longer either necessary or desirable, whether for content-provision or for search. The optimal system is institutional deposit (mandated by institutions as well as funders) and then central harvesting for search.)

NGS: "I agree that it would be great if local repositories were more used, and eventually, the systems will be in place to make it possible, but every study I've seen still shows local repository use to remain disappointingly low, although some universities are doing better than others."

"Use" is ambiguous, as it can refer both to author use (for deposit) and user use (for search and retrieval). We agree that the latter makes no sense at the local level: users search at the harvester level, not the IR level.
But for the former (low author "use," i.e., low levels of deposit), the solution is already known: Unmandated IRs (i.e., most of the existing c. 1500 IRs) are near empty (of OA's target content, which is preprints and postprints of peer-reviewed journal articles) whereas mandated IRs (c. 150, i.e., 1%!) are capturing (or on the way to capturing) their full annual postprint output. So the solution is mandates. And the locus of deposit for both institutional and funder mandates should be institutional, not central, so the two kinds of mandates converge rather than compete (requiring multiple deposit of the same paper). For the special case of arxiv, with its long history of unmandated deposit, a university's IR could import its own remote arxiv deposits (or export its local deposits to arxiv) with software like SWORD, but it is clear that, eventually, institution-external deposit makes no sense: Institutions are the universal providers of all peer-reviewed research, funded and unfunded, across all fields. One-stop/one-step local deposit (followed by automatic import, export, and harvesting to/from whatever central services are needed) is the only sensible, scalable and sustainable system, and also the one that is most conducive to the growth of universal OA deposit mandates from institutions, reinforced by funder mandates likewise requiring institutional deposit, rather than discouraged by gratuitously requiring institution-external deposit.

NGS: "Inter-institutional repositories by subject area (however broadly defined) simply work better, such as arXiv or even the Princeton-Stanford repository for working papers in the classics."

"Work better" for what? Deposit or search?
You are conflating the locus of search (which should, of course, be cross-institutional) with the locus of deposit, which should be institutional, in order to accelerate institutional deposit mandates and in order to prevent discouraging adoption and compliance because of the prospect of having to deposit the same paper in more than one place. (Yes, automatic import/export/harvesting software is indifferent to whether it is transferring from local IRs to central CRs or from central CRs to local IRs, but the logistics and pragmatics of deposit and deposit mandates -- since the institution is always the source of the content -- make it obvious that one-time deposit institutionally fits all output, systematically and tractably, whereas willy-nilly IR/CR deposit, depending on fields' prior deposit habits or funder preferences, is a recipe for many more years of the confusion, inaction, absence of mandates, and near-absence of OA content that we have now.)

NGS: "Currently, universities are paying external middlemen an outsized fee for validation and packaging services. These services can and should be brought "in-house" (at least as an ideal/goal to develop toward whenever the opportunities can be seized) except in cases where prices align with value, which occurs still with some society and commercial publications."

I completely agree that along with hosting their own peer-reviewed research output, and mandating its deposit in their own IRs, institutions can also use their IRs (along with specially developed software for this purpose) to showcase, manage, monitor, and measure their own research output. That is what OA metrics (local and global) will make possible. But not till the problem of getting the content into OA IRs is solved. And the solution is institutional and funder mandates -- for institutional (not institution-external) deposit.
NGS: "To the extent that an arXiv or the inter-institutional repository for humanities research which will be showing up in 3-7 years moves toward offering these services, they are clearly preferable to old fashioned subscription models (since the financial support is for actual services) and current local repositories which do not offer everything needed in the value chain (as listed in Van de Sompel et al. 2004)."

(1) The reason 99% of IRs offer no value is that 99% of IRs are at least 85% empty. Only the 1% that are mandated are providing the full institutional OA content -- funded and unfunded, across all disciplines -- that all this depends on.

(2) The central collections, as noted, are indispensable for the services they provide, but that does not include locus of deposit and hosting: There, central deposit is counterproductive, a disservice.

(3) With local hosting of all their research output, plus central harvesting services, institutions can get all they need by way of search and metrics, partly through local statistics, partly from central ones.

NGS: "I remember when I first read an article quoting a researcher in an arXiv covered field who essentially said that journals in his field were just for vanity and advancement, since all the "action" was in arXiv (Ober et al. 2007 quoting Manuel 2001 quoting McGinty 1999) -- now think about the value of a repository that doesn't just store content and offer access."

This familiar slogan, often voiced by longstanding arxiv users, that "Journals are obsolete: They're only for tenure committees. We [researchers] only use the arxiv" is as false, empirically, as it is incoherent, logically: It is just another instance of the "Simon Says" phenomenon: (Pay attention to what Simon actually does, not to what he says.)
Although it is perfectly true that most arxiv users don't bother to consult journals any more -- using the OA version in arxiv only, and referring to the journal's canonical version-of-record only in citing -- it is equally (and far more relevantly) true that they all continue to submit all those papers to peer-reviewed journals, and to revise them according to the feedback from the referees, until they are accepted and published. That is precisely the same thing that all other researchers are doing, including the vast majority that do not self-archive their peer-reviewed postprints (or, even more rarely, their unrefereed preprints) at all. So journals are not just for vanity and advancement; they are for peer review. And arxiv users are just as dependent on that as all other researchers. (No one has ever done the experiment of trying to base all research usage on nothing but unrefereed preprints and spontaneous user feedback.) So the only thing that is true in what "Simon says" is that when all papers are available, OA, as peer-reviewed final drafts (and sometimes also supplemented earlier by the prerefereeing drafts) there is no longer any need for users or authors to consult the journal's proprietary version of record. (They can just cite it, sight unseen.) But what follows from that is that journals will eventually have to scale down to becoming just peer-review service-providers and certifiers (rather than continuing also to be access-providers or document producers, either on-paper or online). Nothing follows from that about the value of repositories, except that they are useless if they do not contain the target content (at least after peer review, and, where possible and desired by authors, also before peer review).

Harnad, S. (1998/2000/2004) The invisible hand of peer review. Nature [online] (5 Nov. 1998); Exploit Interactive 5 (2000); and in Shatz, D. (2004) (ed.) Peer Review: A Critical Inquiry. Rowman & Littlefield. Pp. 235-242.
NGS: "Do I think the financial backing will remain in place? It depends on the services actually offered and to what extent subject repositories could replace a patchwork system of single titles offered by a patchwork of publishers."

At the moment the issue is whether arxiv, such as it is (a central locus for institution-external deposit of institutional research content in some fields, mostly physics, plus a search and alerting service), can be sustained by voluntary sub-sidy/scription -- not whether, if arxiv also somehow "took over" the function of journals (peer review), that too could be paid for by voluntary sub-sidy/scription...

NGS: "Universities could save a great deal by refusing to pay the same overhead over and over again to maintain complete collections in single subject areas (not to mention paying for other people's profits)."

I can't quite follow this: You mean universities can cancel journal subscriptions? How do those universities' users then get access to those cancelled journals' contents, unless they are all being systematically made OA? Apart from those areas of physics where it has already been happening since 1991, that isn't going to happen in most other fields till OA is mandated by the universal providers of that content, the universities (reinforced by mandates from their funders). Then (but only then) can universities cancel their journal subscriptions and use (part of) their windfall saving to pay (journals!) for the peer-review of their own research output, article by article (instead of buying in other universities' output, journal by journal).
NGS: "More importantly, more could be done to make articles useful and discoverable in a collaborative environment, from metadata to preservation, so that the value chain is extended and improved (my sci-fi includes semantic docs, not just cataloged texts, and improved, or multi-stage, peer review, or peer review on top of a working papers repository)."

All fine, and desirable -- but not until all the OA content is being provided, and (outside of physics), it isn't being provided -- except when mandated... So let's not build castles in Spain before we have their contents safely in hand.

NGS: "I think there's been plenty of 'chatter' to indicate that the basic assumptions in conversations between universities are changing (see recent conference agendas), so that we can expect to see more and more practical plans to collaborate on metadata, preservation, and, yes, publications."

I'll believe the "chatter" when it has been cashed into action (deposit mandates). Till then it's just distraction and time-wasting.

NGS: "My head spins to think of the amount of money to be saved on the development of more shared platforms, although, the money will only be saved if other expenditures are slowly turned off."

All this talk about money, while the target content -- which could be provided at no cost -- is still not being provided (or mandated)...

NGS: "Sandy mentioned in another post that she [he] would hope for arXiv like support for university monographs..."

Monographs (not even a clearcut case, like peer-reviewed articles, which are all, already, author give-aways, written only for usage and impact) are moot, while not even peer-reviewed articles are being deposited, or mandated...
NGS: "Open access and NFP publications which do offer the full value chain have been proven to have much lower production costs per page than FP publishers and they do not suffer any impact disadvantages -- and these are still operated on a largely stand-alone basis, without the advantages that can be gained by sharing overhead."

Cash castles in Spain again, while the free content is not yet being provided or mandated...

NGS: "Maybe local repositories really are the way to go, since then each institution has more control over its own contribution, but the collaboration and the support will still need to occur to support discovery (implying metadata, both in production and development of standards and tools) and preservation."

No, search and preservation are not the problem: content is.

NGS: "I suppose another problem with local repositories, however, is that a consensus is far less likely to unite around local repositories as a practical option at this juncture -- the case can't just be made with words, you need the numbers and arXiv has them -- and while I am interested to see strong local repositories emerge, there is greater sense in supporting what can be achieved, since we need more steps in the right direction."

"The numbers" say the following: Physicists have been depositing their preprints and postprints spontaneously (unmandated) in arxiv since 1991, but in the ensuing 20 years this commendable practice has not been taken up by other disciplines. The numbers, in other words, are static, and stagnant. The only cases in which they have grown are those where deposit was mandated (by institutions and funders). And for that, it no longer makes sense (indeed it goes contrary to sense) to deposit them institution-externally, instead of mandating institutional deposit and then harvesting centrally.
And the virtue of that is that it distributes the costs of managing deposits sustainably, by offloading them onto each institution, for its own output, instead of depending on voluntary institutional sub-sidy/scription for obsolete and unnecessary central deposit.

(See also the "denominator fallacy," which arises when you compare the size of central repositories with the size of institutional repositories: The world's 25,000 peer-reviewed journals publish about 2.5 million articles annually, across all fields. A repository's success rate is the proportion of its annual target contents that are being deposited annually. For an institution, the denominator is its own total annual peer-reviewed journal article output across all fields. For a central repository, it is the total annual article output -- in the field(s) it covers -- from all the institutions in the world. Of course the central repository's numerator is greater than any single institutional repository's numerator. But its denominator is far greater still. Arxiv has famously been doing extremely well for certain areas of physics, unmandated, for two decades. But in other areas arxiv is not doing so well, relative to the field's true denominator; and most other central repositories are likewise not doing well. In fact, it is pretty certain that -- apart from physics, with its 2-decade tradition of deposit, plus a few other fields such as economics (preprints) and computer science -- unmandated central repositories are doing exactly as badly as unmandated institutional repositories are doing, namely, about 15%.)

Stevan Harnad
American Scientist Open Access Forum

Simplify OA Deposit But Leave It In the Mandatee's Hands
Congratulations to MIT for this extremely helpful streamlining of the deposit process:
"MIT Libraries began to investigate how SWORD and SWAP could facilitate external contributions by publishers... Entering long and complex information about articles is avoided with the MIT Libraries’ customized submission interface. Only two pieces of metadata are required for already published papers: the name of the authorizing MIT author and a DOI or URL. If the paper is unpublished, four fields are requested."Although entering metadata is not really that complicated and time-consuming at all, we know it is difficult to persuade those who have never deposited a paper in an institutional repository of this fact. So reducing deposit to just entering a name and URL would be a huge step forward in facilitating mandate compliance -- and of course also in encouraging unmandated deposit. I hope we will implement this quickly for EPrints repositories too. I am, however, far less sanguine about the second -- publisher-deposit -- option, especially for mandated deposit: 'the use of SWORD and SWAP with the DSpace repository at MIT is part of a larger strategy to improve collaboration with publishers, facilitating a “push” of large amounts of content into a repository without necessitating a platform-specific solution. Ultimately this “publisher template” could be used with other repository platforms such as Fedora and EPrints. Richard Rodgers, Head of Software Development at MIT Libraries, says, “If we do this right there will be no code to share. SWORD and SWAP are already open and accessible. We have localized their use to accommodate MIT-specific metadata.”It might be alright to quietly provide a way for publishers to facilitate IR deposit, but it would be a huge strategic error to give them an active or essential hand in it. All the power of self-archiving (and of self-archiving mandates from institutions and funders) comes from the fact that it is the author and the author's institution (and funder) that does it, mandates it, and monitors compliance. 
Self-archiving -- its doing and its timing -- is all in the research community's own hands. Publisher deposit is not. The little extra content that publisher-deposit or publisher-facilitated deposit might add does not counterbalance the additional author confusion, deposit delay, diffusion of responsibility and difficulty in compliance-monitoring that it is likely to introduce into institutional mandates, as it has already done with those funder mandates that allow fundees to offload their mandate fulfillment obligations onto publishers. The problem is especially with specifying and monitoring the fulfillment conditions for deposit mandate compliance. (We always have to remember that publishers are neither employees nor fundees, and hence they are not the ones subject to the deposit mandates.) (What kind of mandate is it if it says "You must deposit -- unless your publisher does it for you..."? How is it even to be monitored whether and when the mandate has been complied with?) So if repositories implement some sort of back door for publisher-facilitated deposit, it is important to keep a low profile on it and to stress that on no account should it be stipulated or relied on as one of the ways to fulfill a deposit mandate: Complying with the mandate must be entirely the responsibility of the author, and the monitoring and verification of compliance must be based entirely on steps taken by the author, not steps the authors leave to a publisher to (possibly) take (sometime) on their behalf...

Stevan Harnad
American Scientist Open Access Forum
The American Scientist Open Access Forum has been chronicling and often directing the course of progress in providing Open Access to Universities' Peer-Reviewed Research Articles since its inception in the US in 1998 by the American Scientist, published by the Sigma Xi Society. The Forum is largely for policy-makers at universities, research institutions and research funding agencies worldwide who are interested in institutional Open Access Provision policy. (It is not a general discussion group for serials, pricing or publishing issues: it is specifically focussed on institutional Open Access policy.)