Saturday, December 12, 2009
Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues
Chris Armbruster wrote in the American Scientist Open Access Forum:
CA: "I have some doubts that the juxtaposition of institutional versus central repository is helpful (any longer)"

No longer helpful for what? It is not only helpful but essential if what one is interested in is filling repositories with the target content of the OA movement (refereed journal articles). For in order to fill repositories, you have to get their target contents deposited. And to get their target contents deposited you have to mandate deposit. And to mandate deposit you have to specify the locus of the deposit. And the only two locus options are institutional and central. And for the probability of achieving consensus and compliance with mandates it makes a huge difference where the mandates propose to require the author to deposit: institutionally or centrally, because that in turn determines whether the author will have to deposit once, in one place, institution-internally, or more than once, in more than one place, institution-externally. The prospect of having to do multiple deposit is a deterrent to the depositor. And the prospect of having to compete with institution-external deposit mandates is a deterrent to achieving consensus and compliance with institutional deposit mandates.
So those for whom the distinction between institutional and central repositories is not "helpful" are perhaps those for whom it is immaterial or secondary whether repositories get filled or remain near-empty, because their primary concerns are instead at some other, more abstract or idealized level:

CA: "that is why the proposition is to henceforth distinguish between four ideal types of repositories on an abstract level, so as to be able to examine each specific repository in more detail."

Alas, while we are theorizing at an abstract level about ideal repository types, real, concrete repositories remain mostly empty, in no small part because of some funders' failing to adopt practical, realistic mandates on locus of deposit, mandates that converge rather than compete with institutional mandates. The abstract distinctions among the four "ideal types of repositories" (apart from three of them being of doubtful substance) have nothing to do with this crucial concrete distinction, three of the four being "subtypes" of central repository. To repeat: Only a portion of OA's target content is funded, but all of it originates from institutions.

CA: "For example PMC was a subject-based repository, but it languished before it became a research repository (capturing publication outputs) due to a national mandate, which is compatible with also having a UK PMC and PMC Canada."

The only thing that changed with PMC was that it went from being an empty repository to being a less-empty repository when full-text deposit was mandated for NIH-funded articles. That had nothing to do with its changing from being a "subject" repository to a "research" repository. Its target contents were always the same: biomedical research articles. The only difference is that the mandates increased somewhat the proportion of PMC's total target content that actually got deposited.
But the cost of that welcome increase was also a greater opportunity lost and a bad example set -- because NIH (and now its emulators) insisted on direct deposit in a central repository (PMC, and now its emulators) instead of allowing -- indeed preferring -- institutional deposit, and then harvesting, importing or exporting (one or many) central collections and services therefrom. That would have facilitated institutional mandates for all the rest of OA's target content, not just research funded by NIH (and its emulators), by spurring institutions -- the universal providers of all research output -- to mandate institutional deposit for all the rest of their research output too, funded and unfunded. Not all funders copied NIH, however, and there is still hope that NIH will rethink its arbitrary and counterproductive locus-of-deposit policy, in the interest of all of OA's target content: "NIH Open to Closer Collaboration With Institutional Repositories"

CA: "The point here is to examine (here: for the life sciences) past and (possible) future repository development and help stakeholders make informed decisions."

Help which stake-holders make which decisions about what, and why? While repositories remain near empty -- and that includes PMC (or its emulators) whose target contents comprise all of US (or other nations' or funders') biomedical research -- the only substantive thing at stake is content; and the "stake-holders" are mostly institutions and their researchers, who also happen to be the providers of all that content, funded and unfunded, across all nations and funders.

CA: "Another example: the Dutch system looks like a network of institutional repositories, but is now part of a national gateway (NARCIS)."

But what does this example show? The only relevant question is: what proportion of their own annual research output are those Dutch institutional repositories actually capturing?
The last time I asked Leo Waaijers, he admitted quite frankly that no one has checked. But unless there is something different about the air breathed in the Netherlands, all indications are that their institutional repositories, like repositories everywhere, are only capturing about 15% of their target output. That is the approximate deposit rate for spontaneous (unmandated) self-archiving, worldwide. Only deposit mandates can raise that deposit rate appreciably -- and so far the Netherlands has no OA mandates.

It matters how you do the arithmetic. An institutional repository can calculate its annual deposit rate by dividing its annual full-text article deposits for that year by the institution's annual article publication total for that year. But for a central repository -- or for a "network of institutional repositories" -- you have to make sure to divide by their respective annual total target output. For the Netherlands, that's the total annual article output from all the institutions in the NARCIS network. And for PMC it's all of US biomedical research article output. Otherwise one gets carried away in one's idealized abstractions by the spurious fact that central repositories often have much more content, in absolute terms, than individual institutional repositories. But remedying this "denominator fallacy" by dividing annual deposit counts by their total annual target content count quickly puts things back into practical perspective.

(And this is without even mentioning the question of time-of-deposit, which is almost as important as locus-of-deposit: Many of the central repositories -- e.g. PMC -- have access embargoes because funder mandates have allowed them (and have even left it in the hands of publishers rather than fundees to do the deposits, even though it is fundees, not their publishers, who are subject to funder mandates).
Institutional repositories have a powerful solution for providing "Almost OA" to closed access deposits during any embargo period -- the "email eprint request" Button. This Button is naturally and easily implemented by the repository software at the local institutional level, but would be devilishly difficult -- though not impossible -- to implement at the central level (especially where there is proxy deposit by publishers) because it requires immediate email approval by the author of eprint requests from the would-be user, mediated automatically by the repository software.)

[Leo Waaijers has since responded on jisc-repositories as follows: "Currently 25% of the Dutch national research output published in 2008 is available in Open Access... For the moment we have no mandates. The Netherlands Research Organisation NWO has announced one. Six or seven universities have a mandate for doctoral theses."]

25-30% is the level to which Arthur Sale showed that deposit rates can be laboriously raised if one provides incentives (of which the Dutch "Cream of Science" is an example), but only mandates can propel deposits toward 100%.

CA: "Moreover, the major institutions in the network are research universities. Thus the question arises, if Dutch repository development could be improved if stakeholders used the notion of research repository and national repository system to consider their options (rather than thinking that the institutions must do the job)."

What on earth does this "arising question" mean at this late stage of the game? We have researchers, the ones who do the research and write the articles. They are (mostly, 85%) not depositing until and unless it is mandated by their institutions and/or funders. This has now been unchangingly true for decades.
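The deposit-rate arithmetic described above (dividing a repository's annual deposits by its total annual target output) can be sketched in a few lines of Python. This is only an illustration of the "denominator fallacy": the function name `deposit_rate` and every figure below are hypothetical, chosen to show the arithmetic, and are not measured data for any real repository.

```python
def deposit_rate(deposits, target_output):
    """Fraction of a repository's annual target output that was deposited."""
    if target_output <= 0:
        raise ValueError("target_output must be positive")
    return deposits / target_output


# Hypothetical single institution: 300 deposits out of 2,000 articles published.
institutional_deposits, institutional_output = 300, 2_000

# Hypothetical central repository whose target is 200,000 articles a year.
central_deposits, central_output = 30_000, 200_000

institutional = deposit_rate(institutional_deposits, institutional_output)
central = deposit_rate(central_deposits, central_output)

print(f"institutional: {institutional_deposits} deposits, rate {institutional:.0%}")
print(f"central: {central_deposits} deposits, rate {central:.0%}")
```

With these made-up numbers the central repository holds a hundred times as many articles in absolute terms, yet once each count is divided by its own target output both repositories turn out to be capturing the same proportion.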
Now what -- in specific, concrete, practical terms -- is it that using "the notion of research repository and national repository system to consider their options (rather than thinking that the institutions must do the job)" is supposed to do to fill those empty repositories? Is there any evidence that theorists' abstract contemplations about ideal repository subtypes translate into concrete, practical action on the part of researchers 85% of whom consistently fail to deposit unmandated into any-which repository across the years?

CA: "In two decades of immersion in digital worlds, we have witnessed the development of various repository solutions and accumulated a better understanding of what works and what doesn't. The main repository solutions may be distinguished as follows:"

Before we go on: The only thing we have learned in two decades -- apart from the fact that computer scientists, physicists and economists deposit spontaneously, unmandated (two of them institutionally, one of them centrally) at far higher than the global baseline 15% rate -- is that the only thing that will raise the spontaneous deposit rate is deposit mandates (from institutions or funders). That lesson has nothing whatsoever to do with "various repository solutions" (central or institutional, abstract or concrete, real or ideal, actual or notional).

CA: "Subject-based repositories (commercial and non-commercial, single and federated) usually have been set up by community members and are adopted by the wider community. Spontaneous self-archiving is prevalent as the repository is of intrinsic value to scholars."

Spontaneous self-archiving is "prevalent" at the steadfast rate of about 15%, and that is the problem. The nature of the repository has absolutely nothing to do with this, one way or the other. It is a matter of "community" practice.
And, as noted, the few scholarly "communities" that have adopted spontaneous self-archiving practices unmandated (computer scientists, physicists and economists) did so very early on in these two decades, continuing their pre-Web practices, two of them institutionally and one of them centrally; and they did so mainly to share preprints of unrefereed drafts early in their research cycle. The value they found in that practice predated the Web and had absolutely nothing to do with repository type (since two communities did it institutionally and one did it centrally). (And if it's hard to get authors to make their final drafts of refereed, published articles publicly accessible unless the practice is mandated, it would be incomparably harder to get authors from the "communities" that have their own reasons for not wanting to make their unrefereed drafts public to do so, against their wills: their institutions and funders certainly cannot mandate it!)

"Commercial" vs. "non-commercial" also sounds like a can of worms: In speaking of "repositories," are we mixing up the Free-Access (OA) ones with the Fee-Access ones? And those that contain full-texts with those that contain only metadata? And those that contain articles with those that contain other kinds of content? If so, we are not even talking about the same thing when we speak of repositories, for all I mean is OA repositories of the full-texts of refereed research journal articles.

CA: "Much of the intrinsic value for authors comes from the opportunity to communicate ideas and results early in the form of working papers and preprints, from which a variety of benefits may result, such as being able to claim priority, testing the value of an idea or result, improving a publication prior to submission, gaining recognition and attention internationally and so on."

We are comparing apples and oranges. OA's primary target is not and has never been unpublished, unrefereed drafts.
Distinguish the self-archiving of OA's target content -- refereed articles -- from the self-archiving of unrefereed preprint drafts. The latter practice has been found very useful by some disciplines (computer science, physics, economics) for a long time -- indeed before the Web. But this practice has not caught on with other disciplines, for an equally long time, in all likelihood because most disciplines are not interested in making their unrefereed drafts public. (Some may find this practice unscholarly; others might find it potentially embarrassing professionally; in some disciplines it might even be dangerous to public health.) And the overall global self-archiving rate remains the baseline 15% unless self-archiving is mandated.

CA: "As such, subject-based repositories are thematically well defined, and alert services and usage statistics are meaningful for community users"

This not only conflates unrefereed draft-sharing with OA and repositories with services over repositories, but it also mixes up cause and effect. There is no central repository functionality that cannot just as well be provided over distributed or harvested repositories. And there is no repository that cannot succeed if it manages to capture its target content. Otherwise, the rest of the functional details are merely decorative, for empty repositories. And neither OA's nor OA mandates' target is unrefereed drafts (though they are of course welcome if the author wants to deposit them too).

CA: "Research repositories are usually sponsored by research funding or performing organisations to capture results. This capturing typically requires a deposit mandate."

It makes no difference whether one calls a repository of, say, biomedical research a "subject" repository or a "research" repository. That's just words. And both institutions and funders "sponsor" them. All that matters is whether or not deposit is mandated, because that is what determines whether the repository is full or near-empty.
Armbruster & Romary are conflating "mandated repository" with "central research repository." All OA repositories are "research repositories" because all have the same target content: refereed research articles. And both central and institutional deposit can be mandated. Armbruster & Romary seem to keep missing the sole substantive point at issue, which is that institutions are the universal providers of all of OA's target content, funded and unfunded, across all research subjects and all nations -- and funder mandates requiring direct central deposit compete with and discourage institutional mandates for all the rest of OA's target content, by requiring (from already-sluggish authors) divergent, multiple institution-external deposit instead of convergent one-stop institution-internal deposit (which can then be imported, exported or harvested by central collections and services).

CA: "Publications are results, including books, but data may also be considered a result worth capturing, leading to a collection with a variety of items."

It's nice to get more ambitious in speculating about what one would ideally like to see deposited, but let us not lose sight of practical reality today: Authors (85%) are not even depositing their refereed research articles until it is mandated. These are articles that -- without a single exception -- authors want to be accessible to any would-be user, for they have already published them. In contrast, it is certainly not true that all, most or even many authors today want to make their unpublished research data (perhaps still being data-mined by them) or their published books (perhaps still earning royalty revenue, or hoping to) or their unrefereed drafts (perhaps embarrassing or even dangerous until validated by peer review) publicly accessible to all users today.
Now, does it not make more sense to try to encourage authors to provide OA to content that they would already wish to see freely accessible to any would-be user today -- by mandating the practice -- rather than imagining (contrary to fact) that authors are already providing OA to content that many of them may not yet even wish to see freely accessible to any would-be user today?

CA: "Because these items constitute a record of science, standards for deposit and preservation must be stringent."

Stringent standards for deposit? When most authors are not even bothering to deposit at all? That seems an odd way to try to generate more deposits! Rather like raising the price of a product that no one is bothering to buy at current prices. (No, it's not raising the quality of the product either: Users are the ones who benefit from repository functionality; but it is authors that we are trying to induce to provide the content to which this user-functionality is applied.)

And is the scientific record not already in our journals and libraries, on paper and online? And is peer review not already a stringent enough standard? Yes, peer-reviewed articles need to be preserved, but what has that to do with authors depositing them in an OA repository (usually in the form of a refereed final draft, which is not the canonical "version of record," but merely a supplementary version, to provide OA for those would-be users who do not have subscription access to the journal in which the canonical version -- the one that really needs the preservation -- was published)? This is the old canard, again -- conflating digital preservation with Open Access provision -- and perhaps also conflating unpublished preprints with published postprints.

And as to record-keeping: Yes, both institutions and funders need to keep records -- indeed archives -- of the research output that they employ and fund researchers to produce.
Again, the natural locus for that record is the institutional repository, which the institution can manage, monitor and show-case, and from which the funder can import, export or harvest its funded subset if it wishes. Direct institution-external deposit, willy-nilly, would be like institutions relying on their banks to do their record-keeping instead of themselves.

CA: "The sponsor of the repository is likely to tie reporting functions to the deposit mandate, this being, for example, the reporting of grantees to the funder or the presentation of research results in an annual report."

Yes, both grant fulfillment and annual research output recording and evaluation can and should be implemented through repository deposit mandates, by both funders and institutions. But the question remains: What should be the locus of deposit? And should there be one convergent locus of deposit for a researcher and/or article, or multiple divergent ones? The obvious answer, again, is one-time, one-place institution-internal deposit, mandated by both institutions and funders, and the rest by institution-external import/export/harvesting therefrom.

CA: "Research repositories are likely to contain high-quality output. This is because its content is peer-reviewed multiple times (e.g. grant application, journal submission, research evaluation) and the production of the results is well funded."

This is extremely blurred and vague. Inasmuch as refereed journal articles report funded research, they have been both grant-reviewed and peer-reviewed, so that's double-counting. Accepted grant proposals are not part of OA's target content, and are just a book-keeping matter for institutions and funders. Research evaluation is done on the basis of research performance and impact, including refereed publications as their primary input.
We are again double-counting if we dub as "triply peer-reviewed" content that consists simply of standardly peer-reviewed articles, deposited for research evaluation in a repository. This sounds mostly like massaging the obvious without stating the obvious: None of it happens if the content in question is not deposited. Deposit needs to be mandated, and the locus of the deposit needs to be institutional, not central, to avoid needlessly placing divergent multiple-deposit burdens on the (already sluggish) author.

CA: "Users who are collaborators, competitors or instigating a new research project are most likely to find the collections of relevance"

Yes indeed -- if they are deposited. And they will only be deposited if deposit is mandated. And mandates need to be convergent rather than competitive in order to reach consensus on adoption and compliance. And hence the sole stipulated locus of deposit needs to be institutional. The rest is all just a matter of harvesting and services over distributed institutional repositories.

CA: "National repository systems require coordination - more for a federated system, less for a unified system. National systems are designed to capture scholarly output more generally and not just with a view to preserving a record of scholarship, but also to support, for example, teaching and learning in higher education. Indeed, only a national purpose will justify the national investment. Such systems are likely to display scholarly outputs in the national language, highlight the publications of prominent scholars and develop a system for recording dissertations. One could conceive of such a national system as part of a national research library that serves scholarly communication in the national language, is an international showcase of national output and supports public policy, e.g. higher education and public access to knowledge"

You are talking about a harvesting service. No need for it to be a direct locus of deposit.
Which brings us back to the sole real priority, which is concerted, convergent mandates from institutions and funders (and national governments) to deposit (once only) in institutional repositories, minimizing the burden on authors.

CA: "Institutional repositories contain the various outputs of the institution."

And all other repositories -- subject-based, funder-based, or national -- likewise contain "the various outputs of the institution," institutions being the sole universal providers of all research output.

CA: "While research results are important among these outputs, so are works of qualification or teaching and learning materials. If the repository captures the whole output, it is both a library and a showcase. It is a library holding a collection, and it is a showcase because the online open access display and availability of the collection may serve to impress and connect, for example, with alumni of the institution or the colleagues of researchers."

It is highly desirable for universities to make their courseware freely accessible online. But it is a different agenda from OA's. And it has an even lower deposit rate today than OA: MIT is the only institution that has a policy of making its courseware openly accessible.

If people are not yet recycling their waste, what needs to be done is to mandate waste recycling, not to find other worthy things it would be a good idea to do, but that people are likewise not doing, such as giving up cigarettes -- or other worthy things that a (near-empty) waste-recycling depository could host, aside from its target contents, such as charity-donation booths.

Besides, some courseware -- especially material prepared in the hope of writing a best-selling textbook -- is more like data, books, unrefereed preprints (and software, and music and movies): discretionary give-aways, depending on the author, rather than universal give-aways, written solely for uptake and impact, like refereed research articles.
So let's not remain oblivious to the vast shortfall in OA's target content by blurring it with fantasies about other kinds of content (much of it absent too!). As for theses: The natural solution for them is to treat them the same way as journal articles: mandate deposit in the institutional repository (as more and more universities are now beginning to do).

CA: "A repository may also be an instrument of the institution by supporting, for example, internal and external assessment as well as strategic planning."

Yes, and this is yet another rationale for mandating deposit of OA's target content: refereed research publications. Australia and the UK are beginning to link their institutional repositories to submissions for research assessment nationally, and universities like Liège are doing so for internal performance assessment.

CA: "Moreover, an institutional repository could have an important function in regional development. It allows firms, public bodies and civil society organisations to immediately understand what kind of expertise is available locally."

Yes, all true. These are further rationales for institutions mandating institutional deposit -- and for funder mandates to reinforce institutional deposit mandates rather than compete with them.

CA: "These four ideal types have been derived partly from the history of repositories, partly through logical reasoning. This includes an appreciation of the relevant literature on scholarly communication, open access and repositories, though the [paper] is not a literature review but an argument that moves back and forth between abstract ideal types and specific cases. Ideal types should not be misunderstood as a classification, in which each and every repository may be identified as belonging unambiguously to a category.
Rather, the purpose of creating ideal types is to aid our understanding of repositories and provide a tool for analysing repository development."

The "argument" does not seem to be grounded in a grasp either of what (OA) repositories are for, or of the practical problem of filling them. The distinctions among central repositories are largely arbitrary and spurious; they are more about services and functionality than about locus of deposit or repository type. The fundamental and sole substantive point is completely missed: Deposit needs to be mandated (by the universal providers of the target OA content -- institutions -- reinforced by funders) and the locus of deposit needs to be institutional. The rest is just counting abstract chickens before their concrete eggs are fertilized, let alone laid or hatched.

CA: "Some publication repositories may be identified easily as resembling very much one ideal type rather than another. Some of the classic repositories conventionally identified as subject-based, such as arXiv and RePEc, exhibit few features of another type. Yet, one of the more interesting questions to ask is in how far other elements are present and what this means. ArXiv, for example, is also a research repository, with institutions sponsoring research in high-energy physics being important to its development and success. RePEc, by comparison, has a strong institutional component because the repository is a federated system that relies on input and service from a variety of departments and institutes."

arXiv is based on direct central deposit of preprints (and postprints) in physics; RePEc amalgamates distributed institutional deposits of preprints in economics; CiteSeer harvests distributed institutional deposits of preprints and postprints in computer science.
There is nothing to be learned here except that the spontaneous preprint (and postprint) deposit practices in these three research subject communities have failed to generalise to other research subject communities and therefore postprint deposit mandates from institutions and funders are needed, with one convergent locus of deposit: the repositories of the universal providers of all research, funded and unfunded, across all subjects and nations: the world's universities and research institutes.

CA: "To continue with another example, PubMed Central (PMC), at first glance, is a subject-based repository. Acquisition of content, however, only took off once it was declared a research repository capturing the output of publicly funded research (by the NIH). Notably, US Congress passed the deposit mandate, transforming PMC into a national repository. That a parallel, though integrated, repository should emerge in the UK (UK PMC) and Canada (PMC Canada) is thus not surprising. Utilisation of the ideal types outlined above would thus be fruitful in analysing the development of PMC and, presumably, be equally valuable in discussing the future potential of PMC, for example the possible creation of a Europe PMC."

This just repeats the very same incorrect analysis made earlier: PMC is and always was a US central research subject repository for refereed biomedical research publications (so are its emulators, for their own "national" output). What changed was not that NIH rebaptized PMC by "declaring" it a "research repository." What changed was that NIH mandated deposit (after two years wasted in the hope that a mere "invitation" would do). The rest is just monkey-see, monkey-do.
What those aping the US missed, however, was all the rest of OA's target content, funded and unfunded -- across all nations, subjects and institutions -- and how not only mandating deposit, but mandating convergent institutional deposit is essential in order to have universal OA to refereed research in all subjects, worldwide.

(The various national PMCs are a joke, and will be quietly rebaptized as harvested archival national collections -- if those are desired at all -- once worldwide OA content picks up, as institutional deposit mandates become universal. The global search functionality will not be at the level of all these absurd and superfluous national PMC clones, but at the level of global harvesting/search services. Why would any user -- peer or public -- want to search the world's biomedical literature by country (or institution, for that matter) -- other than for parochial actuarial purposes?)

CA: "National solutions are increasingly common (and principally may also be regional in form), but vary especially with regard to privileging either research outputs or the institutions. The French HAL system is powered by the CNRS, the most prestigious national research organisation, and thus is strong on making available research results."

Strong on making them available if/when deposited, but no stronger than the default 15% on getting them deposited at all. (The denominator fallacy again...)

CA: "In Japan, the National Institute of Informatics has supported the Digital Repository Federation, which covers eighty-seven institutions, with mainly librarians working to make the system operational."

Unless librarians in Japan have executive privileges over authors' writings that librarians elsewhere in the world lack, they will not be able to raise the deposit rate without mandating deposit either...
CA: "In Spain, an aggregator and search portal, Recolecta, sits atop a multitude of institutional repositories, with a large variety of items."

A large variety of "items": But what percentage of Spanish annual refereed article output is being deposited? My guess is that -- apart from Spain's 4 institutional mandates and 1 funder mandate -- that percentage will be the usual baseline 15% (looking spuriously bigger because aggregated centrally across multiple institutions: the denominator fallacy yet again...).

CA: "In Australia, institutional repositories are prominently tied to the national research assessment exercise, with due emphasis on peer reviewed publications."

That's promising, because being required to submit for research assessment via institutional repositories is effectively a deposit mandate. Moreover, with 1 funder mandate and 5 institutional mandates -- including the world's first institution-wide mandate at QUT -- Australia is neck-and-neck, proportionately, with the UK, in the worldwide national OA sweepstakes: The UK has 13 funder mandates, 11 institutional mandates, and 3 departmental mandates, including the world's very first OA mandate (U Southampton School of Electronics and Computer Science); the UK too is moving toward linking deposit to the new national research assessment scheme.

CA: "Any Internet 101 course will include plenty of examples where deposit, content and service are assembled within a single site (by one provider, company etc.) - the list is really very long, from ArXiv to Amazon, SSRN to Flickr, RePEc to Facebook and so on. Internet 101 theory will then elucidate why this is so (e.g. network effects, economies of scale and so on). Creating thousands of little repositories was probably never a good idea..."

Umm, I guess Internet 101 will also tell us that creating billions of little sites was never a good idea and we should all be depositing directly in Google...
CA: "More here:"

Let the reader be prepared for a rather confused and practically unproductive mashup of OA repository-content, deposit-locus, and central-service issues in the Armbruster & Romary paper. Yet the resolution is a simple one-liner: All research institutions and funders worldwide need to mandate institutional deposit, and then reap the harvest centrally, with search services, subject collections, national collections, language collections, and any other "ideal" on which hearts are set. (But don't let the function-tail wag the content-dog now, when it's only at 15% body weight and needs to settle down and eat.)

Stevan Harnad
American Scientist Open Access Forum