Institutional Repositories

Saturday, December 12. 2009

Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues

Chris Armbruster wrote in the American Scientist Open Access Forum:

CA: "I have some doubts that the juxtaposition of institutional versus central repository is helpful (any longer)"

No longer helpful for what?

It is not only helpful but essential if what one is interested in is filling repositories with the target content of the OA movement (refereed journal articles). For in order to fill repositories, you have to get their target contents deposited. And to get their target contents deposited you have to mandate deposit. And to mandate deposit you have to specify the locus of the deposit. And the only two locus options are institutional and central. And for the probability of achieving consensus and compliance with mandates it makes a huge difference where the mandates propose to require the author to deposit: institutionally or centrally, because that in turn determines whether the author will have to deposit once, in one place, institution-internally, or more than once, in more than one place, institution-externally. The prospect of having to do multiple deposit is a deterrent to the depositor. And the prospect of having to compete with institution-external deposit mandates is a deterrent to achieving consensus and compliance with institutional deposit mandates.

So those for whom the distinction between institutional and central repositories is not "helpful" are perhaps those for whom it is immaterial or secondary whether repositories get filled or remain near-empty, because their primary concerns are instead at some other, more abstract or idealized level:

CA: "that is why the proposition is to henceforth distinguish between four ideal types of repositories on an abstract level, so as to be able to examine each specific repository in more detail."

Alas, while we are theorizing at an abstract level about ideal repository types, real, concrete repositories remain mostly empty, in no small part because of some funders' failing to adopt practical, realistic mandates on locus of deposit, mandates that converge rather than compete with institutional mandates.

The abstract distinctions among the four "ideal types of repositories" (apart from three of them being of doubtful substance) have nothing to do with this crucial concrete distinction, three of the four being "subtypes" of central repository.

To repeat: Only a portion of OA's target content is funded, but all of it originates from institutions.

CA: "For example PMC was a subject-based repository, but it languished before it became a research repository (capturing publication outputs) due to a national mandate, which is compatible with also having a UK PMC and PMC Canada."

The only thing that changed with PMC was that it went from being an empty repository to being a less-empty repository when full-text deposit was mandated for NIH-funded articles.

That had nothing to do with its changing from being a "subject" repository to a "research" repository. Its target contents were always the same: biomedical research articles. The only difference is that the mandates increased somewhat the proportion of PMC's total target content that actually got deposited.

But the cost of that welcome increase was also a greater opportunity lost and a bad example set -- because NIH (and now its emulators) insisted on direct deposit in a central repository (PMC, and now its emulators) instead of allowing -- indeed preferring -- institutional deposit, and then harvesting, importing or exporting (one or many) central collections and services therefrom.

That would have facilitated institutional mandates for all the rest of OA's target content, not just research funded by NIH (and its emulators), by spurring institutions -- the universal providers of all research output -- to mandate institutional deposit for all the rest of their research output too, funded and unfunded.

Not all funders copied NIH, however, and there is still hope that NIH will rethink its arbitrary and counterproductive locus-of-deposit policy, in the interest of all of OA's target content: "NIH Open to Closer Collaboration With Institutional Repositories"

CA: "The point here is to examine (here: for the life sciences) past and (possible) future repository development and help stakeholders make informed decisions."

Help which stake-holders make which decisions about what, and why?

While repositories remain near empty -- and that includes PMC (or its emulators) whose target contents comprise all of US (or other nations' or funders') biomedical research -- the only substantive thing at stake is content; and the "stake-holders" are mostly institutions and their researchers, who also happen to be the providers of all that content, funded and unfunded, across all nations and funders.

CA: "Another example: the Dutch system looks like a network of institutional repositories, but is now part of a national gateway (NARCIS)."

But what does this example show? The only relevant question is: what proportion of their own annual research output are those Dutch institutional repositories actually capturing?

The last time I asked Leo Waaijers, he admitted quite frankly that no one has checked. But unless there is something different about the air breathed in the Netherlands, all indications are that their institutional repositories, like repositories everywhere, are only capturing about 15% of their target output. That is the approximate deposit rate for spontaneous (unmandated) self-archiving, worldwide. Only deposit mandates can raise that deposit rate appreciably -- and so far the Netherlands has no OA mandates.

It matters how you do the arithmetic. An institutional repository can calculate its annual deposit rate by dividing its annual full-text article deposits for that year by the institution's annual article publication total for that year.

But for a central repository -- or for a "network of institutional repositories" -- you have to make sure to divide by their respective annual total target output. For the Netherlands, that's the total annual article output from all the institutions in the NARCIS network. And for PMC it's all of US biomedical research article output.

Otherwise one gets carried away in one's idealized abstractions by the spurious fact that central repositories often have much more content, in absolute terms, than individual institutional repositories. But remedying this "denominator fallacy" by dividing annual deposit counts by their total annual target content count quickly puts things back into practical perspective.
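The arithmetic can be made concrete with a small sketch. The figures below are invented purely for illustration (they are not measurements of any real repository), and the function and variable names are hypothetical:

```python
def deposit_rate(annual_deposits: int, annual_target_output: int) -> float:
    """Fraction of one year's target articles that were actually deposited."""
    if annual_target_output <= 0:
        raise ValueError("annual target output must be positive")
    return annual_deposits / annual_target_output


# Hypothetical figures, for illustration only.
institutional = {"deposits": 300, "target_output": 2_000}
central = {"deposits": 15_000, "target_output": 100_000}

inst_rate = deposit_rate(institutional["deposits"], institutional["target_output"])
central_rate = deposit_rate(central["deposits"], central["target_output"])

print(f"institutional: {institutional['deposits']} deposits, rate {inst_rate:.0%}")
print(f"central: {central['deposits']} deposits, rate {central_rate:.0%}")
```

With these made-up numbers the central repository holds fifty times as many deposits in absolute terms, yet once each count is divided by its own total target output, both repositories turn out to be capturing the same 15%.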

(And this is without even mentioning the question of time-of-deposit, which is almost as important as locus-of-deposit: Many of the central repositories -- e.g. PMC -- have access embargoes because funder mandates have allowed them (and have even left it in the hands of publishers rather than fundees to do the deposits, even though it is fundees, not their publishers, who are subject to funder mandates). Institutional repositories have a powerful solution for providing "Almost OA" to closed access deposits during any embargo period -- the "email eprint request" Button. This Button is naturally and easily implemented by the repository software at the local institutional level, but would be devilishly difficult -- though not impossible -- to implement at the central level (especially where there is proxy deposit by publishers) because it requires immediate email approval by the author of eprint requests from the would-be user, mediated automatically by the repository software.)
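The Button's request-and-approval flow can be sketched roughly as follows. This is only an illustrative model under stated assumptions, not the code of any actual repository software: the class names, method names and status strings are all hypothetical, and a plain list stands in for real email delivery.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Deposit:
    title: str
    author_email: str
    open_access: bool  # False while the deposit is closed access (embargoed)


@dataclass
class Repository:
    deposits: Dict[str, Deposit] = field(default_factory=dict)
    outbox: List[str] = field(default_factory=list)  # stand-in for real email

    def request_eprint(self, deposit_id: str, requester_email: str) -> str:
        deposit = self.deposits[deposit_id]
        if deposit.open_access:
            return "download"  # already OA: no Button needed
        # Closed access: forward the request to the author for approval.
        self.outbox.append(
            f"to={deposit.author_email}: {requester_email} requests '{deposit.title}'"
        )
        return "pending author approval"

    def author_decision(self, deposit_id: str, requester_email: str, approve: bool) -> str:
        deposit = self.deposits[deposit_id]
        if approve:
            # The author approved: send the eprint to the would-be user.
            self.outbox.append(f"to={requester_email}: eprint of '{deposit.title}' attached")
            return "sent"
        return "declined"


# Hypothetical usage: one embargoed deposit, one request, one approval.
repo = Repository()
repo.deposits["a1"] = Deposit("Sample article", "author@example.org", open_access=False)
status = repo.request_eprint("a1", "reader@example.org")
result = repo.author_decision("a1", "reader@example.org", approve=True)
```

The sketch depends on the repository knowing the author's email address at deposit time, which is the step that becomes awkward when a publisher deposits by proxy.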

[Leo Waaijers has since responded on jisc-repositories as follows: "Currently 25% of the Dutch national research output published in 2008 is available in Open Access... For the moment we have no mandates. The Netherlands Research Organisation NWO has announced one. Six or seven universities have a mandate for doctoral theses."]

25-30% is the level to which Arthur Sale showed that deposit rates can be laboriously raised if one provides incentives (of which the Dutch "Cream of Science" is an example), but only mandates can propel deposits toward 100%.

CA: "Moreover, the major institutions in the network are research universities. Thus the question arises, if Dutch repository development could be improved if stakeholders used the notion of research repository and national repository system to consider their options (rather than thinking that the institutions must do the job)."

What on earth does this "arising question" mean at this late stage of the game? We have researchers, the ones who do the research and write the articles. They are (mostly, 85%) not depositing until and unless it is mandated by their institutions and/or funders. This has now been unchangingly true for decades.

Now what -- in specific, concrete, practical terms -- is it that using "the notion of research repository and national repository system to consider their options (rather than thinking that the institutions must do the job)" is supposed to do to fill those empty repositories? Is there any evidence that theorists' abstract contemplations about ideal repository subtypes translate into concrete, practical action on the part of researchers 85% of whom consistently fail to deposit unmandated into any-which repository across the years?

CA: "In two decades of immersion in digital worlds, we have witnessed the development of various repository solutions and accumulated a better understanding of what works and what doesn't. The main repository solutions may be distinguished as follows:"

Before we go on: The only thing we have learned in two decades -- apart from the fact that computer scientists, physicists and economists deposit spontaneously, unmandated (two of them institutionally, one of them centrally) at far higher than the global baseline 15% rate -- is that the only thing that will raise the spontaneous deposit rate is deposit mandates (from institutions or funders).

That lesson has nothing whatsoever to do with "various repository solutions" (central or institutional, abstract or concrete, real or ideal, actual or notional).

CA: "Subject-based repositories (commercial and non-commercial, single and federated) usually have been set up by community members and are adopted by the wider community. Spontaneous self-archiving is prevalent as the repository is of intrinsic value to scholars."

Spontaneous self-archiving is "prevalent" at the steadfast rate of about 15%, and that is the problem.

The nature of the repository has absolutely nothing to do with this, one way or the other. It is a matter of "community" practice.

And, as noted, the few scholarly "communities" that have adopted spontaneous self-archiving practices unmandated (computer scientists, physicists and economists) did so very early on in these two decades, continuing their pre-Web practices, two of them institutionally and one of them centrally; and they did so mainly to share preprints of unrefereed drafts early in their research cycle. The value they found in that practice predated the Web and had absolutely nothing to do with repository type (since two communities did it institutionally and one did it centrally).

(And if it's hard to get authors to make their final drafts of refereed, published articles publicly accessible unless the practice is mandated, it would be incomparably harder to get authors from the "communities" that have their own reasons for not wanting to make their unrefereed drafts public to do so, against their wills: their institutions and funders certainly cannot mandate it!)

"Commercial" vs. "non-commercial" also sounds like a can of worms: In speaking of "repositories," are we mixing up the Free-Access (OA) ones with the Fee-Access ones? And those that contain full-texts with those that contain only metadata? And those that contain articles with those that contain other kinds of content? If so, we are not even talking about the same thing when we speak of repositories, for all I mean is OA repositories of the full-texts of refereed research journal articles.

CA: "Much of the intrinsic value for authors comes from the opportunity to communicate ideas and results early in the form of working papers and preprints, from which a variety of benefits may result, such as being able to claim priority, testing the value of an idea or result, improving a publication prior to submission, gaining recognition and attention internationally and so on."

We are comparing apples and oranges. OA's primary target is not and has never been unpublished, unrefereed drafts.

Distinguish the self-archiving of OA's target content -- refereed articles -- from the self-archiving of unrefereed preprint drafts. The latter practice has been found very useful by some disciplines (computer science, physics, economics) for a long time -- indeed before the Web. But this practice has not caught on with other disciplines, for an equally long time, in all likelihood because most disciplines are not interested in making their unrefereed drafts public. (Some may find this practice unscholarly; others might find it potentially embarrassing professionally; in some disciplines it might even be dangerous to public health.)

And the overall global self-archiving rate remains the baseline 15% unless self-archiving is mandated.

CA: "As such, subject-based repositories are thematically well defined, and alert services and usage statistics are meaningful for community users"

This not only conflates unrefereed draft-sharing with OA and repositories with services over repositories, but it also mixes up cause and effect. There is no central repository functionality that cannot just as well be provided over distributed or harvested repositories. And there is no repository that cannot succeed if it manages to capture its target content. Otherwise, the rest of the functional details are merely decorative, for empty repositories.

And neither OA's nor OA mandates' target is unrefereed drafts (though they are of course welcome if the author wants to deposit them too).

CA: "Research repositories are usually sponsored by research funding or performing organisations to capture results. This capturing typically requires a deposit mandate."

It makes no difference whether one calls a repository of, say, biomedical research a "subject" repository or a "research" repository. That's just words. And both institutions and funders "sponsor" them. All that matters is whether or not deposit is mandated, because that is what determines whether the repository is full or near-empty.

Armbruster & Romary are conflating "mandated repository" with "central research repository." All OA repositories are "research repositories" because all have the same target content: refereed research articles. And both central and institutional deposit can be mandated.

Armbruster & Romary seem to keep missing the sole substantive point at issue, which is that institutions are the universal providers of all of OA's target content, funded and unfunded, across all research subjects and all nations -- and funder mandates requiring direct central deposit compete with and discourage institutional mandates for all the rest of OA's target content, by requiring (from already-sluggish authors) divergent, multiple institution-external deposit instead of convergent one-stop institution-internal deposit (which can then be imported, exported or harvested by central collections and services).

CA: "Publications are results, including books, but data may also be considered a result worth capturing, leading to a collection with a variety of items."

It's nice to get more ambitious in speculating about what one would ideally like to see deposited, but let us not lose sight of practical reality today: Authors (85%) are not even depositing their refereed research articles until it is mandated. These are articles that -- without a single exception -- authors want to be accessible to any would-be user, for they have already published them.

In contrast, it is certainly not true that all, most or even many authors today want to make their unpublished research data (perhaps still being data-mined by them) or their published books (perhaps still earning royalty revenue, or hoping to) or their unrefereed drafts (perhaps embarrassing or even dangerous until validated by peer review) publicly accessible to all users today.

Now, does it not make more sense to try to encourage authors to provide OA to content that they would already wish to see freely accessible to any would-be user today -- by mandating the practice -- rather than imagining (contrary to fact) that authors are already providing OA to content that many of them may not yet even wish to see freely accessible to any would-be user today?

CA: "Because these items constitute a record of science, standards for deposit and preservation must be stringent."

Stringent standards for deposit? When most authors are not even bothering to deposit at all? That seems an odd way to try to generate more deposits! Rather like raising the price of a product that no one is bothering to buy at current prices.

(No, it's not raising the quality of the product either: Users are the ones who benefit from repository functionality; but it is authors that we are trying to induce to provide the content to which this user-functionality is applied.)

And is the scientific record not already in our journals and libraries, on paper and online? And is peer review not already a stringent enough standard?

Yes, peer-reviewed articles need to be preserved, but what has that to do with authors depositing them in an OA repository? (They are usually deposited in the form of a refereed final draft, which is not the canonical "version of record" but merely a supplementary version, to provide OA for those would-be users who do not have subscription access to the journal in which the canonical version -- the one that really needs the preservation -- was published.)

This is the old canard, again -- conflating digital preservation with Open Access provision -- and perhaps also conflating unpublished preprints with published postprints.

And as to record-keeping: Yes, both institutions and funders need to keep records -- indeed archives -- of the research output that they employ and fund researchers to produce. Again, the natural locus for that record is the institutional repository, which the institution can manage, monitor and show-case, and from which the funder can import, export or harvest its funded subset if it wishes. Direct institution-external deposit, willy-nilly, would be like institutions relying on their banks to do their record-keeping instead of themselves.

CA: "The sponsor of the repository is likely to tie reporting functions to the deposit mandate, this being, for example, the reporting of grantees to the funder or the presentation of research results in an annual report."

Yes, both grant fulfillment and annual research output recording and evaluation can and should be implemented through repository deposit mandates, by both funders and institutions. But the question remains: What should be the locus of deposit? And should there be one convergent locus of deposit, for a researcher and/or article, or multiple divergent ones?

The obvious answer, again, is one-time, one-place institution-internal deposit, mandated by both institutions and funders, and the rest by institution-external import/export/harvesting therefrom.
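As a rough sketch of what "deposit once, harvest the rest" amounts to, the toy model below builds central collections purely by filtering what is already in institutional repositories. Everything in it is a labeled assumption: the record fields, the institution and funder names, and the `harvest` helper are hypothetical, and in-memory lists stand in for real repositories and a real harvesting protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Record:
    title: str
    institution: str
    subject: str
    funder: Optional[str]  # None for unfunded research


# Each article is deposited exactly once, in its author's institutional repository.
institutional_repositories: Dict[str, List[Record]] = {
    "University A": [
        Record("Article 1", "University A", "biomedicine", "Funder X"),
        Record("Article 2", "University A", "physics", None),
    ],
    "University B": [
        Record("Article 3", "University B", "biomedicine", None),
    ],
}


def harvest(
    repos: Dict[str, List[Record]], keep: Callable[[Record], bool]
) -> List[Record]:
    """Build a central collection by harvesting, not by direct deposit."""
    return [r for records in repos.values() for r in records if keep(r)]


# Central collections are just different views over the same single deposits.
funder_x_collection = harvest(institutional_repositories, lambda r: r.funder == "Funder X")
biomedical_collection = harvest(institutional_repositories, lambda r: r.subject == "biomedicine")
everything = harvest(institutional_repositories, lambda r: True)
```

Note that the unfunded Article 3 still reaches the biomedical collection, because the institution, not the funder, is the point of deposit.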

CA: "Research repositories are likely to contain high-quality output. This is because its content is peer-reviewed multiple times (e.g. grant application, journal submission, research evaluation) and the production of the results is well funded."

This is extremely blurred and vague.

Inasmuch as refereed journal articles report funded research, they have been both grant-reviewed and peer-reviewed, so that's double-counting.

Accepted grant proposals are not part of OA's target content, and are just a book-keeping matter for institutions and funders.

Research evaluation is done on the basis of research performance and impact, including refereed publications as their primary input. We are again double-counting if we dub as triply peer-reviewed content that is simply standardly peer-reviewed articles, deposited for research evaluation in a repository.

This sounds mostly like massaging the obvious without stating the obvious: None of it happens if the content in question is not deposited. Deposit needs to be mandated, and the locus of the deposit needs to be institutional, not central, to avoid needlessly placing divergent multiple-deposit burdens on the (already sluggish) author.

CA: "Users who are collaborators, competitors or instigating a new research project are most likely to find the collections of relevance"

Yes indeed -- if they are deposited. And they will only be deposited if deposit is mandated. And mandates need to be convergent rather than competitive in order to reach consensus on adoption and compliance. And hence the sole stipulated locus of deposit needs to be institutional. The rest is all just a matter of harvesting and services over distributed institutional repositories.

CA: "National repository systems require coordination - more for a federated system, less for a unified system. National systems are designed to capture scholarly output more generally and not just with a view to preserving a record of scholarship, but also to support, for example, teaching and learning in higher education. Indeed, only a national purpose will justify the national investment. Such systems are likely to display scholarly outputs in the national language, highlight the publications of prominent scholars and develop a system for recording dissertations. One could conceive of such a national system as part of a national research library that serves scholarly communication in the national language, is an international showcase of national output and supports public policy, e.g. higher education and public access to knowledge"

You are talking about a harvesting service. No need for it to be a direct locus of deposit.

Which brings us back to the sole real priority, which is concerted, convergent mandates from institutions and funders (and national governments) to deposit (once only) in institutional repositories, minimizing the burden on authors.

CA: "Institutional repositories contain the various outputs of the institution."

And all other repositories -- subject-based, funder-based, or national -- likewise contain "the various outputs of the institution," institutions being the sole universal providers of all research output.

CA: "While research results are important among these outputs, so are works of qualification or teaching and learning materials. If the repository captures the whole output, it is both a library and a showcase. It is a library holding a collection, and it is a showcase because the online open access display and availability of the collection may serve to impress and connect, for example, with alumni of the institution or the colleagues of researchers."

It is highly desirable for universities to make their courseware freely accessible online. But it is a different agenda from OA's. And it has an even lower deposit rate today than OA: MIT is the only institution that has a policy of making its courseware openly accessible.

If people are not yet recycling their waste, what needs to be done is to mandate waste recycling, not to find other worthy things it would be a good idea to do, but that people are likewise not doing, such as giving up cigarettes -- or other worthy things that a (near-empty) waste-recycling depository could host, aside from its target contents, such as charity-donation booths.

Besides, some courseware -- especially material prepared in the hope of writing a best-selling textbook -- is more like data, books, unrefereed preprints (and software, and music and movies): discretionary give-aways, depending on the author, rather than universal give-aways, written solely for uptake and impact, like refereed research articles.

So let's not remain oblivious to the vast shortfall in OA's target content by blurring it with fantasies about other kinds of content (much of it absent too!).

As for theses: The natural solution for them is to treat them the same way as journal articles: mandate deposit in the institutional repository (as more and more universities are now beginning to do).

CA: "A repository may also be an instrument of the institution by supporting, for example, internal and external assessment as well as strategic planning."

Yes, and this is yet another rationale for mandating deposit of OA's target content: refereed research publications. Australia and the UK are beginning to link their institutional repositories to submissions for research assessment nationally, and universities like Liège are doing so for internal performance assessment.

CA: "Moreover, an institutional repository could have an important function in regional development. It allows firms, public bodies and civil society organisations to immediately understand what kind of expertise is available locally."

Yes, all true. These are further rationales for institutions mandating institutional deposit -- and for funder mandates to reinforce institutional deposit mandates rather than compete with them.

CA: "These four ideal types have been derived partly from the history of repositories, partly through logical reasoning. This includes an appreciation of the relevant literature on scholarly communication, open access and repositories, though the [paper] is not a literature review but an argument that moves back and forth between abstract ideal types and specific cases. Ideal types should not be misunderstood as a classification, in which each and every repository may be identified as belonging unambiguously to a category. Rather, the purpose of creating ideal types is to aid our understanding of repositories and provide a tool for analysing repository development."

The "argument" does not seem to be grounded in a grasp either of what (OA) repositories are for, or of the practical problem of filling them. The distinctions among central repositories are largely arbitrary and spurious; they are more about services and functionality than about locus of deposit or repository type. The fundamental and sole substantive point is completely missed: Deposit needs to be mandated (by the universal providers of the target OA content -- institutions -- reinforced by funders) and the locus of deposit needs to be institutional.

The rest is just counting abstract chickens before their concrete eggs are fertilized, let alone laid or hatched.

CA: "Some publication repositories may be identified easily as resembling very much one ideal type rather than another. Some of the classic repositories conventionally identified as subject-based, such as arXiv and RePEc, exhibit few features of another type. Yet, one of the more interesting questions to ask is in how far other elements are present and what this means. ArXiv, for example, is also a research repository, with institutions sponsoring research in high-energy physics being important to its development and success. RePEc, by comparison, has a strong institutional component because the repository is a federated system that relies on input and service from a variety of departments and institutes."

ArXiv is based on direct central deposit of preprints (and postprints) in physics; RePEc amalgamates distributed institutional deposits of preprints in economics; CiteSeer harvests distributed institutional deposits of preprints and postprints in computer science. There is nothing to be learned here except that the spontaneous preprint (and postprint) deposit practices in these three research subject communities have failed to generalise to other research subject communities and therefore postprint deposit mandates from institutions and funders are needed, with one convergent locus of deposit: the repositories of the universal providers of all research, funded and unfunded, across all subjects and nations: the world's universities and research institutes.

CA: "To continue with another example, PubMed Central (PMC), at first glance, is a subject-based repository. Acquisition of content, however, only took off once it was declared a research repository capturing the output of publicly funded research (by the NIH). Notably, US Congress passed the deposit mandate, transforming PMC into a national repository. That a parallel, though integrated, repository should emerge in the UK (UK PMC) and Canada (PMC Canada) is thus not surprising. Utilisation of the ideal types outlined above would thus be fruitful in analysing the development of PMC and, presumably, be equally valuable in discussing the future potential of PMC, for example the possible creation of a Europe PMC."

This just repeats the very same incorrect analysis made earlier: PMC is and always was a US central research subject repository for refereed biomedical research publications (so are its emulators, for their own "national" output). What changed was not that NIH rebaptized PMC by "declaring" it a "research repository." What changed was that NIH mandated deposit (after two years wasted in the hope that a mere "invitation" would do).

The rest is just monkey-see, monkey-do. What those aping the US missed, however, was all the rest of OA's target content, funded and unfunded -- across all nations, subjects and institutions -- and how not only mandating deposit, but mandating convergent institutional deposit is essential in order to have universal OA to refereed research in all subjects, worldwide.

(The various national PMCs are a joke, and will be quietly rebaptized as harvested archival national collections -- if those are desired at all -- once worldwide OA content picks up, as institutional deposit mandates become universal. The global search functionality will not be at the level of all these absurd and superfluous national PMC clones, but at the level of global harvesting/search services. Why would any user -- peer or public -- want to search the world's biomedical literature by country (or institution, for that matter) -- other than for parochial actuarial purposes?)

CA: "National solutions are increasingly common (and principally may also be regional in form), but vary especially with regard to privileging either research outputs or the institutions. The French HAL system is powered by the CNRS, the most prestigious national research organisation, and thus is strong on making available research results."

Strong on making them available if/when deposited, but no stronger than the default 15% on getting them deposited at all. (The denominator fallacy again...)

CA: "In Japan, the National Institute of Informatics has supported the Digital Repository Federation, which covers eighty-seven institutions, with mainly librarians working to make the system operational."

Unless librarians in Japan have executive privileges over authors' writings that librarians elsewhere in the world lack, they will not be able to raise the deposit rate without mandating deposit either...

CA: "In Spain, an aggregator and search portal, Recolecta, sits atop a multitude of institutional repositories, with a large variety of items."

A large variety of "items": But what percentage of Spanish annual refereed article output is being deposited? My guess is that -- apart from Spain's 4 institutional mandates and 1 funder mandate -- that percentage will be the usual baseline 15% (looking spuriously bigger because aggregated centrally across multiple institutions: the denominator fallacy yet again...).

CA: "In Australia, institutional repositories are prominently tied to the national research assessment exercise, with due emphasis on peer reviewed publications."

That's promising, because being required to submit for research assessment via institutional repositories is effectively a deposit mandate. Moreover, with 1 funder mandate and 5 institutional mandates -- including the world's first institution-wide mandate at QUT -- Australia is neck-and-neck, proportionately, with the UK, in the worldwide national OA sweepstakes: The UK has 13 funder mandates, 11 institutional mandates, and 3 departmental mandates, including the world's very first OA mandate (U Southampton School of Electronics and Computer Science); the UK too is moving toward linking deposit to the new national research assessment scheme.

CA: "Any Internet 101 course will include plenty of examples where deposit, content and service are assembled within a single site (by one provider, company etc.) - the list is really very long, from ArXiv to Amazon, SSRN to Flickr, RePEc to Facebook and so on. Internet 101 theory will then elucidate why this is so (e.g. network effects, economies of scale and so on). Creating thousands of little repositories was probably never a good idea..."

Umm, I guess Internet 101 will also tell us that creating billions of little sites was never a good idea and we should all be depositing directly in Google...

CA: "More here:"
Armbruster, Chris and Romary, Laurent, Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication (November 23, 2009). Available at SSRN

Let the reader be prepared for a rather confused and practically unproductive mashup of OA repository-content, deposit-locus, and central-service issues in the Armbruster & Romary paper.

Yet the resolution is a simple one-liner: All research institutions and funders worldwide need to mandate institutional deposit, and then reap the harvest centrally, with search services, subject collections, national collections, language collections, and any other "ideal" on which hearts are set.

(But don't let the function-tail wag the content-dog now, when it's only at 15% body weight and needs to settle down and eat.)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 13:12

Where to Mandate Deposit: Proxy Deposit and the "Denominator Fallacy"

On Mon, Nov 30, 2009 [identity deleted] wrote:

"Re: PubMedCentral (PMC) [is] (relatively) well populated."

PMC is only better populated than any other repository to the degree that some funders have mandated deposit. But there is a "denominator fallacy" here:

Mandates only cover content mandated by the funders in question. As a percentage of the world's (or, if you like, the US's) total annual biological and medical research output, PMC still covers only a small fraction of that total annual target.

The global baseline for spontaneous (unmandated) self-archiving is about 15%. The degree to which PMC is doing better than that -- and it is, but the question is by how much? -- can only be properly determined if we divide the deposits of biomedical articles published in each year by the total published biomedical output for that year (worldwide, or by US authors, if you prefer).

If you do that calculation, you will almost certainly find that PMC is doing no better nor worse for its mandated content than any mandated institutional repository; and for its unmandated content it is likewise doing no better nor worse than any unmandated institutional repository (c. 15%).

We cannot go a single step further on the underlying question -- how to populate repositories, and how well various approaches and repositories are doing -- until the denominator fallacy is corrected.
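
The denominator correction described above is just a division, and a minimal sketch may make it concrete. Every figure below is hypothetical, chosen only to illustrate the arithmetic (it is not PMC data), and the function name deposit_rate is likewise an invention for this example:

```python
def deposit_rate(deposited, total_output):
    """Fraction of a year's total article output that was actually deposited."""
    if total_output <= 0:
        raise ValueError("total_output must be positive")
    return deposited / total_output


# Hypothetical one-year figures, for illustration only (not real PMC counts).
mandated_deposited, mandated_output = 6_000, 10_000       # articles covered by a funder mandate
unmandated_deposited, unmandated_output = 13_500, 90_000  # all other articles in the field

mandated_rate = deposit_rate(mandated_deposited, mandated_output)
unmandated_rate = deposit_rate(unmandated_deposited, unmandated_output)

# The figure that matters: all deposits divided by the field's total output.
overall_rate = deposit_rate(
    mandated_deposited + unmandated_deposited,
    mandated_output + unmandated_output,
)
```

With these made-up numbers the raw count of 19,500 deposits looks large, yet the unmandated portion sits exactly at the 15% baseline, and the overall rate is pulled above that only by the mandated slice. That is the comparison that has to be made, year by year, before one repository can be said to be doing better than another.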

"It's my understanding that not only is there a funder mandate, but the Wellcome Trust also publicly favour the author-pays model, through which the publisher will deposit the article to PubMedCentral on behalf of the author."

Yes, and that's bad news, and unfortunately that's a bad policy on the part of the Wellcome Trust (WT), ideologically rather than functionally driven, and ill thought-through, but one from which WT seems to be unwilling to consider budging. So the problem is that whereas WT are rightly to be admired for having been the first funder to mandate OA, they have also set a very bad example for other mandates.

(1) Note that squandering research money on funding Gold OA publication for WT-funded research does not increase by a single article the number of articles that are made OA as a result of the WT mandate. (They could all have been simply self-archived instead, setting an incomparably better, and more scalable, example for other funders, who have better things to do with their scarce research funds than to pay for Gold OA when they can mandate Green OA without having to redirect any money from research: Publication is already being paid for, in full, by institutional subscriptions.) The prospect of having to pay for Gold just discourages potential emulators who could mandate Green, but cannot afford to pay for Gold.

(2) In addition, WT mandates central deposit instead of institutional, which is, as I've said, counterproductive for facilitating and reinforcing the really crucial mandates, which are the ones by the universal providers of all research: the world's universities and research institutes. Central deposit mandates compete with institutional deposit mandates, for no good reason whatsoever.

(3) And third, allowing a deposit mandate to be fulfilled by a publisher instead of by the fundee (hence the mandatee) is doubly foolish. It lets the publisher decide (and enforce) the embargo period (as well as allowing them to drag their feet, since they are beholden to no one in their timing or compliance); and it reinforces the paid Gold option (self-fulfillingly) by making it the only one likely to generate timely deposit (but at a price).

Three unnecessary, dysfunctional features, adopted completely arbitrarily, simply because WT did not think things through carefully (and would not -- and alas still will not -- listen to advice even today).

But now, because WT were the first mandators (and they still do deserve eternal blessing for that!), their somnambulistic "view" is taken to be oracular by others. So it does damage beyond the ambit of WT's own funding.

"The rights negotiation over what can be done with the full text deposit in that situation is clear and the Wellcome trust openly prefer this route to OA [over] the post-print deposit, although they do support both models if the post-print is deposited to PubMedCentral directly. They also are prepared to pay for such a model of author-pays and publisher deposits."

As I've just said, it's bad news that WT "prefers" the funded Gold OA route because it makes Green appear less adequate for providing OA (though it is in fact more adequate than Gold), it wastes money, it discourages mandates from those who cannot follow WT's example with paying for Gold, and it is totally unnecessary:

There are no re-uses at all that the access-deprived potential users (for whom OA is intended) require that they do not have with Green OA: All that is needed is immediate, free online access to the full texts of the articles. There are no further uses or re-uses that need sanctioning or licensing. Researchers, students and teachers don't do "mashups" of journal articles, as teenagers do with music, video and text on YouTube. They just need to be able to access the texts online, download them to read and data-crunch and perhaps print off for themselves. That's it. The published text's content (as opposed to its verbatim text) was always free to be used, applied, built-upon (and cited), if only the user could manage to access the verbatim text. And that's what Green OA provides, 24/7, webwide: access. Teachers can put the URLs in their course-packs (no need to worry about supplying multiple hard copies, or any associated permissions issues: just URLs are enough if the texts themselves are OA). No need to "re-publish" either. So what are these re-use rights that all that extra Gold money is needed to pay for?

And as to timing: WT allows a one-year embargo. If they mandated institutional deposit, the institutional repositories' "email eprint request" Button could do a good deal better than that. 63% of journals endorse immediate OA (no embargo) already. For the remaining 37% the IRs can provide "Almost OA" -- for just a few extra user and author keystrokes and slight delay (but not remotely comparable to a year's delay!) for the semi-automatic eprint request fulfillment.

Instead, WT "prefers" to pay for immediate OA at a high price. The price is higher than they think, because of the emulation (and non-emulation) that their bad example inspires, in place of good, sensible practice that would generate far more (Green) OA and Green OA mandates, far faster, with no loss, only gain, over the dysfunctional WT policy.

"Apparently the route to post-print deposit in PubMedCentral is not particularly easy, although I can't speak from personal experience."

My guess is that it's no harder or easier to deposit in PMC than in any other repository, central or institutional (and the fact is that it's easy and fast to deposit: you just have to be mandated to try it and then you find out). But with WT "preferring" to offer what looks to the author like the even easier option of proxy deposit (some day) by the publisher (if embargoed) and immediate proxy deposit if paid-Gold, the author never even gets to learn from personal experience how easy and fast it really is.

Another bad example to set.

"Such complexities at the deposit stage will also force authors into paying for publication, just to give themselves an easier life!"

I doubt that profoundly. If the choice really were to do a few minutes' worth of keystrokes or pay out of their pockets, authors would quickly discover how easy it really was to deposit, and would save their money. But with WT offering to pay for Gold OA in their stead, I don't doubt that a number of years more will be wasted going down that dysfunctional route before it is discovered that it is both unnecessary and wasteful.

"Personally, I'd much rather see the author depositing in institutional repositories, and PubMedCentral harvesting from us."

Yes, that is indeed the optimal way, not just for PMC and WT but for the institution, which, if all funders mandated institutional deposit, would soon realize that it made sense for the institution itself to go on to mandate (institutional) deposit for all of its research output, whether funded or not.

"There have been a couple of lonely requests on discussion lists from repository managers asking whether others have deposited works to PubMedCentral on behalf of authors but no replies so far."

Why on earth should proxy deposit be offered to authors to spare them a few minutes' worth of keystrokes? If it comes to that, that's what secretaries are paid for, or student assistants. Let authors decide for themselves whether it's worth paying money to spare themselves those few extra keystrokes per paper. (I suppose some authors still pay secretaries to do the keystrokes to type their drafts in the first place, so this would just be a few more keystrokes...)

"Why might we do this on authors' behalf? Because authors recognise the importance of subject repositories far more than they do that of institutional repositories and if we can do this for them then we gain their support and understanding. Because, whether there is an institutional mandate or not, it's not right that authors should have to deposit the same article twice."

Unfortunately the above passage has conflated so many unfounded assumptions, I hardly know where to begin!

(i) The reason it's better not to do deposits on authors' behalf is that in reality deposits are fast and easy and it is absurd to fear or avoid them.

(ii) In addition, if everything continues to be done to insulate authors' groundless phobias about a few keystrokes from the simple reality of deposit, that simply makes the path to universal OA longer and more arduous (real arduousness being substituted for the notional arduousness that keeps authors at arm's length).

(iii) The only repositories that authors "recognise the importance of" are mandated repositories, regardless of whether the mandate is from their funder or their institutions (but preferably both!), and regardless of whether the repository is central or institutional -- but it had *#&% well better be just one-time, one-place deposit, otherwise the authors can and will and should revolt: and that's really what this is all about: A few minimal keystrokes, yes, but no unnecessary, redundant or profligate ones. WT did not think this through, otherwise it would have been realized at once that convergent institutional locus of deposit should be specified by both institutional and funder mandates, with automated central harvesting to whatever further loci hearts desire thereafter; the authors' fingers are not involved in that.

(iv) Doing proxy deposit on behalf of authors in order to "gain their support and understanding"? Their support and understanding for what? So that the next time they want to deposit, they can again appeal to your free keystroke services? Where's the support and understanding in that? What's needed is mandates; that moots the need for "support and understanding."

(v) It's certainly true that "it's not right that authors should have to deposit the same article twice." That's the whole point here. We have WT and NIH to thank for the fact that we need to face this prospect at all. And it discourages the adoption of institutional mandates (and of course diminishes the probability of spontaneous institutional deposit to even below the 15% baseline).

(vi) But an institutional mandate would still remedy this, for then the author could deposit once, institutionally, and the proxy redeposit phase would be much lightened: Instead of having to do each deposit for the author, software (like SWORD) could be used to port the deposits' metadata from their IR to their secondary (central) loci. And then maybe the central deposit stipulation of WT, NIH and other such funders -- not all funders have been silly enough to emulate this dysfunctional stricture -- would die a natural death of its own accord, eventually short-circuited by increasingly efficient software.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 12:54

Wednesday, December 9. 2009

Comments on Raym Crow's (2002) SPARC position paper on institutional repositories

Self-Archiving, Self-Vetting, "Overlay Journals" and "Disaggregated Models":
Comments on 2002 SPARC Position Paper on Institutional Repositories by Raym Crow

(Note: These comments were originally posted on Sun Aug 04 2002)
The SPARC position paper, "The Case for Institutional Repositories," by Raym Crow (2002), is timely and will serve a useful purpose in mapping out for universities exactly why it is in their best interests to self-archive their research output, and how they should go about doing so.

I will only comment on a few passages, having mostly to do with the topic of "certification" (peer review) in which SPARC's message may have become a little garbled along the same lines that like-minded precursor initiatives (notably E-biomed and Scholar's Forum) have likewise been a little garbled:

E-biomed: A Proposal for Electronic Publications in the Biomedical Sciences

Scholars' Forum: A New Model For Scholarly Communication

To overview the point in question very briefly:

To provide open access (i.e., free, online, full-text access) to the research output of universities and research institutions worldwide -- output that is currently accessible only by paying access-tolls to the 24,000 peer reviewed journals in which their 2.5 million annual research papers are published -- does not call for or depend upon any changes at all in the peer review system. On the contrary, it would be a profound strategic (and factual) mistake to give the research community the incorrect impression that there is or ought to be any sort of link at all between providing open access to their own research literature by self-archiving it and any modification whatsoever in the peer review system that currently controls and certifies the quality of that research.

The question of peer-review modification has absolutely nothing to do with the institutional repositories and self-archiving that the SPARC paper is advocating. The only thing that authors and institutions need to be clearly and explicitly reassured about (because it is true) is that self-archiving in institutional Eprints Archives will preserve intact that very same peer-reviewed literature (2.5 million peer-reviewed papers annually, in 24,000 peer-reviewed journals) to which it is designed to provide open access.

Hence, apart from providing these reassurances, it is best to leave the certification/peer-review issue alone!

Here is where this potentially misleading and counterproductive topic is first introduced in the SPARC paper's section on "certification":

RC: "CERTIFICATION: Most of the institutional repository initiatives currently being developed rely on user (including author) communities to control the input of content. These can include academic departments, research centers and labs, administrative groups, and other sub-groups. Faculty and others determine what content merits inclusion and act as arbiters for their own research communities. Any certification at the initial repository submission stage thus comes from the sponsoring community within the institution, and the rigor of qualitative review and certification will vary."

There is a deep potential ambiguity here. The SPARC paper might merely be referring here to how much, and how, institutions might decide to self-vet their own research output when it is still in the form of pre-peer-review preprints, and that would be fine:

"1.5. Distinguish unrefereed preprints from refereed postprints"

But this institutional self-vetting of whatever of its own pre-refereeing research output a university decides to make public online should on no account be described as "qualitative review and certification"! That would instead be peer review, and peer review is the province of the qualified expert referees (most of them affiliated with other institutions, not the author's institution) who are called upon formally by the editors of independent peer-reviewed journals to referee the submissions to those journals; this quality-review is not the province of the institution that is submitting the research. Self-archiving is not self-publishing, and peer-review cannot be self-administered:

"1.4. Distinguish self-publishing (vanity press) from self-archiving (of published, refereed research)"

It merely invites confusion to equate whatever preliminary self-vetting an institution may elect to do on the contents of the unrefereed preprint sector of its Eprint Archives with what it is that journals do when they implement peer review.

Worse, it might invite the conflation of self-archiving with self-publishing, if what the SPARC paper has in mind here is not just the unrefereed preprint sector of the institutional repository, but what would be its refereed postprint sector, consisting of those papers that are certified as having met a specific journal's established quality standards after classical peer review has taken its standard course:

"What is an Eprint Archive?"

"What is an Eprint?"

"What should be self-archived?"

"What is the purpose of self-archiving?"

"Is self-archiving publication?"

It is extremely important to clearly differentiate an institution's self-vetting of the unrefereed sector of its archive from the external quality control and certification provided by refereed journals that subsequently yields the refereed sector of its archive. Nothing is gained by conflating the two:

"Peer-review reform: Why bother with peer review?"

RC: "In some instances, the certification will be implicit and associative, deriving from the reputation of the author's host department. In others, it might involve more active review and vetting of the research by the author's departmental peers. While more formal than an associative certification, this certification would typically be less compelling than rigorous external peer review. Still, in addition to the primary level certification, this process helps ensure the relevance of the repository's content for the institution's authors and provides a peer-driven process that encourages faculty participation."

These are all reasonable possibilities for the preliminary self-selection and self-vetting of an institution's unrefereed preprints. But implying that they amount to anything more than that -- by using the term "peer" for both this internal self-vetting and external peer review, and suggesting that there is some sort of continuum of "compellingness" between the two -- is not helpful or clarifying but instead leads to (quite understandable) confusion and resistance on the part of researchers and their institutions:

For, having read the above, the potential user who previously knew the refereed journal literature -- consisting of 24,000 peer-reviewed journals, 2.5 million refereed articles per year, each clearly certified with each journal's quality-control label, and backed by its established reputation and impact -- now no longer has a clear idea what literature we might be talking about here! Are we talking about providing open access to that same refereed literature, or are we talking about substituting some home-grown, home-brew in its place?

Yet there is no need at all for this confusion: As correctly noted in the SPARC paper, University Eprint Archives ("Institutional Repositories") can have a variety of contents, but prominent among them will be the university's own research output (self-archived for the sake of the visibility, usage, impact, and their resulting individual and institutional rewards, as well described elsewhere in the SPARC paper). That institutional research output has, roughly, two embryonic stages: pre-peer-review (unrefereed) preprints and post-peer-review (refereed) postprints.

Now the pre-peer-review preprint sector of the archive may well require some internal self-vetting (this is up to the institution), but the post-peer-review postprint sector certainly does not, for the "vetting" there has been done -- as it always has been -- by the external referees and editors of the journals to which those papers were submitted as preprints, and by which they were accepted for publication (possibly only after several rounds of substantive revision and re-refereeing) once the refereeing process had transformed them into the postprints.

Nor is the internal self-vetting of the preprint sector any sort of substitute for the external peer review that dynamically transforms the preprints into refereed, journal-certified postprints.

In the above-quoted passage, the functions of the internal preprint self-vetting and the external postprint refereeing/certification are completely conflated -- and conflated, unfortunately, under what appears like an institutional vanity-press penumbra, a taint that the self-archiving initiative certainly does not need, if it is to encourage the opening of access to its existing quality-controlled, certified research literature, such as it is, rather than to some untested substitute for it.

RC: "It should be noted that to serve the primary registration and certification functions, a repository must have some official or formal standing within the institution. Informal, grassroots projects - however well-intentioned - would not serve this function until they receive official sanction."

Universities should certainly establish whatever internal standards they see fit for pre-filtering their pre-refereeing research before making it public. But the real filtration continues to be what it always was, namely, classical peer review, implemented and certified as it always was. This needs to be made crystal clear!

RC: "OVERLAY JOURNALS: Third-party online journals that point to articles and research hosted by one or more repositories provide another mechanism for peer review certification in a disaggregated model."

Unfortunately, the current user of the existing, toll-access refereed-journal literature is becoming more and more confused about just what is actually being contemplated here! Does institutional self-archiving mean that papers lose the quality-control and certification of peer-reviewed journals and have it replaced by something else? By what? And what is the evidence that we would then still have the same literature we are talking about here? Does institutional self-archiving mean giving up the established forms of quality control and certification and replacing them by untested alternatives?

There also seems to be some confusion between the more neutral concept of (1) "overlay journals" (OJs) (e.g., Arthur Smith), which merely use Eprint Archives for input (the online submission/refereeing of author self-archived preprints) and output (the official certification of author self-archived postprints as having been peer-reviewed, accepted and "published" by the OJ in question), but leave the classical peer review system intact; and the vaguer and more controversial notion of (2) "deconstructed journals" (DJs) on the "disaggregated model" (e.g., John W.T. Smith), in which (as far as I can ascertain) what is being contemplated is the self-archiving of preprints and their subsequent "submission" to one or many evaluating/certifying entities (some of which may be OJs, others some other unspecified kind of certifier) who give the papers their respective "stamps of approval."

"Re: Alternative publishing models - was: Scholar's Forum: A New Model..."

JWT Smith has made some testable empirical conjectures, which could eventually be tested in a future programme of empirical research on alternative research quality review and certification systems. But they certainly do not represent an already tested and already validated ("certified"?) alternative system, ready for implementation in place of the 2.5 million annual research articles that currently appear in the 24,000 established refereed journals!

As such, untested speculations of this kind are perhaps a little out of place in the context of a position paper that is recommending concrete (and already tested) practical steps to be taken by universities in order to maximize the visibility, accessibility and impact of their research output (and perhaps eventually to relieve their library serials budgetary burden too).

Author/institution self-archiving of research output -- both preprints and postprints -- is a tested and proven supplement to the classical journal peer review and publication system, but by no means a substitute for it. Self-archiving in Open Access Eprint Archives has now been going on for over a decade, and both its viability and its capacity to increase research visibility and impact have been empirically demonstrated.

Substitutes for the existing journal peer review and publication system, in contrast, require serious and systematic prior testing in their own right; there is nothing anywhere near ready there for practical recommendations other than the feasibility of Overlay Journals (OJs) as a means of increasing the efficiency and speed and lowering the cost of classical peer review. Almost no testing of any other model has been done yet; there are no generalizable findings available, and there are many prima facie problems with some of the proposed models (including JWT Smith's "disaggregated" model, [DJs]) that have not even been addressed:

See the discussion (and some of the prima facie problems) of JWT Smith's model under:

"Alternative publishing models - was: Scholar's Forum: A New Model..."

"Journals are Quality Certification Brand-Names"

"Central vs. Distributed Archives"

"The True Cost of the Essentials (Implementing Peer Review)"

"Workshop on Open Archives Initiative in Europe"

In contrast, there has been a recent announcement that the Journal of Nonlinear Mathematical Physics will become openly accessible as an "overlay journal" (OJ) on the Physics Archive.

This is certainly a welcome development -- but note that JNMP is a classically peer-reviewed journal, and hence the "overlay" is not a substitute for classical peer review: It merely increases the visibility, accessibility and impact of the certified, peer-reviewed postprints while at the same time providing a faster, more efficient and economical way of processing submissions and implementing [classical] peer review online.

Indeed, Overlay Journals (OJs) are very much like the Open-Access Journals that are the target of Budapest Open Access Strategy 2.

Deconstructed/Disaggregated Journals (DJs), in contrast, are a much vaguer, more ambiguous, and more problematic concept, nowhere near ready for recommendation in a SPARC position paper.

RC: "While some of the content for overlay journals might have been previously published in refereed journals, other research may have only existed as a pre-print or work-in-progress."

This is unfortunately beginning to conflate the notion of the "overlay" journal (OJ) with some of the more speculative hypothetical features of the "deconstructed" or "disaggregated" journal (DJ):

The (informal) notion of an overlay journal is quite simple: If researchers are self-archiving their preprints and postprints in Eprint Archives anyway, there is, apart from any remaining demand for paper editions, no reason for a journal to put out its own separate edition at all: Instead, the preprint can first be deposited in the preprint sector of an Eprint Archive. The journal can be notified by the author that the deposit is intended as a formal submission. The referees can review the archived preprint. The author can revise it according to the editor's disposition letter and the referee reports. The revised draft can again be deposited and re-refereed as a revised preprint. Once a final draft is accepted, that then becomes tagged as the journal-certified (refereed) postprint.

End of story. That is an "overlay" journal (OJ), with the postprint permanently "certified" by the journal-name as having met that journal's established quality standards. The peer review is classical, as always; the only thing that has changed is the medium of implementation of the peer review and the medium of publication (both changes being in the direction of greater efficiency, functionality, speed, and economy).
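
The overlay-journal sequence just described can be sketched as a small state machine. This is only an illustration under stated assumptions: the stage names, the ALLOWED table and the helper functions are all hypothetical (no existing archive or journal software is being described), and rejection is left out for brevity:

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names for the overlay-journal (OJ) workflow.
    PREPRINT_DEPOSITED = "preprint deposited in the archive"
    SUBMITTED = "journal notified that the deposit is a submission"
    UNDER_REVIEW = "referees reviewing the archived preprint"
    REVISION_REQUESTED = "editor has requested revision"
    CERTIFIED_POSTPRINT = "tagged as journal-certified postprint"


# Which stage may follow which. REVISION_REQUESTED -> UNDER_REVIEW stands for
# depositing the revised preprint and having it re-refereed.
ALLOWED = {
    Stage.PREPRINT_DEPOSITED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.UNDER_REVIEW},
    Stage.UNDER_REVIEW: {Stage.REVISION_REQUESTED, Stage.CERTIFIED_POSTPRINT},
    Stage.REVISION_REQUESTED: {Stage.UNDER_REVIEW},
    Stage.CERTIFIED_POSTPRINT: set(),
}


def advance(current, nxt):
    """Move to the next stage, refusing any step that bypasses peer review."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"cannot go from '{current.value}' to '{nxt.value}'")
    return nxt


def run(path):
    """Walk a sequence of stages and return the final one."""
    stage = path[0]
    for nxt in path[1:]:
        stage = advance(stage, nxt)
    return stage


# One round of revision and re-refereeing before acceptance.
final = run([
    Stage.PREPRINT_DEPOSITED,
    Stage.SUBMITTED,
    Stage.UNDER_REVIEW,
    Stage.REVISION_REQUESTED,
    Stage.UNDER_REVIEW,
    Stage.CERTIFIED_POSTPRINT,
])
```

The point of the transition table is that there is no edge from a deposited preprint straight to a certified postprint: the only way to the journal's tag is through refereeing, exactly as in classical peer review.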

A deconstructed/disaggregated journal (DJ) is an entirely different matter. As far as I can ascertain, what is being contemplated there is something like an approval system plus the possibility that the same paper is approved by a number of different "journals." The underlying assumptions are questionable:

(1) Peer review is neither a static red-light/green-light process nor a grading system, singular or multiple: The preprint does not receive one or a series of "tags." Peer review is a dynamic process of mediated interactions between an author and expert referees, answerable to an expert editor who selects the referees for their expertise and who determines what has to be done to meet the journal's quality standards -- a process during which the content of the preprint undergoes substantive revision, sometimes several rounds of it. The "grading" function comes only after the preprint has been transformed by peer review into the postprint, and consists of the journal's own ranking in the established (and known) hierarchy of journal quality levels (often also associated with the journal's citation impact factor).

It is not at all clear whether and how having raw preprints certified as approved -- singly or many times over -- by a variety of "deconstructed journals" (DJs) can yield a navigable, sign-posted literature of the known quality and quality-standards that we have currently. (And to instead interactively transform them into postprints is simply to reinvent peer review.)

(2) Even more important: Referees are a scarce resource. Referees sacrifice their precious research time to perform this peer-reviewing duty for free, normally at the specific request of the known editor of a journal of known quality, and with the knowledge that the author will be answerable to the editor. The result of this process is the navigable, quality-controlled refereed research literature we have now, with the quality-grade certified by the journal label and its established reputation.

It is not at all clear (and there are many prima facie reasons to doubt) that referees would give of their time and expertise to a "disaggregated" system to provide grades and comments on raw preprints that might or might not be graded and commented upon by other (self-selected? appointed?) referees as well, and might or might not be responsive to their recommendations. Nor is it clear that a disaggregated system would continue to yield a literature that was of any use to other users either.

Classical peer review already exists, and works, and it is the fruits of that classical peer review that we are talking about making openly accessible through self-archiving, nothing more (or less)! Journals (more specifically, their editorial boards and referees) are the current implementers of peer review. They have the experience, and their quality-control "labels" (the journal-names) have the established reputations (and citation impact factors) on which such "metadata" tags depend for their informational value in guiding users. There is no need either to abandon journals or to re-invent them under another name ("DJ").

A peer-reviewed journal, medium-independently, is merely a peer-review service provider and certifier. That is what they are, and that is what they will continue to be. Titles, editorial boards and their referees may migrate, to be sure. They have done so in the past, between different toll-access publishers; they could do so now too, if/when necessary, from toll-access to open-access publishers. But none of this involves any change in the peer review system; hence there should be no implication that it does.

(JWT Smith also contemplates paying referees for their services, another significant and untested departure from classical peer review, with the potential for bias and abuse -- if only there were enough money available to make it worth referees' while, which there is not! At realistic rates, offering to pay a referee for stealing his research time to review a paper would risk adding insult to injury.)

So there is every reason to encourage institutions to self-archive their research output, such as it is, before and after peer review. But there is no reason at all to link this with speculative scenarios about new publication and/or peer review systems, which could well put the very literature we are trying to make more usable and used at risk of ceasing to be useful or usable to anyone.

The message to researchers and their institutions should be very clear:

The self-archiving of your research output, before (preprints) and after (postprints) peer-reviewed publication will maximize its visibility, usage, and impact, with all the resulting benefits to you and your institution. Self-archiving is merely a supplement to the existing system, an extra thing that you and your institution can do, in order to enjoy these benefits. You need give up nothing, and nothing else need change.

In addition, one possible consequence, if enough researchers and their institutions self-archive enough research long enough, is that your institutional libraries might begin to enjoy some savings on their serials expenditures, because of subscription cancellations. This outcome is not guaranteed, but it is a possible further benefit, and might in turn lead to further restructuring of the journal publication system under the cancellation pressure -- probably in the direction of cutting costs and downsizing to the essentials, which will probably reduce to just providing peer review alone. The true cost of that added value, per paper, will in turn be much lower than the total cost now, and it will make most sense to pay for it out of the university's annual windfall subscriptions savings as a service, per outgoing paper, rather than as a product, per incoming paper, as in toll-access days. This outcome too would be very much in line with the practice of institutional self-archiving of outgoing research that is being advocated by the SPARC position paper.

The foregoing paragraph, however, only describes a hypothetical possibility, and need not and should not be counted as among the sure benefits of author/institution self-archiving -- which are, to repeat: maximized visibility, usage, and impact for institutional research output, resulting from maximized accessibility.

RC: "As a paper could appear in more than one journal and be evaluated by more than one refereeing body, these overlays would allow the aggregation and combination of research articles by multiple logical approaches - for example, on a particular theme or topic (becoming the functional equivalent of anthology volumes in the humanities and social sciences); across disciplines; or by affiliation (faculty departmental bulletins that aggregate the research of their members)."

Here the speculative notion of substituting "disaggregated journals" (DJs) for classical peer review is being conflated with the completely orthogonal matter of collections and alerting: An open-access online research literature can certainly be linked and bundled and recombined in a variety of very useful ways, but this has nothing whatsoever to do with the way its quality is arrived at and certified as such. Until an alternative has been found, tested and proven to yield at least comparable sign-posted quality, the classical peer review system is the only game in town. Let us not delay the liberation of its fruits from access-barriers still longer by raising the spectre of freeing them not only from the access-tolls but also from the self-same peer review system that (until further notice) generated and certified their quality!

"Rethinking "Collections" and Selection in the PostGutenberg Age"

RC: "Such journals exist today-for example, the Annals of Mathematics overlay to arXiv and Perspectives in Electronic Publishing, to name just two-and they will proliferate as the volume of distributed open access content increases."

The Annals of Mathematics is an "overlay" journal (OJ) of the kind I described above, using classical peer review. It is not an example of the "disaggregated" quality control system (DJ).

Perspectives in Electronic Publishing, in contrast, is merely a collection of links to already published work.

It does not represent any sort of alternative to classical peer review and journal publication.

RC: "Besides overlay journals pointing to distributed content, high-value information portals - centered around large, sophisticated data sets specific to a particular research community - will spawn new types of digital overlay publications based on the shared data."

Journals that are overlays to institutional research repositories are merely certifying that papers bearing their tag have undergone their peer-review and have met their established quality standards. This has nothing to do with alternative forms of quality control, disaggregated or otherwise.

Post hoc collections (link-portals) have nothing to do with quality control either, although they will certainly be valuable for other purposes.

RC: "Regardless of journal type, the basis for assessing the quality of the certification that overlay journals provide differs little from the current journal system: eminent editors, qualified reviewers, rigorous standards, and demonstrated quality."

Not only does it not differ: Overlay Journals (OJs) will provide identical quality and standards -- as long as "overlay" simply means having the implementation of peer review (and the certification of its outcome) piggy-back on the institutional archives, as it should.

Alternative forms of quality control (e.g., DJs), on the other hand, will first have to demonstrate that they work.

And neither of these is to be confused with the post-hoc function of aggregating online content, peer-reviewed or otherwise.

This should all be made crystal clear in the SPARC paper, partly by stating it in a clear, straightforward way, and partly by omitting the speculative options that only cloud the picture needlessly (and have nothing to do with institutional self-archiving and its rationale [open access], but simply risk confusing and discouraging would-be self-archivers and their institutions).

RC: "In addition to these analogues to the current journal certification system, a disaggregated model also enables new types of certification models. Roosendaal and Geurts have noted the implications of internal and external certification systems."

Please, let us distinguish the two by calling "internal certification" pre-certification (or "self-certification") so as not to confuse it with peer review, which is by definition external (except in that happy but rare case where an institution happens to house enough of the world's qualified experts on a given piece of research not to have to consult any outside experts).

A good deal of useful pre-filtering can be done by institutions on their own research output, especially if the institution is large enough. (CERN has a very rigorous internal review system that all outgoing research must undergo before it is submitted to a journal for peer review.)

But, on balance, "internal certification" rightly raises the spectre of vanity press publication. Nor is it a coincidence that when universities assess their own researchers for promotion and tenure, they tend to rely on the external certification provided by peer reviewed journals (weighted sometimes by their impact factors) rather than just internal review. The same is true of the external assessors of university research output.

So, please, let us not link the very desirable and face-valid goal of maximizing universities' research visibility and research impact through open access provided by institutional self-archiving with the much more dubious matter of institutional self-certification.

RC: "Certification may pertain at the level of internal, methodological considerations, pertinent to the research itself - the standard basis for most scholarly peer review. Alternatively, the work may be gauged or certified by criteria external to the research itself - for example, by its economic implications or practical applicability. Such internal and external certification systems would typically operate in different contexts and apply different criteria. In a disaggregated model, these multiple certification levels can co-exist."

This is all rather vague, and somewhat amateurish, and would (in my opinion) have been better left out of this otherwise clear and focussed call for institutional self-archiving of research output.

And the idea of expecting referees to spend their precious time refereeing already-refereed and already-certified (i.e., already-published) papers yet again is unrealistic in the extreme, especially considering the growing number of papers, the scarcity of qualified expert referees (who are otherwise busy doing the research itself), and the existing backlogs and delays in refereeing and publication.

Besides, as indicated already, refereeing is not passive tagging or grading: It is a dynamic, interactive, and answerable process in which the preprint is transformed into the accepted postprint, and certified as such. Are we to imagine each of these papers being re-written every time they are submitted to yet another DJ?

There is a lot to be said for postpublication revision and updating of the postprints ("post-postprints") in response to postpublication commentary (or to correct substantive errors that come to light later), but it only invites confusion to call that "disaggregated journal publication." The refereed, journal-certified postprint should remain the critical, canonical, scholarly and archival milestone that it is, perpetually marking the fact that that draft successfully met that journal's established quality standards. Further iterations of this refereeing/certification process make no sense (apart from being profligate with scarce resources) and should in any case be tested for feasibility and outcome before being recommended!

RC: "To support both new and existing certification mechanisms, quality certification metadata could be standardized to allow OAI-compliant harvesting of that information. This would allow a reader to determine whether there is any certification information about an article, regardless of where the article originated or where it is discovered."

Might I venture to put this much more simply (and restrict it to the refereed research literature, which is my only focus)? By far the most relevant and informative "metadatum" certifying the information in a research paper is the JOURNAL-NAME of the journal in which it was published (signalling, as it does, the journal's established reputation, quality level, and impact factor)! (Yes, the AUTHOR-NAME, and the AUTHOR-INSTITUTION metadata-tags may be useful sometimes too, but those cases do not, as they say, "scale" -- otherwise self-certification would have replaced peer review long ago. COMMENT-tags would be welcome too, but caveat emptor.)
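To make the point concrete, here is a minimal sketch of a metadata record in which the JOURNAL-NAME tag alone carries the certification. The record layout and field names are my own assumptions for illustration; they are not taken from the OAI protocol, from any metadata standard, or from the eprints.org software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EprintRecord:
    # Hypothetical field names, loosely mirroring the tags named above.
    author_name: str
    author_institution: str
    title: str
    journal_name: Optional[str] = None  # None for an unrefereed preprint


def certification_label(record: EprintRecord) -> str:
    """Return a reader-facing label based only on the JOURNAL-NAME tag."""
    if record.journal_name:
        return f"refereed postprint, certified by {record.journal_name}"
    return "unrefereed preprint (no journal certification)"


preprint = EprintRecord("A. Author", "Some University", "Draft paper")
postprint = EprintRecord(
    "A. Author", "Some University", "Draft paper",
    journal_name="Hypothetical Journal",
)
print(certification_label(preprint))
print(certification_label(postprint))
```

The author and institution fields are present in both records, but they do not change the label: only the journal name does.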

"Peer Review, Peer Commentary, and Eprint Archive Policy"

Please let us not lose sight of the fact that the main purpose of author/institution self-archiving in institutional Eprint Archives is to maximize the visibility, uptake and impact of research output by maximizing its accessibility (by providing open access). It is not intended as an experimental implementation of speculations about untested new forms of quality control! That would be to put this all-important literature needlessly at risk (and would simply discourage researchers and their institutions from self-archiving it at all).

There is a huge amount of further guiding information that can be derived from the literature to help inform navigation, search and usage. A lot of it will be digitometric analysis based on usage measures such as citation, hits, and commentary.

But none of these digitometrics should be mistaken for certification, which, until further notice, is a systematic form of expert human interaction and judgement called peer review.

Harnad, S. & Carr, L. (2000) Integrating, Navigating and Analyzing Eprint Archives Through Open Citation Linking (the OpCit Project). Current Science 79(5): 629-638.

RC: "Depending on the goals established by each institution, an institutional repository could contain any work product generated by the institution's students, faculty, non-faculty researchers, and staff. This material might include student electronic portfolios, classroom teaching materials, the institution's annual reports, video recordings, computer programs, data sets, photographs, and art works-virtually any digital material that the institution wishes to preserve. However, given SPARC's focus on scholarly communication and on changing the structure of the scholarly publishing model, we will define institutional repositories here-whatever else they might contain-as collecting, preserving, and disseminating scholarly content. This content may include pre-prints and other works-in-progress, peer-reviewed articles, monographs, enduring teaching materials, data sets and other ancillary research material, conference papers, electronic theses and dissertations, and gray literature."

This passage is fine, and refocusses on the items of real value in the SPARC position paper.

RC: "To control and manage the accession of this content requires appropriate policies and mechanisms, including content management and document version control systems. The repository policy framework and technical infrastructure must provide institutional managers the flexibility to control who can contribute, approve, access, and update the digital content coming from a variety of institutional communities and interest groups (including academic departments, libraries, research centers and labs, and individual authors). Several of the institutional repository infrastructure systems currently being developed have the technical capacity to embargo or sequester access to submissions until the content has been approved by a designated reviewer. The nature and extent of this review will reflect the policies and needs of each individual institution, possibly of each participating institutional community. As noted above, sometimes this review will simply validate the author's institutional affiliation and/or authorization to post materials in the repository; in other instances, the review will be more qualitative and extensive, serving as a primary certification."

This is all fine, as long as it is specified that what is at issue is institutional pre-certification or self-certification of its unrefereed research (preprints).

For peer-reviewed research the only institutional authentication required is at most that the AUTHOR-NAME and JOURNAL-NAME are indeed as advertised! (The integrity of the full text could be vetted too, but I'm inclined to suggest that that would be a waste of time and resources at this point. What is needed right now is that institutions should create and fill their own Eprint Archives with their research output, pre- and post-refereeing, immediately. The "definitive" text, until journals really all become "overlay" journals, is currently in the hands of the publishers and subscribing libraries. For the time being, let authors "self-certify" their refereed, published texts as being what they say they are; let's leave worrying about more rigorous authentication for later. For now, the goal should be to self-archive as much research output as possible, as soon as possible, with minimal fuss. The future will take care of itself.)

RC: "Institutional repository policies, practices, and expectations must also accommodate the differences in publishing practices between academic disciplines. The early adopter disciplines that developed discipline-specific digital servers were those with an established pre-publication tradition. Obviously, a discipline's existing peer-to-peer communication patterns and research practices need to be considered when developing institutional repository content policies and faculty outreach programs. Scholars in disciplines with no prepublication tradition will have to be persuaded to provide a prepublication version; they might fear plagiarism or anticipate copyright or other acceptance problems in the event they were to submit the work for formal publication. They might also fear the potential for criticism of work not yet benefiting from peer review and editing. For these non-preprint disciplines, a focus on capturing faculty post-publication contributions may prove a more practical initial strategy."

Agreed. And here are some prima facie FAQs for allaying each of these by now familiar prima facie fears:

Authentication

Corruption

Certification

Evaluation

Peer Review

Copyright

Plagiarism

Priority

Tenure/Promotion

Legality

Publisher Agreement

RC: "Including published material in the repository will also help overcome concerns, especially from scholars in non-preprint disciplines, that repository working papers might give a partial view of an author's research."

Indeed. And that is the most important message of all -- and the primary function of institutional eprint archives: to provide open access to all peer-reviewed research output!

RC: "Therefore, including published material, while raising copyright issues that need to be addressed, should lower the barrier to gaining non-preprint traditions to participate. Where authors meet traditional publisher resistance to the self-archiving rights necessary for repository posting, institutions can negotiate with those publishers to allow embargoed access to published research."

Fine.

RC: "While gaining the participation of faculty authors is essential to effecting an evolutionary change in the structure of scholarly publishing, early experience suggests better success when positioning the repository as a complement to, rather than as a replacement for, traditional print journals."

Not only "positioning" it as a complement: Clearly proclaiming that a complement, not a replacement, is exactly what it is! Not just with respect to the relatively trivial issue of on-paper vs. on-line, but also with respect to the much more fundamental one, about journal peer review (vide supra). Institutional self-archiving is certainly no substitute for external peer review. (This is stated clearly in some parts of the SPARC paper, but unfortunately contradicted, or rendered ambiguous, in other parts.)

RC: "This course partially obviates the most problematic objection to open access digital publishing: that it lacks the quality and prestige of established journals."

This is a non-sequitur and a misunderstanding: The quality and prestige come from being certified as having met the quality standards of an established peer-reviewed journal. This has nothing whatsoever to do with the medium (on-paper or on-line), nor with the access system (toll-access or open-access); and it certainly cannot be attained by self-archiving unrefereed preprints only. The papers must of course continue to be submitted to peer-reviewed journals for refereeing, revision, and subsequent certification.

RC: "This also allows repository proponents to build a case for faculty participation based on the primary benefits that repositories deliver directly to participants, rather than relying on secondary benefits and on altruistic faculty commitment to reforming a scholarly communications model that has served them well on an individual level."

I could not follow this. The primary benefits of self-archiving are the maximization of the visibility, uptake and impact of research output by maximizing its accessibility (through open-access). Researchers certainly will not, and should not, self-archive in order to support untested new "certification" conjectures, nor even to ease their institutions' serials budgets. The appeal must be straight to researchers' self-interest in promoting their own research.

RC: "Additionally, value-added services such as enhanced citation indexing and name authority control will allow a more robust qualitative analysis of faculty performance where impact on one's field is a measurement. The aggregating mechanisms that enable the overall assessment of the qualitative impact of a scholar's body of work will make it easier for academic institutions to emphasize the quality, and de-emphasize the quantity, of an author's work.53 This will weaken the quantity-driven rationale for the superfluous splintering of research into multiple publication submissions. The ability to gauge a faculty member's publishing performance on qualitative rather than quantitative terms should benefit both faculty and their host institutions."

All true, but strategically, it is best to stress maximization of existing performance indicators, rather than hypothetical new ones:

Harnad, S. (2001) "Research Access, Impact and Assessment are linked." Times Higher Education Supplement 1487: p. 16.

RC: "Learned society publishers are for the most part far less aggressive in exploiting their monopolies than their for-profit counterparts. Even so, most society publishing programs, even in a not-for-profit context, often contribute significantly to covering an organization's operating expenses and member services. It is not surprising, then, that proposals advocating institutional repositories and other open access dissemination of scholarly research generate anxiety, if not outright resistance, amongst society publishers. While one hopes that societies adopt the broadest perspective possible in serving the needs of their members-including the broadest possible access to the scholarly research in the field-it is unlikely that societies will trade their organizations' solvency for the greater good of scholarship. It is important, therefore, to review how society publishers can continue to operate in an environment of institutional repositories and other open access systems."

Once the causal connection between access and impact is clearly demonstrated to the research community, it is highly unlikely that they will knowingly choose to continue to subsidise their Learned Societies' "good works" with the lost impact of their own work, by continuing to hold it hostage to impact-blocking access-tolls: Societies will need to find better ways to support their good works.

RC: "Some suggest that institutional repositories, pre-print servers, and electronic aggregations of individual articles will undermine the importance of the journal as a packager of articles. However, institutional repositories and other open access mechanisms will only threaten the survival of scholarly journals if they defeat the brand positions of the established society journals and if individual article impact metrics replace journal impact factors in academic advancement decisions."

Most of the above is not true, and hence better left unsaid.

It is quite possible (and hence should not be denied) that author/institution self-archiving of refereed research may eventually necessitate downsizing by publishers (to become peer-review/certification service-providers):

"Hypothetical Sequel"

"Downsizing"

But none of this has anything to do with journal- vs author- impact metrics! The ISI's Web of Science has already made it possible (and very useful) for institutions and funding agencies to use either journal or author citation impact metrics for assessment, whichever is more useful and informative, and it is very likely that weighting publications only by their journal-impact will prove a much blunter instrument than weighting them by the paper's and/or author's impact:

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35

But once the institutional Eprint Archives are up and filled, far richer and more sensitive digitometric measures of impact and usage are waiting to be devised and tested on this vast corpus. A taste is already available from citebase and its correlator.

See the bibliography of ongoing research on these new digitometric performance indicators.

RC: "On the first point, journal brand reputation will, for the foreseeable future, continue to be integral to the assessment of article and author quality."

For the reader/user/navigator of the literature, certainly. But more sensitive measures are developing too, for the evaluator, funder and employer. The all-important JOURNAL-NAME tag, and the established quality level and impact to which it attests will continue to be indispensable sign-posts, but a great deal more will be built on top of them, once the entire refereed journal literature (24K journals) is online and open-access.

RC: "Market-aware journals with prominent editorial boards and well-established publishing histories should be able to maintain their prestige, even with a proliferation of article-based aggregations. As to the second point, while new metrics will evolve that demonstrate the quantitative impact of individual articles, rigorous peer review will continue to provide value. Even after individual article impact analysis becomes widespread and accepted by academic tenure committees, stringent refereeing standards will continue to play a central role in indicating quality."

Correct, and mainly because peer review is the cornerstone of it all.

RC: "Learned societies have long-standing relationships with their members and they should be able to act as focal points for the research communities they represent. While society dues typically include a journal subscription, society members also enjoy other benefits of membership-and, presumably, additional value-beyond the journal subscription itself. Societies, therefore, provide community-supporting services to justify their members' dues besides the value allocated to the journal subscription. While a commercial publisher would find it difficult to charge a subscription fee for a journal freely available online, society publishers-by repositioning the benefits of membership-might well prove able to allow journal article availability via open access repositories without experiencing substantial membership cancellations or revenue attrition."

In other words, members of learned societies may still be willing to pay membership dues to support their societies' "good works." But there is no need to call these dues "subscriptions"!

And the cost of peer review itself can be covered very easily out of institutional subscription savings, if and when it becomes necessary.

RC: "Given the extent of government and private philanthropic foundation funding for academic research, especially in the sciences, such funding agencies have a vested interest in broadening the dissemination of scientific research. There are several mechanisms by which government and private funding agencies could help to achieve this broadened dissemination. It has been suggested that government and foundation research grants could be written to include subsidies for author page charges and other input-side fees to support open access business models. Such stipulations would help effect change in those disciplines, primarily in the sciences, where author page charges are the norm. Obviously, such subsidies would be less effective in disciplines where input-side models bear the stigma of vanity publishing; still, over time, this resistance could be overcome."

If/when open-access prevails enough to reduce publisher income, it will at the same time increase institutional savings (from cancelled subscriptions). As peer review costs much less than the whole of what journal publishers used to do, it can easily be paid for, at the author/institution end, as a service cost for outgoing research instead of as a product cost for incoming research as it is now, out of just a portion of institutions' annual windfall savings, as indicated below:

RC: "ECONOMICALLY: The burden of scholarly journal costs on academic libraries has been well documented. While the variety of institutional contexts and potential implementations make it difficult to project institutional repository development and operational costs with any precision, the evidence so far suggests that the resources required would represent but a fraction of the journal costs that libraries now incur and over which they have little control."

And that is mainly because peer review alone -- which will be journal publishers' only remaining essential service if and when all journal publication becomes all open-access publication -- costs far less than what journal subscription/license tolls used to cost. The per-paper archiving cost, distributed over the research institutions that generate the outgoing papers, is negligible, compared to what it cost for incoming papers in the toll-based system.
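The arithmetic behind this claim can be written down in one line, purely as a sketch. The symbols are my own shorthand and no figures are implied: let $S$ be an institution's annual windfall savings from cancelled subscriptions, $N$ the number of papers it sends out for refereeing per year, and $c_{\mathrm{pr}}$ the per-paper cost of peer review alone. The hypothesis is then that the outgoing service cost consumes only a fraction $\alpha$ of the savings:

```latex
\[
  N \, c_{\mathrm{pr}} \;=\; \alpha \, S , \qquad 0 < \alpha < 1 .
\]
```

Whether $\alpha$ really is well below 1 is an empirical matter; the relation only states what "paid out of just a portion of the savings" would mean.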

"The True Cost of the Essentials (Implementing Peer Review)"

RC: "Several institutions have applied the e-prints self-archiving software to implement institutional repositories. Developed at the University of Southampton, the free eprints.org self-archiving software now comes configured to run an institutional pre-prints archive. The generic version of e-prints is fully interoperable with all the OAI Metadata Harvesting Protocol."

Not an institutional pre-prints archive: An institutional Eprints Archive. (Eprints = preprints + postprints)

RC: "Universities that have implemented e-prints solutions include Cal Tech, the University of Nottingham, University of Glasgow, and the Australian National University. The participants in all these programs have described their experiences, providing practical insights that should benefit others contemplating an OAI-compliant e-prints implementation."

See the CalTech review of their experience with eprints for SPARC.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Institutional Repositories at 09:41
