[see also PART I and PART 0]
SUMMARY: The concept underlying the OAI metadata harvesting protocol is that local, distributed, content-provider sites each provide their own content and global service-provider sites harvest that content and provide global services over it, such as indexing, search, and other added values. (This is not a symmetric process. It does not make sense to think of the individual content-providers as "harvesting" their own content (back) from global service-providers.)
The question is accordingly whether OA deposit mandates should be (1) convergent, with both institutional and funder mandates requiring deposit in the author's own OA Institutional Repository (IR), for harvesting by global overlay OA services and collections (such as PubMed Central, PMC) or (2) divergent, requiring authors to deposit all over the map, locally or distally, possibly multiple times, depending on field and funding. It seems obvious that coordinated, convergent IR deposit mandates from both institutions and funders will bring universal OA far more surely and swiftly than needless and counterproductive divergence.
In the interests of a swift, seamless, systematic, global transition to universal OA, NIH should accordingly make one tiny change (entailing no loss at all in content or functionality) in its otherwise invaluable, historic, and much-imitated mandate: NIH should mandate IR deposit and harvest to PMC from there.
The spirit of the Congressional directive that publicly funded research should be made publicly accessible online, free for all, is fully met once everyone, webwide, can click on the link to an item whose metadata they have found in PMC, and the article instantly appears, just as if they had retrieved it via Google, regardless of whether the item's URL happens to be in an IR or in PMC itself.
A possible reason the NIH mandate took the divergent form it did may have been a conflation of access archiving with preservation archiving. But the version that NIH has (rightly) stipulated for OA deposit (each "investigator's... electronic version of their final, peer-reviewed manuscripts upon acceptance for publication") is not even the draft that is in real need of preservation; it is just a supplementary copy, provided for access purposes. The definitive version, the one that really stands in need of preservation, is not this author-copy but the publisher's official proprietary version of record.
For preservation, the definitive document needs to be deposited in an archival depository (preferably several, for safe-keeping, updating and migration as technology evolves), not an OA collection like PMC.
But that essential archival deposit/preservation function has absolutely nothing to do with either the author or with OA.
Peter Suber: "At the moment, I see two conflicting APA statements and no evidence that either statement [2002 or 2008] took the other into account. So I'm still waiting for a definitive clarification from the APA. But as I say, if the APA reaffirms the 2002 policy to allow no-fee, no-embargo self-archiving to IRs, then I will applaud it."
That will shortly sort itself out.
[See APA update, which appeared after this posting. Peter has since responded to that update too. The only point to add is that Stuart Shieber's concern about a remaining ambiguity in yet another APA document will no doubt likewise be resolved in the same way. (Stuart was the architect of Harvard FAS's institutional OA mandate and has since been appointed director of Harvard's newly formed Office for Scholarly Communication.)]
It seems obvious to me that the only coherent resolution is that APA's 2002 Green OA policy takes precedence over the contradictory passages in APA's 2008 PMC addendum. It would be arbitrary bordering on dementia to declare that:
"Our policy is that any APA author may self-archive their own refereed final draft in their own IR for free as long as they are not mandated to do so by NIH; but if they are mandated to do so by NIH, then they must pay us $2500 to do it!"
I predict that the proposed APA policy will first be:
"All we meant was that, as before, any APA author may self-archive their own refereed final draft in their own IR for free, but depositing APA's proprietary published version in PMC will cost $2500."
And then they will back down from the surcharge altogether. (I do have a bit of a track-record for correctly second-guessing APA policy!)
Peter Suber: "However, if the APA retains the "deposit fee" for NIH-funded authors, then I will continue to criticize it. The APA will still be charging for green OA, which is utterly unnecessary."
Do continue to criticize it, Peter, but please make sure the criticism is on target: As long as APA authors are free to provide green OA by depositing in their own IRs, APA can definitely not be said to be "charging for green OA" if APA charges authors for depositing in PMC (any more than I can be said to be charging for water if I say "water is free but bring your own container" and you insist on water in a container).
The $2500 fee is indeed absurd, but that absurdity (and many other counterproductive consequences) would be completely remedied by NIH's simply dropping its supererogatory requirement to deposit directly in PMC, and harvesting the metadata from the IRs instead. A central collection like PMC is just that: a collection. It is sufficient for such collections to harvest the metadata (as Google does) and to link to the full-text where it is actually deposited, i.e., the IR of the institution it came from.
Peter Suber: "[APA] will still fail to deliver immediate OA, or OA to the published edition, which fee-based [Gold or optional-Gold] OA journals always deliver in exchange for their fees."
You mean the publisher's proprietary version? But even the NIH mandate is only requiring deposit of the author's final refereed draft, not the publisher's proprietary version:
The NIH Public Access Policy implements Division G, Title II, Section 218 of PL 110-161 (Consolidated Appropriations Act, 2008). The law states:
The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.
I also think you may be equating the $2500 fee with a (hybrid) optional-Gold OA fee (from a non-Green publisher such as ACS). But it is not that. APA's is a PMC deposit fee, from a Green publisher. (There is no relevant category for a requirement to deposit in a 3rd-party CR, because it is arbitrary to have to do so, and has nothing to do with OA itself, which APA authors can already provide via Green OA in their own IRs.)
Moreover, to heap absurdity upon absurdity, we both know, Peter, that (1) not only does it not matter one bit, for OA accessibility to one and all, webwide, whether a document's locus is an IR or a CR, but (2) if and when all of OA's target content is made OA, one way or the other, then the distinction between 1st-party (author-institution), 2nd-party (publisher) and 3rd-party (PMC, UKPMC, EuroPMC, Google, or any other CR) archiving becomes irrelevant, the game is over, universal OA has at last arrived, and all these trivial locus and party details, as well as this absurd talk of deposit surcharges, become moot.
The problem is with first reaching that universal OA, which is already long, long overdue (after many, many false starts, including a prior one by NIH itself, 3 years ago, which elicited a compliance rate below 4%, less than a third of the global average for spontaneous -- i.e., unmandated -- self-archiving).
And coordinated, convergent IR deposit mandates -- funder mandates complementing institutional mandates -- will get us there far more surely and swiftly than the needless and counterproductive divergence we have imposed on ourselves by not thinking the PMC locus stipulation through in advance (or fixing it as it becomes more and more apparent that it creates unanticipated and unnecessary problems).
Peter Suber: "If the APA reaffirms its 2002 green policy, then NIH-funded authors could bypass the deposit fee when self-archiving to their IRs. But they couldn't bypass the fee when self-archiving to PMC, and they are bound by the NIH policy to deposit in PMC (or have their journal do so for them)."
Correct, but isn't this reasoning a bit circular, if not fatalistic? Which one is cluttering the path to universal OA (now that we have the invaluable NIH mandate)? APA, which blesses OA self-archiving in the author's own OA IR, for free, or NIH, which (unnecessarily) insists on mandating more than "merely" OA?
Would it not be better for NIH to think it through, and then -- patiently, in the interests of a swift, seamless, systematic, global progression to universal OA -- make in its otherwise invaluable, historic, and much-imitated mandate the one tiny change that (with no loss at all in content or functionality) will create the optimal conditions for a full-scale transition to universal OA, rather than only (the NIH/PMC) part of it?
Let NIH mandate IR deposit and harvest from there.
Peter Suber: "Stevan hopes that policies like the APA's will pressure the NIH to drop this requirement and allow deposits in an IR to suffice. But even if that ought to happen, it won't happen soon and very likely won't happen at all. One reason is simply that the requirement to deposit in PMC was mandated by Congress. The NIH undoubtedly supports the Congressional directive, but it's not an in-house policy decision that the agency is free to reverse at will."
Deposits in IRs can be harvested into PMC. The issue here is merely the locus of the point of direct deposit.
Does anyone imagine that the spirit of the Congressional directive -- to the effect that publicly funded research should be made publicly accessible online, free for all -- would not be fully met once everyone, webwide, can click on the link to an item whose metadata they have retrieved from PMC, and the article instantly appears, just as if they had retrieved it via Google, but the item's URL happens to be in an IR rather than in PMC?
Or are OA self-archiving issues being conflated with preservation archiving issues here (yet again, as so often happens, and inevitably at OA's expense)? If so, the preservation of what: "final, peer-reviewed manuscripts"?
Access Archiving or Preservation Archiving? One discerns the dead hand of digital preservationists here, pushing their worthy but distinct agenda, oblivious to the fact that the content they seek to preserve is mostly not even OA yet, and that the version that NIH has (rightly) stipulated for OA deposit (each "investigator's... electronic version of their final, peer-reviewed manuscripts upon acceptance for publication") is not even the draft that is in real need of preservation, but just a supplementary copy, provided for access purposes. The definitive version, the one that really stands in need of preservation, is not this author-copy but the original itself: the publisher's official proprietary version of record. But is it not crucial, here especially, to raise the fundamental question: Is the NIH mandate an access mandate or is it a preservation mandate? For preservation, one needs to deposit a (digital and analog) original in an archival depository (preferably several, for safe-keeping, updating and migration as technology evolves), not an OA collection like PMC. That essential archival deposit/preservation function has absolutely nothing to do with either the author or with OA, and APA would certainly have no problem with a digital deposit requirement like that...
Peter Suber: "But should Congress and the NIH prefer PMCs to IRs? Maybe, maybe not. I see good arguments on both sides."
For OA functionality, the locus of deposit makes zero difference. For preservation, OA is beside the point and unnecessary. But for OA content-provision itself -- and not just for NIH-funded content, but for all of OA's target content, across all disciplines, institutions and nations -- locus of deposit matters enormously. There's no functionality without content. And I know of no good argument at all in favor of institution-external direct deposit, insofar as OA content-provision is concerned; only a lot of good arguments against it.
Peter Suber: "But they are irrelevant here because (1) the APA deposit fee would still [be] unnecessary"
Why is it just APA's absurd $2500 fee for PMC deposit that is singled out as being unnecessary (given that the APA is Green on free OA IR deposit): Is NIH's gratuitous stipulation of PMC deposit not likewise unnecessary (for OA)?
(This question is all the more germane given that the global transition to universal OA stands to benefit a lot more from NIH's dropping its gratuitous (and alas much imitated) deposit-locus stipulation than from APA's dropping its absurd bid for a PMC deposit fee.)
Peter Suber: "(2) there's no evidence that the APA was motivated, as Stevan is, to protest the preference for PMC --as opposed to (say) mandatory OA."
But I never said the APA was motivated to protest the preference for PMC! That really would be absurd. I am certain that APA (and every other non-OA publisher) is none too thrilled about either author self-archiving or mandatory OA, anywhere, in any form!
But APA nevertheless did the responsible thing, and bit the bullet on formally endorsing institutional self-archiving. There's no (OA) reason they should have to bite it on institution-external, 3rd-party archiving in PMC too (even though the distinction will eventually be mooted by universal OA) -- though the response of the OA community, if directed, myopically, at APA alone, and not NIH, will no doubt see to it that they will.
Frankly, I think APA just saw an opportunity to try to make a buck, and maybe also to put the brakes on an overall process that they saw as threatening to their current revenue streams. Can't blame them for thinking that; it may turn out to be true. But as long as they're Green, they're "gold," as far as OA is concerned (though, to avoid conflicting terminology, let us just say they are "on the side of the angels").
Peter Suber: "(For the record, my position is close to Stevan's: institutional and disciplinary repositories should harvest from one another; that would greatly lower the stakes in the question where an OA mandate should require initial deposit; if we got that far, I'd be happy to see a policy require deposit in IRs.)"
I'm afraid I can't quite follow Peter's reasoning here:
The issue is whether deposit mandates should be convergent -- requiring all authors to deposit in their own OA IRs, for harvesting by global overlay OA services and collections therefrom -- or divergent, requiring authors to deposit all over the map, possibly multiply, depending on field and funding, possibly necessitating "reverse-harvesting," with each institution's software having to trawl the web, looking to retrieve its own institutional output, alas deposited institution-externally.
(That last is not really "harvesting" at all; rather, it involves a functional misunderstanding of the very concept of harvesting: The OAI concept is that there are local content-providers and global service-providers. Content-providers are local and distributed, each providing its own content -- in this case, institutional IRs. Then there are service-providers, who harvest that content [or just the content's metadata and URL] from the distributed, interoperable content-providers, and provide global services on it, such as indexing, search, and other added values.
This is not a symmetric process. It does not make sense to think of the content-providers as "harvesting" their own content (back) from the service-providers! Another way to put this is that -- although it was not evident at the time -- OAI-interoperability really meant the end of the need for "central repositories" (CRs) for direct deposit. Now there would just be central collections (services), harvested from distributed local content-providers. No need to deposit distally. And certainly no sense in depositing distally only to "harvest" it back home again! Institutional content-provision begins and ends with the institution's own local IR; the rest is just global, webwide harvesting and service-provision.)
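The one-way direction of harvesting can be made concrete with a short sketch. This is only an illustration under stated assumptions, not the code of any real service: it assumes an IR answers an OAI-PMH `ListRecords` request in the standard `oai_dc` format, and the sample response, the `ir.example.edu` addresses, and the `harvest` function name are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response from one institutional repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ir.example.edu:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A refereed final draft</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:identifier>https://ir.example.edu/101/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest(xml_text):
    """Service-provider side: keep only metadata plus the link back to the IR."""
    root = ET.fromstring(xml_text.strip())
    index = []
    for record in root.iterfind(".//oai:record", NS):
        index.append({
            "oai_id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "url": record.findtext(".//dc:identifier", namespaces=NS),
        })
    return index


central_index = harvest(SAMPLE_RESPONSE)
```

Nothing in the sketch flows back from the central index to the IR: the content-provider only exposes its records, and the service-provider only reads them.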
Peter Suber: "Stevan does call the deposit fee absurd. So we agree on that as well. But he adds that the NIH preference for PMC over IRs "reduced us to this absurdity". I'm afraid that's absurd too. If the NIH preference for PMC somehow compelled publishers to respond with deposit fees, then we'd see many of them. But in fact we see almost none."
(1) Of course APA's $2500 deposit fee is absurd. But -- given that APA is Green on OA, and given the many reasons why convergent IR deposit, mandated by institutions as well as funders, not only makes more sense but is far more likely to scale up, coherently and systematically, to universal OA across disciplines, institutions and nations than divergent willy-nilly deposit of institutional content here, there and everywhere -- I welcome this absurd outcome (the $2500 PMC deposit fee) and hope the reductio ad absurdum it reveals helps pinpoint (and fix) the real source of the absurdity, which is not APA's wistful surcharge, but NIH's needless insistence on direct deposit institution-externally in PMC.
(2) I have no idea whether the OA community's hue and cry about the $2500 APA surcharge for PMC deposit will be targeted exclusively at APA (and any other publishers that get the same bright idea), forcing them to withdraw it, while leaving the dysfunctional NIH constraint on locus of deposit in place.
(3) I hope, instead, that the OA community will have the insight to target NIH's constraint on deposit locus as well, so as to persuade NIH to optimize its widely-imitated policy in the interests of its broader implications for the prospects of global OA -- one small step for NIH but a giant leap for mankind -- by fixing the one small bug in an otherwise brilliant policy.
Peter Suber: "Even if the NIH preference for PMC were a choice the agency could reverse at will, the APA deposit fee is another choice, not necessitated by the NIH policy and not justified by it."
Where there's a will, there's a way, and here it's an extremely simple way, a mere implementational detail:
Instead of depositing directly in PMC, authors deposit in their IRs and send PMC the URL. If NIH adopted that, the APA's PMC deposit surcharge bid would instantly become moot.
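As a rough sketch of how little such a change would demand of a central collection, consider the following. It is a hypothetical model, not a description of how PMC works: the `CentralCollection` class, its `register` and `search` methods, and the example URL are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CentralCollection:
    """Hypothetical central service that stores metadata and links only."""
    entries: list = field(default_factory=list)

    def register(self, title, author, ir_url):
        # The full text stays in the author's IR; only its URL is recorded here.
        self.entries.append({"title": title, "author": author, "url": ir_url})

    def search(self, term):
        # Case-insensitive title match; returns the IR links a user would follow.
        term = term.lower()
        return [e["url"] for e in self.entries if term in e["title"].lower()]


collection = CentralCollection()
collection.register("Sleep and memory", "Author, A.", "https://ir.example.edu/42/")
links = collection.search("memory")
```

A user who searches the central collection and follows a returned link reaches the article in the IR, which is the access outcome the mandate is meant to secure.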
If the furor evoked by the APA $2500 surcharge proves to be the factor that manages to inspire NIH to take the rational step that rational argument alone has so far been powerless to inspire, then that will be a second (unintentional) green feather in APA's cap, and another of the ironies and absurdities of our long, somnambulistic trek toward the optimal and inevitable outcome for scientific and scholarly research.
A Simple Way to Optimize the NIH Public Access Policy (Oct 2004)
Please Don't Copy-Cat Clone NIH-12 Non-OA Policy! (Jan 2005)
National Institutes of Health: Report on the NIH Public Access Policy. In: Department of Health and Human Services (Jan 2006, reporting 3.8% compliance rate after 8 months for its first, non-mandatory deposit policy)
Central versus institutional self-archiving (Sep 2006)
Optimizing OA Self-Archiving Mandates: What? Where? When? Why? How? (Sep 2006)
THE FEEDER AND THE DRIVER: Deposit Institutionally, Harvest Centrally (Jan 2008)
Optimize the NIH Mandate Now: Deposit Institutionally, Harvest Centrally (Jan 2008)
Yet Another Reason for Institutional OA Mandates: To Reinforce and Monitor Compliance With Funder OA Mandates (Feb 2008)
How To Integrate University and Funder Open Access Mandates (Mar 2008)
One Small Step for NIH, One Giant Leap for Mankind (Mar 2008)
NIH Invites Recommendations on How to Implement and Monitor Compliance with Its OA Self-Archiving Mandate (Apr 2008)
Institutional Repositories vs Subject/Central Repositories (Jun 2008)
Stevan Harnad
American Scientist Open Access Forum