[Update: See new definition of "Weak" and "Strong" OA, 29/4/2008]
SUMMARY: Peter Murray-Rust is anxious to ensure that all research data should be harvestable and data-mineable, by man and machine alike. He worries that authors might instead agree to transfer copyright to their publishers for their data (as many already transfer it for their article texts) in exchange for the publisher's green light to self-archive. Not to worry. If authors don't self-archive their data at all today, when they hold all the rights, and if 85% of them don't self-archive their articles either (not even the 62% for which they already have their publisher's green light), then why on earth would they transfer copyright for their data in exchange for a green light to self-archive both? So first things first: Focus on ensuring OA for all article texts (postprints) by first mandating immediate deposit (in the author's Institutional Repository) of all postprints as soon as they are accepted for publication (without necessarily insisting that access to those deposits be immediately set to OA). All else will follow from that simple step, as surely as day follows night. OA is just a matter of keystrokes.
Peter Murray-Rust (P-MR) writes: "I don’t disagree... [with] Stevan’s analysis of how we should deposit papers... I’m just more interested in data at present...
"Imagine, for example, that a publisher says 'I will make all our journals green as long as we retain copyright. And we’ll extend the paper to cover the whole of the scientific record'. That would be wonderful for Stevan and a complete disaster for paper-crunchers."
Make no mistake about it: Peter Murray-Rust (and Peter Suber) and I are all in total agreement about the goals, and in near-total agreement about the means.
PMR is especially concerned about research data harvesting and mining, which is not, strictly speaking, an OA matter, for two reasons:
(1) OA's primary target is research article texts. (That doesn't matter: free online access to data is extremely important too, and is part of OA's wider target.)
(2) More important, access to article texts is actually -- or, as I suspect, perceptually -- constrained by publishers' copyright-based restrictions.
That is not true of data.
So, to a first approximation, authors are perfectly free to make their data OA today if they wish; all they need do is adopt the right Creative Commons License for it and then self-archive it in their Institutional Repository (IR). If they don't make their data OA, it's their own fault, not the fault of publisher restrictions, actual or perceived.
PMR is worried that authors, rather than self-archiving their data, will instead transfer copyright for their data to their publishers, in exchange for their publishers adopting a Green policy. But I think PMR is misunderstanding a Green publisher policy here! Green publishers don't make their published matter OA; they merely bless the author's making it OA, if he wishes, by self-archiving it. The only publishers that make their own published matter OA are Gold OA publishers.
So what is the motivation for the copyright scenario PMR is worried about? Authors, who today cannot be bothered to self-archive their own data at all, and cannot be bothered to self-archive their articles either (and/or are too bothered by actual or perceived publisher's restrictions to do so) will henceforth, according to this scenario, adopt the brand-new practice of transferring copyright for their data (along with their articles) -- in exchange for their publishers going Green!
But why on earth would authors do that? What is the motivation? They can't be bothered self-archiving their data today, when they don't need their publisher's blessing (or greenery) to do it, just as most of them can't be bothered to self-archive their articles, even when they have their Green publishers' (62%) blessing to do so. Yet, for some unknown reason, these passive authors are to be imagined (in PMR's scenario) as being ready to transfer copyright for their undeposited data to their publishers, in exchange for their publishers' agreeing to give them the green light to self-archive their data (and articles)!
I think this fantasized scenario misses the point completely, and that point is precisely the one that PMR confesses he is less interested in, namely, that what is needed to get these passive authors to do the right thing -- in their own interests, but also in the interests of their institutions, their funders, the public that funds their funders and in whose interests the research is done, and in the interests of research progress and productivity itself -- is a Green OA self-archiving mandate, adopted by their institutions and funders! A mandate that requires them to self-archive, as a condition of employment and funding.
I would be quite happy if that self-archiving mandate applied to their data as well as to their articles. But first things first. A mandate first needs to be successfully adopted. And authors are already publishing their articles, but not yet publishing their data. Some may not wish to publish their data (preferring to keep it under wraps so that they, and not their competitors, can mine it); I make no judgment about this, except that co-bundling an article-archiving mandate with a data-archiving mandate would put the successful adoption of any mandate at all at risk, because of these potential exceptions and oppositions. (It is for similar reasons that a mandate to self-archive the refereed, accepted, published postprint is unproblematic, whereas a mandate to also self-archive the unrefereed preprint would be: Not all authors are willing to make their preprints public, nor should they be required to be. But all authors publish their postprints, by definition.)
So the prospects for the successful adoption of a postprint mandate are far better than the prospects for the successful adoption of either a postprint+preprint mandate or a postprint+data mandate. The Immediate-Deposit/Optional-Access (ID/OA) mandate in particular, as repeatedly noted, is the one with the best chance of successful adoption: It moots publisher restrictions, because it only requires deposit, not immediate OA-setting; yet it has the "Fair Use Button" to tide over usage needs during any embargo period. And ID/OA is not weighed down by requiring either preprint-deposit or data-deposit (or copyright-retention): It merely recommends them, just as it merely recommends setting access to the deposit as OA rather than Closed Access.
But -- if we agree that the only thing standing between us and 100% OA (not only for articles, but for data too) is those deposit keystrokes that sluggish, passive authors simply are not doing, unmandated -- then it should also be apparent why ID/OA is exactly what is needed now to get those keystrokes done. ID/OA does not go the whole way: It does not require the Nth (OA) keystroke. But unless we are all deeply deluded about the benefits of OA, OA's own rewards will see to it that those Nth keys get stroked, once the ID/OA mandate has propagated across all of research space, and human nature takes its course. The OA usage/impact advantage, which today can only be demonstrated by painstaking, post-hoc analyses (invariably discounted by the publishing lobby's "Dream Team," committed to arguing that there is no real advantage to OA!), will instead be obvious from the download and citation statistics for Open Access versus Closed Access articles in every Institutional Repository (IR); and the difference will be reinforced by the deluge of email eprint requests generated by the IR software's "Fair Use Button."
But once those Nth keystrokes fall, the token will (by the same token!) also fall for those same authors (i.e., all authors!), realizing the potential benefits of depositing their data too. OA will naturally propagate from postprints to (many) preprints and (most) underlying data too.
That is why I urge patience, and making common cause with Green OA mandates, for those whose goal is OA data-archiving: that too will come with the territory.
And there is no way in the world that authors will instead opt, for no reason at all, to transfer copyright to their publishers for their data too, along with copyright for their texts, in exchange for their publishers giving them the green light to do the self-archiving that they are not bothering to do anyway, with or without a green light!
They might agree to transfer data rights to a Gold OA publisher. But that would be no problem, because Gold OA publishers really do make their articles (and hence also their data) accessible online in every way, including for robot harvesting and data-mining. With ID/OA mandates, the next step after 100% postprint deposits (62% OA and 38% Closed Access + semi-automatic Fair-Use eprints) will be the transition to 100% Green OA for all postprints (the Nth keystroke), and then to the depositing of the accompanying data, with rights specified by the CC license the author adopts.
That's the natural scenario, and all it needs right now is worldwide propagation of the ID/OA mandate. To achieve that, we must not chafe, for the time being, at the absence of a guarantee of robotic harvesting and mining (for either text or data), because insisting on that now can only blunt the motivation and slow the momentum for the universal adoption of the ID/OA mandate.
Let us be patient, get the mandates adopted, and let them do their inexorable work; then the era of 100% OA -- for both text and data -- will not be far behind. You can (data-)bank on that!
Stevan Harnad
American Scientist Open Access Forum