[Update: See new definition of "Weak" and "Strong" OA, 29/4/2008]
The arch-analyst of apertivity, Richard Poynder, has published yet another excellent interview, this time with Peter Murray-Rust, a dedicated advocate of Open Data (OD).
Here are a few comments on some important differences between Open Access (OA) and Open Data (OD).
The explicit, primary target content of OA is the full-texts of all the articles published in the world's 25,000 peer-reviewed scholarly and scientific journals. This is a special case, among all texts, partly because (i) research depends critically on access to those journal articles, (ii) journals are expensive, (iii) authors don't seek or get revenue from the sale of their articles, and hence have always given them away to any would-be user, and (iv) lost access means lost research impact.
Research data are also critical to research progress, of course, but the universal practice of publishing research findings in refereed journal articles has not extended to the publication of the raw data on which the articles are based. There have been two main reasons for this. One was the capacity of the paper medium: There was no affordable way that data could be published alongside articles in paper journals. The other was that not all authors wanted to publish their data, or at least not right away: They wanted the chance to fully data-mine the data they had themselves gathered, before making it available for data-mining by other researchers.
The online era has now made it possible to publish all data affordably online. That removed the first barrier (although there are still technical problems, which Peter Murray-Rust and others discuss and are working to overcome). But the question of whether and when an author makes his data open is still a matter for the author to decide. Perhaps it ought not to be the author's choice -- but that is a much bigger and more complicated question than OA (for in OA all authors already want to make their published articles freely accessible online).
That difference in scope and universality is one of the reasons the OA and OD movements are distinct ones: OD has both technical and political problems that OA does not have, and it is important that OA should not be slowed down by inheriting these extraneous problems -- just as it is important that OD should not be weighed down by the publisher copyright problems of OA (which do not apply to OD for the simple reason that the authors do not publish their data, hence do not transfer copyright to a publisher).
So far, this is all simple and transparent: OA and OD have different target contents, with different problems to contend with. OA's solution has been for researchers' institutions and funders to mandate the self-archiving of all of OA's target content, making it free for all online. But an interesting overlap region is thereby created between OA and OD: for article texts are themselves data! And one of the most important purposes for which the OD movement has sought to make data freely available online -- apart from the purpose of making it available for collaboration and use by all researchers -- is data-mining, by individuals as well as by software, and for re-publication in further 3rd-party online databases. Data-mining can be done not only on raw research data, but on article texts too, treating them as data: text-mining.
Here too, the interests of OA and OD are perfectly compatible and complementary -- except for one thing: If text-minability and 3rd-party re-publication were indeed to be made part of the definition of OA (i.e., not just removing price barriers to access by making research free for all online, but also "removing permissions barriers" by renegotiating copyright) then this would at the same time radically raise the barriers to achieving OA itself (just as insisting on making the paper edition free would), making it contingent on authors' willingness and success in renegotiating copyright with their publishers.
The online medium itself had been the critical new factor that had made it possible to remove price barriers to access, by making research articles toll-free online. But the price for going on to insist on the removal of both price barriers and "permissions barriers" jointly, as part of the very definition of OA, would have been to raise the problem of overcoming permissions barriers as a barrier to overcoming price barriers! For the new online medium that made toll-free online access possible did not, in and of itself, redefine copyright, any more than it redefined ownership of the paper edition.
Toll-free online access (OA) will lead to copyright reform (and publishing reform, and perhaps eventually also to the demise of the paper edition). But the online medium alone, in and of itself, simply made toll-free online access possible -- and that is hence the proper definition of OA. (After all, copyright retention by authors was perfectly possible in the paper era. In and of itself, it is not an online matter at all -- although the online medium, and OA itself, will eventually lead to it.)
Peter Murray-Rust is right that there was some naivete about some of this at the time of the drafting of the BOAI definition of OA (which I signed, even though I later opted for an updated definition of OA, one that resolved this ambiguity in favor of immediate OA and its capacity to grow). More than naivete, there was ignorance and lack of foresight, both about the technical possibilities and about the practical obstacles. It was the online medium that had made OA possible: Toll-free access for all users had not been possible or even thinkable in the paper era, either to articles or to data, for both economic and practical reasons. But with the advent of the online era, toll-free access online became thinkable, and possible. Indeed it was already within reach: The only thing authors had to do was to make their articles and data accessible free for all, online.
But most article authors did not make their articles freely accessible online -- even though they all, without exception, sought no income from their sale, wanting them only to be used, applied, cited and built upon. Most authors remained paralyzed because (1) they were worried about copyright and because (2) they didn't know how to provide OA, imagining that it might require a lot of time and effort.
The solution was Green OA self-archiving mandates on the part of their universities and funders, as an extension of their already existing publish-or-perish mandate. In particular, the IDOA (Immediate-Deposit/Optional-Access) Mandate requires researchers to deposit their articles in their Institutional Repositories (IRs) immediately upon publication (with access temporarily set to Closed Access for those journals that impose an access embargo period).
The IDOA solution works for OA -- it provides immediate OA for all the articles that are published in the 62% of journals that already endorse immediate OA. And for the 38% that do not, the articles are deposited as Closed Access; the IR's semi-automatic "email eprint request" button then provides users with almost-immediate, almost-OA during any embargo period.
But this solution does not work for OD, because (a) depositing data cannot be mandated, only encouraged, and because (b) making article-texts re-usable by 3rd-party text-miners and re-publishers as data requires permission from the copyright holder. That is not part of IDOA, and the "email eprint request" button does not cover it either.
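For readers who think more easily in code, the IDOA deposit rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Access`, `Deposit` and `deposit_article` are hypothetical and do not correspond to the API of any actual repository software. The sketch models access only; it says nothing about re-use or re-publication rights, which is exactly the part that IDOA leaves untouched.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    OPEN = "open"      # full text free for all online
    CLOSED = "closed"  # deposited, but full text not publicly visible


@dataclass
class Deposit:
    title: str
    access: Access
    request_button: bool  # is the "email eprint request" button offered?


def deposit_article(title: str, journal_endorses_immediate_oa: bool) -> Deposit:
    """Hypothetical IDOA rule: always deposit at once; only the access setting varies."""
    if journal_endorses_immediate_oa:
        # Journal already endorses immediate OA: deposit as Open Access.
        return Deposit(title, Access.OPEN, request_button=False)
    # Journal imposes an embargo: deposit anyway, as Closed Access,
    # and let users ask the author for a copy via the request button.
    return Deposit(title, Access.CLOSED, request_button=True)


# Usage: both articles are deposited immediately; only access differs.
open_deposit = deposit_article("Article in an OA-endorsing journal", True)
embargoed_deposit = deposit_article("Article in an embargoing journal", False)
```

The point of the sketch is that the deposit itself is unconditional: the journal's policy changes only the access setting, never whether the article goes into the IR.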
So the strategic issue is whether to insist on something stronger than IDOA -- at the risk of not reaching consensus on any mandate at all -- or to wait patiently a little while longer, to allow IDOA mandates to become universal, generating toll-free online access (OA), with its immediate resultant benefits to research and researchers -- and to trust that the pressure exerted by those very benefits will lead to the demise of embargoes as well as to OD (for both data and texts) in due course.
I would accordingly urge patience on the part of the OD community, as well as on the part of the Gold OA (publishing) and copyright-reform communities (even though I am by no means patient by nature myself!). Their day will come soon too!
But first, please allow Green OA to take the natural course that is now wide open for it, paving the way with universal IDOA mandates generating toll-free online access to research, and all its immediate benefits. The strategic course to take now is to allow those mandates to propagate globally. This is not the time for over-reaching, raising the ante for OA higher than what the mandates can provide, and thereby only jeopardizing their chances of being adopted in the first place.
Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3).
Stevan Harnad
American Scientist Open Access Forum