Re: Meeting: National Policies on Open Access Provision for University Research Output from Peter Murray-Rust on 2004-03-04 (American-Scientist-Open-Access-Forum)

From: Peter Murray-Rust <pm286_at_cam.ac.uk>
Date: Thu, 4 Mar 2004 20:37:44 +0000

I've been away for a bit and had time to think about the meeting. Here are
some thoughts - please feel free to redistribute them further if it helps.

Stevan Harnad wrote:

> It is true that open access to data and open access to articles are not
> the same thing, though there are links. Right now, the convention is
> for article authors to give their articles to publishers for free, and
> for publishers to charge for access. It is also the convention *not*
> to publish one's own data (just the analysis and results in the article).

I only had 2 minutes to talk so was unable to make some points clear.

** In this mail I am NOT talking about data collections, however published.
I am restricting myself to data ("facts") which occur **in the body of
the final published manuscript**. Though I have a wider agenda, in this
mail I am sticking precisely to the peer-reviewed primary literature.

In some disciplines data are published separately from the manuscript. In
others (chemistry, biosciences, ...) the data are often only ever published
in the primary publication (I call this micropublication of data). Typical
phrases are:
MeltingPoint 123 degC
Boiling Point (1 atm) 234 degC
Yield of reaction: 77%
etc.

These data are of great value to the community and are *re-used*. (Not
synonymous with republication). They may be aggregated, compared, input
into programs, used to create predictive models, etc. Facts have been
abstracted from the literature for 150 years and are IMO covered by the
Berne convention - they are copyright free. If I want to copy and publish
all melting points in the literature I can. However in many disciplines
there is a large and inefficient secondary publishing industry.

It's important to stress that there is a critical need for
machine-readability of articles. This gives vast improvements in indexing,
recovery, aggregation, etc.
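
As an illustration of what machine-readability buys, here is a minimal
Python sketch that turns phrases like the three examples above into
structured records that can be aggregated or fed into programs. The
names `Fact`, `PATTERN` and `parse_fact` are hypothetical, and the
pattern is an assumption that covers only those example phrases and the
units degC and %; real articles would need a far more robust approach.

```python
import re
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """One datum lifted from the body of an article (illustrative only)."""
    prop: str
    value: float
    unit: str
    condition: Optional[str]


# Assumed pattern: property name, optional "(condition)", optional colon,
# a number, then a unit. It handles only the example phrases.
PATTERN = re.compile(
    r"""^\s*
    (?P<prop>[A-Za-z ]+?)\s*        # property name, e.g. Boiling Point
    (?:\((?P<cond>[^)]*)\))?\s*     # optional condition in parentheses
    :?\s*                           # optional colon
    (?P<value>-?\d+(?:\.\d+)?)\s*   # numeric value
    (?P<unit>degC|%)\s*$            # unit (only these two are assumed)
    """,
    re.VERBOSE,
)


def parse_fact(phrase: str) -> Optional[Fact]:
    """Return a Fact for a recognised phrase, or None if it does not match."""
    m = PATTERN.match(phrase)
    if m is None:
        return None
    return Fact(
        prop=m.group("prop").strip(),
        value=float(m.group("value")),
        unit=m.group("unit"),
        condition=m.group("cond"),
    )


if __name__ == "__main__":
    for line in [
        "MeltingPoint 123 degC",
        "Boiling Point (1 atm) 234 degC",
        "Yield of reaction: 77%",
        "etc.",
    ]:
        print(line, "->", parse_fact(line))
```

Once facts are in this form, comparing or aggregating them across many
articles is a matter of ordinary programming. Whether one is permitted
to do so is the copyright question raised in this mail.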

However, when the facts are in eForm, including in the 95% green form
http://www.lboro.ac.uk/departments/ls/disresearch/romeo/Romeo%20Publisher%20Policies.
it is highly questionable whether they can be re-used on a significant
scale, due to the European directive on copyright. Both data within
an article (e.g. a table) and data aggregated in journals can be called a
database, and hence the copyright of the publisher. There is a great need
to ensure that the data in articles have a level of access compatible
with the author's intentions. Otherwise the OA movement might even make
things worse (by implying that copyright was an unimportant issue).

> Both need to change. Articles need to be self-archived and data need to
> be self-archived. The difference is that articles are right now only
> being made available by the publisher (through tolls), whereas the data
> are not being made available at all -- *except* if they are not the
> author's data, but proprietary data of some sort, compiled by the vendor
> (e.g., ISI).

I fully support self-archiving. I enjoyed the presentations and got a lot
from them. The adoption of green or gold will remove one fundamental
barrier to access to data, but not all.

> Now the solution for self-archiving of one's own articles is to
> self-archive them, period. Absolutely no need to get or seek
> re-publication rights or any other change in copyright. The same
> is true of one's own data. Just self-archive it.

This won't solve our problem. Indeed it is almost more frustrating. We can
now see the data but we can't re-use it (safely).

> But for the data of *others* (e.g., ISI's citation data), one may *not*
> access it without paying a toll, and one certainly may not re-publish
> it.

This isn't relevant. If ISI has created its own information it is allowed
to copyright it. I wouldn't dream of republishing it. However I might wish
to re-use parts of it. ISI might reasonably object. It would come down to
fair use. BUT I expect that almost all scientists (most outside the OA
movement) would not wish to legally forbid reuse of their data.

> As to *my* articles, and *my* data: *I* may republish them (with another
> publisher) only with permission from the original publisher (permission
> usually is granted) -- but why would I bother republishing, when they
> are already open-access to all because I have self-archived them?
> Someone *else* may not republish them either, but why bother, when all
> they need do is insert the OA URL wherever in their publication they
> want the user to read the text of my article (i.e., where they would
> ordinarily insert the text)?
>
> As to my data: If I self-archive it, anyone can read, download,
> process, analyse it, and report the results.

Not if the copyright does not grant the right.

> To republish my data in a compilation of theirs, they need my
> permission (which I give them, of course).

Why?

If I publish data no-one should need my permission to reuse it.

> But if the user is not coming to me, for *my* data, but, say, to ISI,
> for their proprietary data, they may not republish them without
> permission (and they can access and analyze them only for a fee).

Steve Hitchcock wrote:

> > Peter, it seemed to me, was talking about real problems with new
> > legislation on database copyright protection. This is not
> > the same as the OA author copyright issue, and the point that Stevan was
> > making that emerges from the spurious free access vs open access
> > distinction that Stevan has been busy refuting on his Am Sci forum. My
> > guess is that Peter doesn't follow the forum and therefore didn't get
> > Stevan's point, as Stevan didn't get his.

I agree that I have probably misunderstood some of the terminology in the
OA area. The meeting helped clarify it.

Stevan Harnad wrote:

> Peter did not get my point, but I certainly did get his:

I think the score is about 0.5 each and we need to increase that. :-)

> There is no reason at all to burden the completely feasible agendas of
> self-archiving one's own published articles, and one's own data, with
> the very different agenda of changing the laws or permissions for the
> re-use of someone *else*'s proprietary data.

Scientific data are *not* proprietary!

> We can be sympathetic to such efforts, but we must make it clear that that
> is a completely (yes, completely) different matter, not to be confused
> with OA and self-archiving of one's own articles and data. Otherwise,
> one simply slows down, yet again, and again needlessly, the provision
> of open-access to one's own articles and data, with gratuitous extra
> handicaps that do not belong there. This, when the self-archiving of
> one's own articles and data is already so long, long, overdue!

I appreciate that the OA movement has to draw some boundaries. Please
accept that these are not clear on a cursory inspection and, although I am
learning, others will also misunderstand. If the boundaries are too rigid
and hostile, then neighbouring efforts may be disadvantaged.

Peter Murray-Rust wrote:

> > > It is clear that data is a poor relation in the Open Access
> > > movement and that copyright issues are being given low priority
> > > as opposed to human reading. I therefore came away with mixed
> > > feelings - glad that the Open Access movement seems to have
> > > got momentum but sad that many of the issues that matter to
> > > us are deliberately omitted.
>
> Very, very deliberately omitted for the reasons adduced above:
> article/data OA is already 100% feasible and 1000% overdue (and 1000%
> under-understood).

The very acronym OA is misunderstood on first glance.

> There are many other worthy causes, including world hunger. But the
> 100% feasibility of article/data OA through self-archiving should
> not be further hamstrung with those other problems, and their own
> particular obstacles, which are *not* the obstacles of article/data
> self-archiving. (Article/data self-archiving in reality has no real
> obstacles whatsoever, just imagined and self-imposed ones: linking their
> fate to changing database legislation for data other than one's own is
> one of those unnecessary, self-imposed obstacles!)

If my concerns are imagined, then I would be grateful for evidence that I
can be bolder.

Steve Hitchcock wrote:

> > I really enjoyed the talk from Jeremy and Mike too, although
> > speaking to delegates after the meeting I was surprised they didn't
> > get the main point, that all data will be online and accessible,
> > not just some of it.

online - but not fully accessible

Stevan Harnad wrote:

> There is indeed a vital connection with escience -- in the
> self-archiving of one's *own* data. The data of others is an entirely
> different matter (except inasmuch as they too are, as one hopes,
> self-archiving their *own* data!)

My proposal would be:

(a) that the OA movement could take a constructive look at our concerns
and not dismiss them summarily

(b) that they/we explore enhancements to the OA procedures that help
the accessibility of data. In the first instance I would be happy with
something like a statement: "the data in this manuscript are provided
copyright free and available for re-use without the author's or publisher's
explicit permission, as long as provenance and moral rights are honoured".
Links to ICSU statements could be made. IANAL but this seems a feasible
start and would make the position clear.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
Received on Thu Mar 04 2004 - 20:37:44 GMT
