Open Access Archivangelism

Thursday, March 30. 2006

Formaldehyde and Function

On Thu, 30 Mar 2006, Helen Hockx-Yu wrote in JISC-REPOSITORIES:

"I should be grateful if anyone can provide me some evidence to back the following statement:
'Concern of longevity has contributed to the lack of active engagement from many researchers [with institutional repositories]. Guarantee of long-term preservation helps enhance a repository's trustworthiness by giving authors confidence in the future accessibility and more incentives to deposit content'
"I guess longevity here also applies to the financial sustainability of the repository itself as a business operation, in addition to its content."

The statement is (1) not based on evidence at all, but pure speculation and (2) speculation not on the part of the content-providers (i.e., the authors, who are presently only spontaneously self-archiving their published articles at about the 15% level) but on the part of others, whose a priori concept of an institutional repository is that it is for long-term preservation (rather than for immediate access-provision and impact maximisation)

One pretty much gets out of such subjective speculations what one puts into them (including the requisite confirmatory moans from fellow-preservationists!).

JISC author surveys have given the empirical answer as to why only about 15% of papers are being self-archived spontaneously today (although 49% of authors have deposited at least once): Authors are too busy to do it until/unless their employers and or funders make it a priority by mandating it -- and then 95% of them will duly do it:

Swan, A. (2005) Open access self-archiving: An Introduction. JISC/ Key Perspectives Technical Report.

But it would be absolutely absurd of their employers and funders to mandate self-archiving for the sake of long-term preservation! Preservation of what, and why? Articles are published by journals. The preservation of the published version (PDF/XML) is the responsibility of the journals that publish it, the libraries that subscribe/license it, and the deposit libraries that archive it. None of that is the responsibility of the author or his institution, and never has been. Hence it is ridiculous to think the reason authors are not self-archiving today is because they are fretting about preservation!

Nor is there the slightest evidence that the 15% of articles that has been self-archived spontaneously in central or institutional repositories has vanished or is at risk! Arxiv content is still there today, a decade and a half since its inception in 1991, under nonstop use. CogPrints contents likewise, since its inception nearly a decade ago. Ditto for the IRs that have been up since GNU Eprints was first released in 2000.

The pertinent feature of all of these archives (even the oldest and biggest) is the pathetically small proportion of their total annual target content -- for Arxiv, all of physics+, for CogPrints, all of cognitive science, for PubMed Central, all of biomedical science, and for institutional IRs, all of each institution's own annual research article output -- what a pathetic proportion of their respective target contents they are actually capturing.

But there are exceptions, and the biggest of them is CERN, which is far above the spontaneous 15% self-archiving baseline and rapidly approaching 100% for its current annual output (while making remarkable progress with its retroactive legacy output too, thanks to superb library activism).

So too are Southampton ECS, U. Minho, and QUT. And the reason is that these four institutions (3 institutions plus 1 institutional department) self-archiving mandates for their own output (rather than no policy, or library activism alone). And the rationale for the mandates (although of course these archives, like all IRs, are duly attending to the preservation of what contents they have!) is not long-term preservation but immediate access-provision for the sake of maximising usage and impact before their authors' bones are in preservation.

So while preservationists lose themselves in speculation about the fact that maybe authors are not depositing because their secret yearnings for preservation are even more exacting than the preservationists', so they are abstaining until they can be absolutely guranteed of immortality for their texts as well as their institutions, the reality is much simpler:

They have (and should have) no special interest in preservation for their authors' drafts. They do have an interest in citation, but not enough to bother self-archiving until/unless their institutions and funders require it. Silly, and short-sighted (sic) but there we are.

Let us hope that their institutions and funders will have the good sense to adopt policies that require (and reward) their researchers for doing what is in their own best interests (as well as the best interests of their institutions and funders) -- just as they already require and reward them to publish (or perish).

Nor is the reward the imperishability of those authors' refereed final drafts that they will be self-archiving (not the publisher's proprietary PDF), but their own scientific immortality (which would slip away fast if they were to keep waiting to immortalise their publishers' PDFs instead, as the preservationists -- embalmers? -- are imagining they are doing).

(Do I sound like an archivangelist whose remaining reserves of patients have taken flight?)

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 22:34 | Comments (0) | Trackbacks (0)

Manual Evaluation of Robot Performance in Identifying Open Access Articles

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

Summary: Antelman et al. (2005) hand-tested the accuracy of the algorithm that Hajjem et al.'s (2005) software robot used to to trawl the web and automatically identify Open Access (OA) and Non-Open-Access (NOA) articles (references derived from the ISI database). Antelman et al. found much lower accuracy than Hajjem et al. Had reported. Hajjem et al. have now re-done the hand-testing on a larger sample (1000) in Biology, and demonstrated that Hajjem et al.'s original estimate of the robot's accuracy was much closer to the correct one. The discrepancy was because both Antelman et al. And Hajjem et al had hand-checked a sample other than the one the robot was sampling. Our present sample, identical with what the robot saw, yielded: d' 2.62, bias 0.68, true OA 93%, false OA 12%. We also checked whether the OA citation advantage (the ratio of the average citation counts for OA articles to the average citation counts for NOA articles in the same journal/issue) was an artifact of false OA: The robot-based OA citation Advantage of OA over NOA for this sample [(OA-NOA)/NOA x 100] was 70%. We partitioned this into the ratio of the citation counts for true (93%) OA articles to the NOA articles versus the ratio of the citation counts for the false (12%) "OA" articles. The "false OA" advantage for this 12% of the articles was 33%, so there is definitely a false OA Advantage bias component in our results. However, the true OA advantage, for 93% of the articles, was 77%. So in fact, we are underestimating the true OA advantage.
Previous AmSci Topic Thread:
"Manual Evaluation of Algorithm Performance on Identifying OA" (Dec 2005)

References cited:

Antelman, K., Bakkalbasi, N., Goodman, D., Hajjem, C. and Harnad, S. (2005) Evaluation of Algorithm Performance on Identifying OA. Technical Report, North Carolina State University Libraries, North Carolina State University.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

In an unpublished study, Antelman et al. (2005) hand-tested the accuracy of the algorithm that Hajjem et al.'s (2005) software robot used to identify Open Access (OA) and Non-Open-Access (NOA) articles in the ISI database. Antelman et al. found much lower accuracy (d' 0.98, bias 0.78, true OA 77%, false OA 41%), with their larger sample of nearly 600 (half OA, half NOA) in Biology (and even lower, near-chance performance in Sociology, sample size 600, d' 0.11, bias 0.99, true OA 53% false OA 49%) compared to Hajjem et al., who had with their smaller Biology sample of 200, found: d' 2.45, beta 0.52, true OA 93%, false OA 16%.

Hajjem et al. have now re-done the hand-testing on a still larger sample (1000) in Biology, and we think we have identified the reason for the discrepancy, and demonstrated that Hajjem et al.'s original estimate of the robot's accuracy was closer to the correct one.

The discrepancy was because Antelman et al. were hand-checking a sample other than the one the robot was sampling: The templates are the ISI articles. The ISI bibliographic data (author, title, etc.) for each article is first used to automatically trawl the web with search engines looking for hits, and then the robot applies its algorithm to the first 60 hits, calling the article "OA" if the algorithm thinks it has found at least one OA full-text among the 60 hits sampled, and NOA if it does not find one.

Antelman et al. did not hand-check these same 60 hits for accuracy, because the hits themselves were not saved; the only thing recorded was the robot's verdict on whether a given article was OA or NOA. So Antelman et al. generated another sample -- with different search engines, on a different occasion -- for about 300 articles that the robot had previously identified as having an OA version in its sample, and 300 for which it had not found an OA version in its sample; Antelman et al.'s hand-testing found much lower accuracy.

Hajjem et al.'s first test of the robot's accuracy made the very same mistake of hand-checking a new sample instead of saving the hits, and perhaps it yielded higher accuracy only because the time difference between the two samples was much smaller (but the search engines were again not the same ones used). Both accuracy hand-tests were based on incommensurable samples.

Testing the robot's accuracy in this way is analogous to testing the accuracy of an instant blood test for the presence of a disease in a vast number of villages by testing a sample of 60 villagers in each (and declaring the disease to be present in the village (OA) if a positive case is detected in the sample of 60, NOA otherwise) and then testing the accuracy of the instant test against a reliable incubated test, but doing this by picking another sample of 60 from 100 of the villages that had previously been identified as "OA" based on the instant test and 100 that had been identified as "NOA." Clearly, to test the accuracy of the first, instant test, the second test ought to have been performed on the very same individuals on which the first test had been performed, not on another sample based only on the overall outcome of the first test, at the whole-village level.

So when we hand-checked the actual hits (URLs) that the robot had identified as "OA" or "NOA" in our Biology sample of 1000, saving all the hits this time, the robot's accuracy was again much higher: d' 2.62, bias 0.68, true OA 93%, false OA 12%.

All this merely concerned the robot's accuracy in detecting true OA. But our larger hand-checked sample now also allowed us to check whether the OA citation advantage (the ratio of the average citation counts for OA articles to the average citation counts for NOA articles in the same journal/issue) was an artifact of false OA:

We accordingly had the robot's estimate of the OA citation Advantage of OA over NOA for this sample [(OA-NOA)/NOA x 100 = 70%], and we could now partition this into the ratio of the citation counts for true (93%) OA articles to the NOA articles (false NOA was very low, and would have worked against an OA citation advantage) versus the ratio of the citation counts for the false (12%) "OA" articles. The "false OA" advantage for this 12% of the articles was 33%, so there is definitely a false OA Advantage bias component in our results. However, the true OA advantage, for 93% of the articles, was 77%. So in fact, we are underestimating the OA advantage.

As explained in previous postings on the American Scientist topic thread, the purpose of the robot studies is not to get the most accurate possible estimate of the current percentage of OA in each field we study, nor even to get the most accurate possible estimate of the size of the OA citation Advantage. The advantage of a robot over much more accurate hand-testing is that we can look at a much larger sample, and faster -- indeed, we can test all of the articles in all the journals in each field in the ISI database, across years. Our interest at this point is in nothing more accurate than a rank-ordering of %OA as well as %OA citation Advantage across fields and years. We will nevertheless tighten the algorithm a little; the trick is not to make the algorithm so exacting for OA as to make it start producing substantially more false NOA errors, thereby weakening its overall accuracy for %OA as well as %OA advantage.

Stevan Harnad & Chawki Hajjem

Posted by Stevan Harnad in Methodology at 17:57 | Comments (0) | Trackbacks (0)

Thursday, March 23. 2006

Online, Continuous, Metrics-Based Research Assessment

Future UK Research Assessment Exercise (RAE) to be Metrics-Based
As predicted, and long urged, the UK's wasteful, time-consuming Research Assessment Exercise (RAE) is to be replaced by metrics:

"Research exercise to be scrapped"
Donald MacLeod, Guardian Wednesday March 22, 2006

Science and innovation investment framework 2004-2014: next steps supporting UK Budget 2006
"The Government is strongly committed to the dual support system, and to rewarding research excellence, but recognises some of the burdens imposed by the existing Research Assessment Exercise (RAE). The Government's firm presumption is that after the 2008 RAE the system for assessing research quality and allocating 'quality-related' (QR) funding will be mainly metrics-based... The Government will launch a consultation on its preferred option for a metrics-based system for assessing research quality and allocating QR funding, publishing results in time for the 2006 Pre-Budget Report."

"Over recent years a number of studies have considered options for a radically different allocation system for QR in order to avoid or reduce the need for a peer review process. The focus in most cases has been on identifying one or more metrics that could be used to assess research quality and allocate funding, for example research income, citations, publications, research student numbers etc. The Government has considered the evidence to date and favours identifying a simpler system that may not precisely replicate the level of detailed analysis of the RAE but would enable an appropriate distribution of QR funding at the institutional level."

"[M]etrics collected as part of the next assessment will be used to undertake an exercise shadowing the 2008 RAE itself, to provide a benchmark on the information value of the metrics as compared to the outcomes of the full peer review process. The aim of any changes following this exercise will be to reduce the administrative burden of peer review, wherever possible, consistent with the overriding aim of assessing excellence"

RAE outcome is most closely correlated (r = 0.98) with the metric of prior RCUK research funding (Figure 4.1) (this is no doubt in part a "Matthew Effect"), but research citation impact is another metric highly correlated with the RAE outcome, even though it is not explicitly counted. Now it can be explicitly counted (along with other powerful new performance metrics) and all the rest of the ritualistic time-wasting can be abandoned, without further ceremony.

This represents a great boost for institutional self-archiving in Open Access Institutional Repositories, not only because that is the obvious, optimal means of submission to the new metric RAE, but because it is also a powerful means of maximising research impact, i.e., maximising those metrics: (I hope Research Councils UK (RCUK) is listening!).

Harnad, S. (2001) Why I think that research access, impact and assessment are linked. Times Higher Education Supplement 1487: p. 16.

Harnad, S. (2003) Why I believe that all UK research output should be online. Times Higher Education Supplement. Friday, June 6 2003.

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.

Beans and Bean Counters (2005)

On Thu, 23 Mar 2006, Adrian Smith (Leeds) wrote:

"See also Mark Maslin [UCL] in Nature:
Research skewed by stress on highest-impact journals

And this new metric RAE policy will help "unskew" it, by instead placing the weight on the individual author/article citation counts (and download counts, CiteRanks, authority counts, citation/download latency, citation/longevity, co-citation signature, and many, many new OA metrics waiting to be devised and validated, including full-text semantic-analysis and semantic-web-tag analyses too) rather than only, or primarily, on the blunter instrument (the journal impact factor).

This is not just about one number any more! The journal tag will still have some weight, but just one weight among many, in an OA scientometric multiple regression equation, customised for each discipline.

This is an occasion for rejoicing at progress, pluralism and openness, not digging up obsolescent concerns about over-reliance on the journal impact factor.

On Thu, 23 Mar 2006, Ian Sommerville wrote on the CPHC list:

"This is the wording from the budget document
'The Government wants this to continue, but thinks the close correlation between Research Council income and QR income may provide an opportunity for allocating QR using a radically simpler system. '
"The point is made that, at an institutional level, there is a 0.98 correlation between research income and QR. No mention of citation impact. An alternative metric may be proposed for the humanities."

The document actually says

"one or more metrics... could be used to assess research quality and allocate funding, for example research income, citations, publications, research student numbers etc."

You are quite right, though, that the default metric many have in mind is research income, but be patient! Now that the door has been opened to objective metrics (instead of amateurish in-house peer-re-review), this will spawn more and more candidates for enriching the metric equation. If RAE top-slicing wants to continue to be an independent funding source in the present "dual" funding system (RCUK/RAE), it will want to have some predictive metrics that are independent of prior funding. (If RAE instead just wants to redundantly echo research funding, it need merely scale up RCUK research grants to absorb what would have been the RAE top-slice and drop the RAE and dual funding altogether!)

The important thing is to scrap the useless, time-wasting RAE preparation/evaluation ritual we were all faithfully performing, when the outcome was already so predictable from other, cheaper, quantitative sources. Objective metrics are the natural, sensible way to conduct such an exercise, continuously, and once we are doing metrics, many powerful new predictive measures will emerge, over and above grant income and citations. The RAE ranking will not come from one variable, but from a multiple regression equation, with many weighted predictor metrics in an Open Access world, in which research full-texts in their own authors' Institutional Repositories are citation-linked, download-monitored and otherwise scientometrically assessed and analysed continuously.

Hitchcock, S., Brody, T., Gutteridge, C., Carr, L., Hall, W., Harnad, S., Bergmark, D. and Lagoze, C. (2002) Open Citation Linking: The Way Forward. D-Lib Magazine 8(10).

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Association for Information Science and Technology (JASIST).

Citebase impact ranking engine (2001)
Usage/citation correlator/predictor

Stevan Harnad

Posted by Stevan Harnad in Research Assessment at 15:43 | Comments (0) | Trackback (1)

Wednesday, March 22. 2006

Optimizing MIT's Open Access Policy

MIT has proposed two OA policy steps: compliance with the NIH Public Access Policy and seeking consensus on copyright retention.

In the interests of brevity, clarity, and comprehension, I will be (uncharacteristically) brief (8 points):

(1) The two steps taken by MIT are a very good thing, compared to taking no steps at all, but:

(2) Step one, explicitly formalizing compliance with the current NIH Public Access Policy, is a retrograde step, as it helps entrench the flawed NIH policy in its present form (a request instead of a requirement, a 6-12-month delay instead of immediate deposit, and depositing in PubMed Central instead of in the fundee's own Institutional Repository).

(3) Step two, seeking university-wide agreement on copyright retention, is not a necessary prerequisite for OA self-archiving, and there will be time-consuming resistance to it, from both publishers and authors. Hence it is the wrong thing to target first at this time. (University of California is making exactly the same mistake.)

(4) What MIT should be doing is neither formalizing compliance with the current NIH policy nor giving priority to copyright retention (even though copyright retention is highly desirable).

(5) What MIT should be doing is to require that all MIT researchers deposit, in MIT's IR, the final refereed drafts of all their journal articles, immediately upon acceptance for publication.

(6) In addition, MIT should encourage that all MIT researchers set access to each deposited full text as OA immediately upon deposit -- but leaving them the option of choosing instead to set access as Restricted Access (MIT-only, or author-only) if they wish, and fulfilling email eprints requests generated by the OA metadata by emailing the eprint to the requester for the time being. (Ninety-three percent of journals already endorse immediate OA self-archiving, so only 7% will require the option of eprint emailing of RA deposits.)

(7) Having adopted this two-part deposit/access-setting policy as a mandate, MIT can then go on to its two proposed steps, of complying with the provisional NIH public access policy (by allowing harvesting at the appointed date from MIT's own IR into PubMed Central) and seeking an MIT consensus on copyright retention.

(8) Instead going straight to (7) without first adopting and implementing (5) and (6) is a huge (and unnecessary) strategic mistake, and a bad model for other institutions to follow.

Posted by Peter Suber in Open Access News:

Two steps to support OA at MIT

"At its March 15 faculty meeting, the MIT faculty discussed two OA-related topics: complying with the NIH public access policy and using an MIT amendment to modify standard publishing contracts and let authors retain key rights. Details in today's report from the MIT News Office:

"Concerned that taxpayer-funded research is not accessible to the general public because of the tightly controlled, proprietary system enforced by some journal publishers, the National Institutes of Health (NIH) is asking every NIH-funded scientist who publishes results in a peer-reviewed journal to deposit a digital copy of the article in PubMed Central (PMC), the online digital library maintained by the NIH. Not later than 12 months after the journal article appears, PMC will then provide free online access to the public.

"Director of Libraries Anne J. Wolpert and Vice President for Research Alice Gast discussed with the faculty MIT's response to this issue, which has been to support NIH researchers in complying with the policy, and also to enable any MIT researcher to use a more author-friendly copyright agreement when submitting articles for publication. "The overwhelming majority of work produced by you is licensed back to you, and you can't always use your own work in the way you want to use it," Wolpert told the faculty. Copyright exemptions that were carefully crafted to allow the academy to teach and do research are steadily being superseded by intellectual property regimes that were developed for the benefit of the entertainment industry. "What Disney wants, the academy gets, whether it suits your interests or not," Wolpert said.

"Among the reasons for universities to support open access is the high cost associated with renting access to journals, which for MIT alone has grown in the past decade from $2.6 million to more than $6 million a year....

"An amendment that can be attached to any publication's copyright agreement was disseminated to principal investigators in February. "We have to wait and see how this plays out and see what feedback we get from publishers," Gast said. The goal is for MIT as an institution to work out agreements with publishers rather than make individual researchers fight their own battles. More information, as well as the amendment, which would override the publisher's copyright agreement, is available online and other MIT web sites.

"There is a distinct feeling among our counterparts at large private and public institutions that if MIT takes a reasonable and principled position on this issue, other institutions will be encouraged to do likewise," Wolpert said.

PETER SUBER: "The MIT contract amendmentis closely related to the SPARC Author's Addendum drafted for the same purpose. The MIT amendment gives authors (among other things) the non-exclusive right to copy and distribute their own article, to make derivative works from it, and to deposit the final published version in an OA repository. MIT is the first university I know to present its faculty with a lawyer-drafted contract amendment for the purpose of retaining the rights needed to provide OA to their own work. Kudos to all involved. MIT faculty could change the default for faculty with less bargaining power."

Posted in OA News by Peter Suber at 3/21/2006 10:41:00 PM.

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 17:00 | Comments (0) | Trackbacks (0)

Tuesday, March 21. 2006

Code vs. Content: Using/Revising Form vs. Using/Revising Findings

Richard Poynder has done another penetrating and informative interview -- this time of Richard Stallman, founder of the GNU Project, the Free Software Foundation, and Copyleft.

Richard Stallman is a remarkable person and has made and continues to make invaluable contributions to freeing software to be creatively developed and used without proprietary restrictions.

It is important to understand what it is that Stallman stands for, in order to see that it is not the same thing as Open Access (OA) (although of course it is fully compatible with and in harmony with OA):

What Stallman means by "free" is free to use, develop and distribute. His main target is software code (though he has a more general view about all forms of property). Stallman opposes anything that prevents software from being further developed, improved upon, and distributed. (N.B. He does not oppose the selling of software; he opposes the hiding of the code, and the outlawing of its re-use and revision.)

Please note, though, that he states very clearly in the Interview that he understands that scholarly/scientific articles are not like computer code, meant to be modified and redistributed by others. This is a profound and fundamental difference, and if you don't grasp it, you invite all kinds of confusion and misunderstanding:

The right analogy between research findings and software is at the level of the content of the research findings, not the form (i.e., not the code, not the text). The text is proprietary, but the content is for everyone's use, and re-use (with proper citation to the source). Software code, in contrast, has no content. It is the code itself that Stallman is talking about modifying and redistributing.

The one small point of commonality (as opposed to mere analogy, at the content level) is the question of mirroring rights for OA texts: Stallman thinks it is not enough to put OA content in one's own IR; he thinks you have to make sure to formally grant explicit mirroring (and, presumably, caching and harvesting) rights with it too.

I don't agree with Stallman on this one tiny point; I think all the rest of the uses pretty much come with the web/OA territory right now; I'll start worrying about it if/when google ever needs a license to harvest freely accessible web content. Right now, too much OA content is still missing, and worries about having to renegotiate rights are part of what keeps it missing. So let's forget about that for now.

The disanalogy between the OA movement and the Free Software movement is, of course, that whereas the publisher charging for access to the text is fine, the author also wants to provide toll-free access to his own final draft, in order to maximize its usage and impact: The authors of peer-reviewed journal articles are not interested in royalty revenue (whereas some authors of software code might be) because any toll-barrier at all preventing a would-be user from having access to their work costs the author in terms of lost research impact, research progress, and even further research grant income and other possible rewards.

I think this disanalogy is easy to understand, but it too needs to be made and kept quite explicit in everyone's mind.

I close with just a logical point on the question of "free" in the sense of free-of-charge and "free" in the GNU sense of free-to-revise/redistribute: Is it not a bug if a hacker (i.e., a programmer, in Stallman's good sense, the original meaning of "hacker") can write software code, sell it (in the hope of making an honest living), but the very first customer who buys it can make a trivial revision (or none at all) and then give the code away to one and all (or even make a tiny improvement, relative to the total work that went into the original) and start selling it at a competing cut-rate price?

I just pose this as a kind of koan for the putative free/free distinction (I'm sure others have thought of it too, and there may even be an answer, but I cannot intuit it offhand); and if the distinction does not survive it, then what has to go: the freedom to sell or the freedom to revise/redistribute?

I ask this only in a spirit of genuine puzzlement, because I really admire what Richard Stallman advocates and stands for.

One could also ask whether Richard Stallman's sense of "freedom" really scales up, beyond software, to all forms of human product, as he seems to believe. How many people could earn an honest living from their creative work that way?

Eprints of course has been GNU Eprints from the outset.

Richard Stallman AmSci Postings:

Re: Garfield: "Acknowledged Self-Archiving is Not Prior Publication"
(Tue Sep 10 2002 - 00:34:39 BST)

Re: PostGutenberg Copyrights and Wrongs for Give-Away Research
(Tue Jul 23 2002 - 05:01:11 BST)

Re: PostGutenberg Copyrights and Wrongs for Give-Away Research
(Sun Jul 21 2002 - 21:15:17 BST)

Re: Ingenta to offer OAI eprint service
(Sat Jul 20 2002 - 01:35:08 BST)

Re: Association for Computer Machinery Copyright/Self-Archiving Policy
(Tue Mar 19 2002 - 08:43:38 GMT)

Re: Copyright FAQ for refereed journal authors
(Fri Oct 15 1999 - 21:59:43 BST)

Re: Copyright FAQ for refereed journal authors
(Thu Oct 14 1999 - 16:44:57 BST)

(See also replies, and google search "amsci stallman")

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 19:12 | Comments (0) | Trackbacks (0)

Monday, March 13. 2006

Proposed update of BOAI definition of OA: Immediate and Permanent

[Update: See new definition of "Gratis" and "Libre" OA, 27/8/2008]

Note to Peter Suber and the original formulators of the Budapest Open Access Initiative (Re-posted from AmSci Forum 13 March 2005 [last year]).

I would like to suggest that this is the right time, in light of recent developments, to update the BOAI definition of OA to make explicit what was already implicit in it: That OA must be now and must be permanent (not, for example, a feature that is provided for an instant, a century from now).

I think this was always perfectly obvious to anyone who read the BOAI definition of OA, but, as people will do, those with a vested interest in doing so found a loophole in the wording as it now stands. This is easily remediable by adding and announcing the obvious "immediate" (upon acceptance for publication) and "permanent" that should have been stated explicitly in the first place.

I think we overlooked this partly because we could not second-guess all conceivable self-serving construals by opponents of OA, but partly because we were trying to be as encouraging as possible about partial measures. Yet we were very careful, and should now be even moreso, not to allow the notion of "partial-OA" -- which is on a direct slippery-slope in which TA (toll-access) too would become construable as just another form of partial-OA!

Delayed free-access and temporary free-access are forms of access, to be sure -- and some is generally better than none, more is generally better than less -- but OA itself is only complete free access, immediate and permanent, for everyone and anyone, anytime, anywhere webwide. Otherwise all access would be OA, and the rest would just be a matter of degree (or, in the words of the wag, we would have agreed on our profession and we would now be merely haggling about the price!)

The BOAI definition was not etched in stone. 3+ years of experience have now suggested ways in which it can be clarified and optimized. This is a good time to make explicit what was already implicit in it, which is: OA is a trait of an article, not an evanescent state. Just as an article is OA if it is freely accessible online, an article is not OA if it is not freely accessible online, and hence an article that is not immediately accessible freely online is not OA and an article that is no longer freely accessible online is not OA (and never was -- within the limits of inductive uncertainty and the impossibility of clairvoyance, i.e., if the obsolescence was planned).

Being accessible might be a transitory state, but being OA has to be an all-or-none trait. Researchers don't need access to research eventually, or temporarily or sometimes or somewhere: All researchers need OA to all research, immediately, permanently, at all times, and everywhere (webwide). I suggest that we announce the following update to the passage that starts:

"By "open access" to this literature, we mean its free availability on the public internet, permitting..."

to:

"By "open access" to this literature, we mean its free availability on the public internet, immediately and permanantly, permitting..."

Those with an interest in blocking or minimzing non-toll-based access will of course scream that BOAI is "moving the goalposts!" but I think anyone who thinks clearly and honestly about the interests of the research community and of research itself, and what was the fundamental rationale and motivation for OA in the first place, will see that this is merely highlighting what the goal has been all along, not moving it.

Stevan Harnad

Date: Sun, 13 Mar 2005 03:30:27 +0000 (GMT)
From: Stevan Harnad
To: Richard Poynder
Subjectt: Poynder's Blog-Point

Hi Richard,

Re: http://poynder.blogspot.com/2005/03/what-is-open-access.html

One thing you missed: The "immediate" and "permanent" are and always were implicit in the BOAI definition of OA: An article is OA if and when it is freely accessible online. Obviously when it is not, it is not OA, so that excludes any embargo period, or any temporary "hook" period, withdrawn afterward!

The goal of OA is to make all articles OA: Not all articles OA after a while, or for a while. The answer to the question "Is this article OA?" has to be "yes", not "no". If an article can be OA some of the time, and not OA other times, then you may as well say an article can be OA to some people and not to other people (which is exactly what toll-access is: OA to those who can pay, non-OA to those who cannot).

Immediacy and permanence is as intrinsic to the fundamental rationale for OA as the full-text's being on-line and toll-free is. Researchers don't want to keep losing 6-12 months of research impact and progress, and call that Open Access.

Back Access is a cynical sop, any way you look at it, and a deplorable attempt to misuse both the principle of OA and the rationale underlying it.

I hope the Immediate Institutional Keystroke Policy as a default bottom line will put an end to any further inclination to try to use the Back-Access Ploy, for it immunizes institutions completely from any pressure for an embargo (the N-1 keystroles to deposit the metadata and full text are required, for internal purposes; the Nth OA keystroke is strongly encouraged but up to the author), leaving the dominoes to fall naturally (and anarchically) of their own accord. Sensible institutions won't even bother formalizing the Nth keystroke as optional, but will deal with it, if need be, on a case by case basis.

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 23:22 | Comments (0) | Trackbacks (0)

Wellcome Trust and the 6-month embargo

The Wellcome Trust will have the eternal historical distinction of having been the first research funder to actually mandate Open Access (OA) self-archiving (May 2005):

"Comparing the Wellcome OA Policy and the RCUK (draft) Policy"

This represented a very important forward step for the planet's progress toward the optimal and inevitable target: 100% OA. The earlier well-intentioned but much-flawed -- and since failed -- NIH Public Access Policy alas did not help advance OA, but rather missed an opportunity and inadvertently held things back for at least 2 years. But the hope now is that -- inspired in part by the far better model provided by the Wellcome Trust policy -- the NIH policy will be revised, becoming a self-archiving requirement instead of just a self-archiving request, no longer allowing a delay of up to 12 (or even 6) months.

It does not follow, however, that the current Wellcome Trust policy is unflawed, or that it provides the optimal model for others to follow. It was a great help at its historic time, as a counterweight to the far more flawed NIH policy, but at this historic point, the Wellcome Trust policy too risks becoming a retardant instead of a facilitator of OA, if it is imitated by others in its flaws instead of its strengths.

The strength of the Wellcome Policy is that (1) it is an exception-free requirement, not an optional request, and that (2) it does not allow a delay of longer than 6 months.

Its flaw is that (a) it allows any delay at all and that (b) it requires self-archiving in a central, 3rd-party repository (PubMed Central; PMC) instead of the author's own institutional OA Institutional Repository (IR) (from which PMC could then harvest if/when it wishes).

The two flaws are linked. For the simple and natural way to rule out delays is to require immediate deposit of the accepted, final draft in the author's own institutional OA IR (immediately upon acceptance for publication), but merely request/encourage that access to the deposited draft should be immediately set to "Open Access." That leaves the author the option to provisionally set access instead as "Restricted Access" if need be (for up to 6 months).

How is this linked to the requirement to deposit in PMC instead of at home? Because PMC is neither the author nor the author's institution. It is not even the Wellcome Trust. It is a generic, 3rd-party repository, which publishers can (perhaps rightly) construe as a rival 3rd-party publisher. Publishers are certainly within their rights to block or embargo rival 3rd-party publishing. (Whether it makes any sense to try to treat a 3rd-party OA repository as a rival publisher in the OAI-interoperable age is another matter!)

But the author and the author's own institution certainly cannot be construed as a rival 3rd-party publisher: They are the party of the first part, the content-provider, and the publisher is only the party of the second part: the value-adder and vendor.

And that is why far more journals have given their green light to author self-archiving in their own respective institutional OA IRs, than to self-archiving in a central 3rd-party repository like PMC. And that is also why PMC-archiving is more vulnerable to a publisher embargo.

But there is an ultra-simple way to require immediate deposit while accommodating any publisher embargo at the same time: Require immediate deposit in the author's own OA IR -- immediately upon acceptance for publication -- and harvest the full-text into PMC after 6 months!

That way the deposit is, without exception, immediate, and for about 93% of articles, access too will be immediately OA. (Those articles, too, can be immediately harvested into PMC.)

For the c. 7% of articles set to Restricted Access, the metadata will be immediately visible anyway, and emailed eprint-requests (facilitated and automatized with the help of the IR software) can fulfil the access-needs of would-be users who cannot afford access to the proprietary journal version during the embargo period.

Why not implement the deposit/access-setting distinction, but in PMC rather than in the author's own IR? Because it fails to generalise to all the rest of OA research output (in all fields of research, not just biomedical). The Wellcome Trust funds some of the world's biomedical research; NIH funds more; but there are vasts amounts of further research -- in biology, medicine, physical sciences, engineering, social sciences and even the humanities -- that would all fail to benefit from a parochial PMC mandate for biomedical research. If, instead, funders like Wellcome and NIH mandated that their fundees self-archive in their own institutional OA IRs, that would effectively "tile" all of OA space, effectively and completely, as universities cover all fields of research output. (Central OA repositories like PMC and others would still be available for any orphan works from unaffiliated researchers.)

In other words, funders are not helping world OA if they keep thinking of it as a go-it-alone operation. Funders only fund bits; central OA repositories don't exist for all disciplines and fields; and even if they did, they -- unlike the researchers' institutions -- do not have the clout to reinforce scattered funder mandates with institutional self-archiving mandates, to ensure that all their institutional research output is indeed self-archived.

So the simple and sensible way to update and optimise the pioneering Wellcome Trust self-archiving mandate would be to (1) require the self-archiving to be done in the fundee's own institutional OA IR (as the UK Select Committee proposed), (2) require it to be done immediately upon acceptance for publication, (3) encourage immediate access-setting to OA, (4) require access-setting at OA by 6 months at the latest, and (5) harvest the metadata into PMC immediately upon deposit -- and the full-text into PMC (if need be -- there's a case to be made for just linking to the IR version) within 6 months at the latest.

Why is Wellcome Trust not making this simple and obvious update without even any need for prompting? I think it is because there are again green and gold wires crossed: Over and above its mission to ensure that all Wellcome-funded research (and, hopeably, all research) is made OA, the Wellcome Trust has the further worthy goal of encouraging a transition to the OA (gold) publishing model. This is all fine, but not if the slow, uncertain transition to gold OA is supported at the expense of a speedy, certain transition to 100% OA itself (green).

And that is what I think is happening: Wellcome is not doing everything it could to hasten OA itself, because it is not committed only to OA, but to publishing reform too.

My own view is that publishing reform will take care of itself, and that the urgent task is to get to 100% OA as soon as possible. (Indeed, that itself will probably prove the most important stimulant to publishing reform.) But to slow the immediately feasible and certain transition to OA in the service of far slower and less certain -- and more hypothetical -- measures to induce publishing reform, is not, I think, to help OA along the road to the optimal and inevitable (and already overdue) outcome.

Some comments:

On Mon, 13 Mar 2006, Robert Kiley (Wellcome Trust) [RK] wrote in the American Scientist Open Access Forum:

RK: "Please note the Wellcome Trust currently does NOT have any plans to reduce the 6 month time limit on its grant condition. The grant condition requires published research (original research papers in peer reviewed journals) arising in part or whole from Trust funding to be placed in Pubmed Central (or UK PMC when it exists) no later than 6 months after the date of publication."

No need to reduce the 6 months if Wellcome does not wish to. Just mandate immediate deposit (in the fundee's own OA IR) and let delayed access-setting bear the burden of the delay. Meanwhile, everyone gets into the habit of self-archiving at home, and emailing eprints can bridge the gap, universally and uniformly.

RK: "It is obvious that a potential delay of up to 6 months is not ideal in terms of the timing of access, but it is a realistic response to the very real concerns of publishers, large and small, that self archiving is a threat to their business model. Whether this is eventually shown to be the case is immaterial as it is this perception that we need to deal with."

Fine. As noted: Mandate immediate deposit and allow the option of delayed access-setting.

RK: " As the only funding organisation with a mandate in its grant condition to support open access through open access publishing and archiving in PMC we are very well aware how many journals are currently at odds with this policy."

Note the conflation of open access provision (through self-archiving, green) with open access publishing (gold)...

RK: "That is why, in conjunction with JISC, we are funding an extension of the Sherpa/Romeo project to identify, at the journal level, which journals will allow a copy of the published paper to be deposited into PMC/UKPMC so it is available no later than 6 months after the original publication date."

It is always good to extend Sherpa/Romeo's coverage, but Romeo already lists embargoes, if any. So surely what Romeo needs is more coverage of journal self-archiving policies, not a focus on 6-month embargoes!

RK: "In order to encourage experiments in alternative business models to the subscription model the Trust also explicitly supports open access publishing as part of the research funding process."

So far, so good. Funding authors' OA (gold) publishing charges is very constructive and helpful. But now this:

RK: " That is why we provided some assistance to OUP, Blackwell's and Springer in drafting the author licence for their various open access offerings so that they were explicitly compliant with publishing and depositing in an archive such as PMC."

This sort of thing simply encourages the locking in of a 6-month embargo instead of helping to phase it out!

If the Wellcome Trust instead simply mandated immediate deposit and let access-setting bear the weight of any embargoes, it would not need to get into the business of entrenching and canonizing embargoes instead of letting them die a quiet death of natural causes!

RK: " We see open access repositories and open access publishing as complimentary exercises and to us, and the publishers we talk to, there is a direct link between the impact of self archiving and the publishing process so it is a pragmatic response to deal with both issues in parallel."

What is complementary today is: (1) non-OA publishing, (2) OA publishing, and (3) OA repositories for the author self-archiving of both (1) and (2).

Self-archiving is not a form of OA publishing, and the immediate and reachable goal -- the one that justifies OA in the first place, namely, access to 100% of published research articles -- is a transition to 100% OA, not necessarily a transition to OA publishing.

RK: " In time the most likely scenario, and one the Trust is supporting, is that open access publishing, or another model yet to be invented, will become the norm and publishers will be able to operate without a reliance on subscriptions. As such the 6 month embargo period will be kept under review but at the moment the Trust has no plans to change it."

That's fine. Let the allowable 6-month delay stand, but let it be a delay in access-setting, not deposit. And let the immediate deposit be in the fundee's own institutional IR, with PMC harvesting it after the allowable delay -- rather than delaying the deposit itself, and insisting it be in PMC!

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 17:25 | Comments (0) | Trackbacks (0)

The Immediate-Deposit/Optional-Access (ID/OA) Mandate: Rationale and Model

EXECUTIVE SUMMARY: Universities and research funders are both invited to use this document to help encourage the adoption of an Open Access Self-Archiving Mandate at their institution. Note that this recommended "Immediate-Deposit & Optional-Access" (IDOA) policy model (also called the "Dual Deposit/Release Strategy") has been specifically formulated to be immune from any delays or embargoes (based on publisher policy or copyright restrictions): The deposit -- of the author's final, peer-reviewed draft of all journal articles, in the author's own Institutional Repository (IR) -- is required immediately upon acceptance for publication, with no delays or exceptions. But whether access to that deposit is immediately set to Open Access or provisionally set to Closed Access (with only the metadata, but not the full-text, accessible webwide) is left up to the author, with only a strong recommendation to set access as Open Access as soon as possible (immediately wherever possible, and otherwise preferably with a maximal embargo cap at 6 months).

This IDOA policy is greatly preferable to, and far more effective than a policy that allows delayed deposit (embargo) or opt-out as determined by publisher policy or copyright restrictions. The restrictions apply only to the access-setting, not to the deposit, which must be immediate. Closed Access deposit is purely an institution-internal book-keeping matter, with the institution's own assets, and no publisher policy or copyright restriction applies to it.

[In the meanwhile, if there needs to be an embargo period, the IR software has a semi-automated EMAIL EPRINT REQUEST button that allows any would-be user to request (by entering their email address and clicking) and then allows any author to provide (by simply clicking on a URL that appears in the eprint request received by email) a single copy of the deposited draft, by email, on an individual basis (a practice that falls fully under Fair Use). This provides almost-immediate, almost-Open Access to tide over research usage needs during any Closed Access period.]

1. Research Accessibility

1.1 There exist 24,000 peer-reviewed journals (and conference proceedings) publishing 2.5 million articles per year, across all disciplines, languages and nations.

1.2 No university anywhere, not even the richest, can afford to subscribe to all or most of the journals that its researchers may need to use

1.3 Hence no article is accessible to all of its potential users, and hence all articles are losing some of their research impact (usage and citations).

2. Research Impact: Usage and Citations

2.1 This is confirmed by recent findings, independently replicated by many investigators, showing that articles for which their authors have supplemented subscription-based access to the publisher’s version by self-archiving their own final drafts free for all on the web are downloaded and cited twice as much across all 12 scientific, biological, social science and humanities disciplines analysed so far. (Note: there are no discipline differences in benefits of self-archiving, only in awareness.)

2.2 The total citation counts for articles submitted to the UK Research Assessment Exercise (RAE) are also very closely correlated with departmental RAE rankings (despite the fact that citations are not directly counted by RAE). More citations mean higher RAE ranking.

2.3 Hence citation counts are (i) robust indicators of research performance, (ii) they are not currently maximised for those articles that are not self-archived and (iii) those articles that are being self-archived have a substantial competitive advantage over those that are not.

3. University Self-Archiving Mandates Maximise Research Impact

3.1 Only 15% of the 2.5 million articles published annually are being spontaneously self-archived worldwide today.

3.2 Creating an Institutional Repository (IR) and encouraging staff to self-archive their articles therein is a good first step, but it is not sufficient to raise the self-archiving rate appreciably above the 15% baseline for spontaneous self-archiving.

3.3 Adding library help to encourage and assist staff to self-archive raises the self-archiving rate somewhat, but insufficiently.

3.4 The correct measure of institutional success in self-archiving is the ratio of annual self-archived articles in an institution’s IR relative to that institution’s total annual article output.

3.5 The only institutions that are reliably approaching a 100% annual self-archiving rate today are those that not only create an IR (3.2) and provide library help (3.3) for depositing, but also adopt a self-archiving policy requirement (or "mandate").

3.6 A self-archiving mandate is a simple and natural extension of universities’ already existing mandate to publish research findings (“publish or perish”); it is already linked to incentives by the fact that staff are promoted and funded on the basis of research performance indicators, of which citation impact is a prominent correlate, as in the RAE (2.2).

3.7 Two international, cross-disciplinary JISC surveys have found that 95% of authors will comply with a self-archiving mandate (81% willingly, 14% reluctantly).

3.8 The four institutions worldwide that have adopted a self-archiving mandate to date (CERN in Switzerland, Queensland University of Technology in Australia, Minho University in Portugal, and the ECS Department at University of Southampton) have each confirmed the outcome of the JISC author surveys (3.7), with their institutional self-archiving rates reliably climbing toward 100%,whereas institutions without mandates remain at the 15% spontaneous self-archiving baseline rate

.4. Action: This university should now mandate self-archiving university-wide

4.1 This university should now maximise its own research impact and set an example for the rest of the world by adopting a self-archiving mandate university-wide.

4.2 As indicated by the JISC survey and the empirical experience of the other 3 mandating institutions (3.8): there is no need for any penalties for non-compliance with the mandate; the mandate (and its own rewards: enhanced research access and impact) will take care of itself.

4.31 What needs to be mandated:
Immediate Deposit and Optional Access (IDOA):

- immediately upon acceptance for publication
- deposit in the university’s Institutional Repository
- the author’s final accepted draft (not the publisher’s proprietary PDF)
- both its full-text and its bibliographic metadata (author, date, title, journal, etc.)
(Note that only the depositing itself needs to be mandated. Setting the access privileges to the full-text can be left up to the author, with Open Access strongly encouraged, but not mandated. This makes the university’s self-archiving mandate completely independent of publishers’ self-archiving policies.)

4.32 The Eprints software allows authors to choose to set access as Open Access (OA) or Restricted Access (RA):
OA: both metadata and full-text are made visible and accessible to all would-be users web-wide
RA: metadata are visible and accessible web-wide but full-text is not
4.4 The decision as to whether to set full-text access as OA or RA can be left up to the author; 93% of authors will set full-text access as OA (4.2); for the remaining 7%, the Eprints software still makes it possible for any would-be user web-wide to request an eprint of the full-text automatically by email -- by just cut-pasting their own email address into a box and clicking; the author immediately receives the request and can instantly email the eprint with one click. The result will be 100% access to all Southampton research output, 93% immediately and directly, with one keystroke, 7% indirectly after a short delay, with a few extra keystrokes by user and author.

5. The Importance of Prompt Action

5.1 Self-archiving is effortless, taking only a few minutes and a few keystrokes; library help is available too (but hardly necessary).

5.2 The university should not delay in adopting a self-archiving mandate: 100% OA is both optimal and inevitable -- for research, researchers, their universities, their funders, and the tax-paying public that supports both the research and the universities. It will also give this university a strong competitive impact advantage over later adopters.

5.3 An early adopter not only provides a model for the world with its university-wide self-archiving policy but at the same stroke it maximizes its own research impact and research impact ranking.

5.4 The mandate need have no penalties or sanctions in order to be successful; it need only be formally adopted, with the support of Heads of Schools, the library, and computing services. The rest will take care of itself naturally of its own accord, as the experience of Southampton ECS, Minho, QUT and CERN has already demonstrated.

APPENDIX:
Southampton University Resources for Supporting Open Access Worldwide

A1 U. Southampton ECS department was the first department or institution in the world to adopt a self-archiving mandate (2001).

A2 ECS hosts Psycprints (1991), BBSPrints (1994), Open Journals (1995), OpCit (1996), CogPrints (1997); the American Scientist Open Access Forum (1998).

A3 ECS designed the first and most widely used software for creating institutional archives (GNU Eprints, 2000), now already used by about 200 institutions worldwide; ECS also created Citebase (2002), the citation-based OA search engine (well before Google Scholar).

A4 ECS conducted many of the seminal studies empirically demonstrating the citation impact advantage of self-archiving across all disciplines; ECS also maintains the growing and widely used bibliography of the accumulating findings on the OA Impact Advantage.

A5 ECS/Eprints maintains ROAR, the Registry of Open Access Repositories, tracking the number, size and growth of IRs and their contents worldwide.

A6 ECS/Eprints maintains ROARMAP, the Registry of Open Access Repository Material Archiving Policies, tracking the institutions worldwide that have adopted self-archiving policies, from recommendations to full mandates.

A7 ECS/Eprints maintains the ROMEO Directory of Journal Policies on Author Self-Archiving: 93% of the nearly 9000 journals registered to date (including all the principal publishers and the core ISI journals) have already formally endorsed author self-archiving; only 7% of journals have not.

A8 ECS/Southampton successfully lobbied the UK Parliamentary Select Committee in 2004 to mandate self-archiving; this led directly to the RCUK self-archiving mandate proposal, the Berlin 3 Policy Recommendation (formulated at Southampton) and the development of RAE submission mechanisms for the world’s two principal IR softwares (GNU Eprints, and MIT’s Dspace, both written by Southampton’s Rob Tansley).

Posted by Stevan Harnad in Self-Archiving Mandates at 01:09 | Comments (0) | Trackback (1)

Sunday, March 12. 2006

Optimizing Open Access Guidelines of Deutsche Forschungsgemeinschaft

The Open Access (OA) guidelines of Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) are very, very welcome, but I hope that a few seemingly minor details (see below) can be revised to make them an effective model for others worldwide:

DFG Passes Open Access Guidelines

Information for Researchers No. 04 30 January 2006

In 2003 the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. The DFG supports the culture of open access. Unhindered access to publications increases the distribution of scientific knowledge, thereby enhancing the authors' visibility and contributing to their reputations.

The DFG has now tied open access into its funding policy. During their meetings in January 2006, the DFG's Senate and Joint Committee recommended encouraging funded scientists to also digitally publish their results and make them available via open access.

The first problem concerns this clause:

"recommended encouraging funded scientists to also digitally publish their results and make them available via open access"

On the one hand, this clause is too weak: It is specifically because the NIH only "recommended/encouraged" that its public access policy has failed and now needs to be strengthened to "required/mandated."

On the other hand, the present clause is far too vague and ambiguous:

(1) Virtually all journals today are hybrid paper/digital already, so recommending/encouraging that the publication should have a "digital version" is breaking down open doors.

(2) What needs to be brought out clearly is that what is actually being required is that a digital version of the publication should be made open access (OA) -- by self-archiving it (depositing it in an OA repository).

(3) What can also be recommended/encouraged (but not required) is to publish in an OA journal where possible.

(4) All ambiguity about "publishing" and "publication" should be eliminated, by saying (and meaning) that "publishing" means publishing in a peer-reviewed journal, whereas depositing a published article in an OA repository is not publishing but access-provision. A published article is already published! Self-archiving increases the access to that publication by making it available to those would-be users who cannot afford subscription access to the publisher's proprietary version.

Recommended re-wording:

"require funded scientists to also self-archive their published results in an online repository to make them available via open access"

(5) No rights renegotiation is necessary at all for the 93% of journals that already endorse immediate self-archiving

(6) For the 7% of journals that do not yet endorse self-archiving, no rights renegotiation is needed for immediate depositing, but rights can be negotiated for setting Open Access.

NB: "OA Self-Archiving" means (i) depositing the full text and metadata in a web repository and (ii) setting access to the full-text as Open Access. The depositing itself (i) (where no one can see the full-text but the author) requires no permission from anyone! The only conceivable rights issue concerns access-setting.

"In order to put secondary publications (i.e. self-archived publications by which the authors provide their scientific work on the internet for free following conventional publication) on the proper legal footing, scientists involved in DFG-funded projects are also requested to reserve the exploitation rights."

(7) Please don't call providing OA to an already-published article "secondary publication"! In a formal sense self-archiving can indeed be construed that way, but that is not a construal that clarifies, it merely confuses. Leave publication to publishers. Authors don't publish their own articles, let alone publish their own already-published articles! They provide access to them, just as they did in paper days when they provided reprints or photocopies, none of which were called "secondary publication." Secondary publishers are publishers, 3rd parties (not the author, and not the primary publisher), that republish an entire published work; or they are indexers/abstracters, that republish parts of it. Self-archivng authors are not secondary publishers of their own published work.

(8) Whereas it is certainly useful and desirable to "reserve the exploitation rights" for authors' published articles, this is not a prerequisite for self-archiving their own drafts (rather than the publisher's PDF), and certainly not for the 68% of journals that are already "green," having given their official blessing to author self-archiving of postprints -- nor for the 25% more that have endorsed preprint self-archiving. Rights renegotiation is hence moot for all but 7% of the c. 8800 journals indexed in Romeo (and that includes virtually all the principal international journals).

(9) Most important: The rights negotiation is not about the depositing (which should be mandatory, and immediate upon acceptance for publication) but only about the access-setting -- i.e., whether access to the deposited full-text is set to "Open Access" or only "Restricted Access" (and if the latter, then for how long).

Recommended re-wording:

"For publications that they self-archive on the internet for free following publication, scientists involved in DFG-funded projects are also encouraged -- if the publisher has not already endorsed immediate author self-archiving -- to retain the immediate right to set access as 'Open Access'."

The guidelines continue:

Recommendations are currently being integrated into the usage guidelines, which form an integral part of every approval. They are worded as follows:

The DFG expects the research results funded by it to be published and to be made available, where possible, digitally and on the internet via open access.

"To achieve this, the contributions involved should either be published in discipline-specific or institutional electronic archives (repositories), or directly in referenced or recognised open access journals, in addition to conventional publishing."

The last sentence is awkward and ambiguous, mixing up publishing and self-archiving, but it is easily clarified: "To achieve this, all work should be published either in conventional journals or in recognised peer-reviewed open access journals; and in addition (the author's draft of) all publications should be self-archived in discipline-specific or institutional electronic archives (repositories)."
The guidelines continue:
"When entering into publishing contracts scientists participating in DFG-funded projects should, as far as possible, permanently reserve a non-exclusive right of exploitation for electronic publication of their research results for the purpose of open access. Here, discipline-specific delay periods of generally 6-12 months can be agreed upon, before which publication of previously published research results in discipline-specific or institutional electronic archives may be prohibited."
Recommended revision:
"When entering into publishing contracts with journals that do not already explicitly endorse immediate author self-archiving, scientists participating in DFG-funded projects should, as far as possible, permanently reserve a non-exclusive right to set access to their deposited draft as Open Access immediately upon deposit. An access-delay interval of 6-12 months is discouraged, but allowable under current DGF policy; during this interval the publication, always deposited immediately upon acceptance, may be placed under Restricted Access rather than Open Access."

(During the Restricted Access period, the metadata will still be visible webwide, and individual users can email the author to request an eprint of the full text.)
Allowing any Restricted Access interval at all is the weaker form of OA mandate, but it is still sufficient. It is critically important, however, that:

(a) Depositing the full text is required, not just requested

(b) The depositing itself must always be done immediately upon acceptance for publication, not after the access-delay interval agreed with the publisher

(c) During any agreed access-delay interval (one year maximum) access to the full-text can be set as Restricted Access rather than Open Access

I would also recommend against permitting a delay as long as one year: NIH is now moving from a year to 4 months; Wellcome allows 6 months but is planning to reduce that. There is no need for DFG to be more permissive of access restriction.

The guidelines finish thus:
Please ensure that a note indicating support of the project by the DFG is included in the publication.

The revised usage guidelines are expected to be available in April 2006.

Further information on open access is available at
http://www.dfg.de/lis/openaccess

For further information contact Dr. Johannes Fournier, Tel.: +49 (0)228/885-2418, e-mail: johannes.fournier@dfg.de.

Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 02:49 | Comments (0) | Trackbacks (0)

Friday, March 3. 2006

preservation vs. Preservation

This is perhaps a good juncture at which to make it explicit that there is "small-p preservation" and "large-P Preservation." Of course GNU Eprints, like everyone else (including ArXiv since way back in 1991) is doing small-p preservation, and will continue to do so: Open Access is for the sake of immediate access, today, tomorrow, and into the future -- and this, in turn, is for the sake of maximising immediate usage and impact, today, tomorrow, and into the future. Hence small-p preservation is a necessary means to that end.

But big-P Preservation, in contrast, is Preservation as an end in itself: as the motivation for archiving in the first place; or as a pressing need for ephemeral or fragile "born-digital" contents; or as a responsibility for content-providers (journal-providers) or content-purchasers (subscribing libraries) or content-preservers (deposit/record libraries) who need to ensure the perennity of their sold/purchased product.

So it is absurd to imagine (and for that reason needs to be stated explicitly, again and again, even though it is patently obvious) that Eprints is either oblivious to small-p preservation or that its contents are one bit more or less likely to vanish tomorrow than any other digital contents that are being conscientiously preserved and migrated and upgraded today, keeping up with the ongoing developments in the means of preservation.

The difference between preservation and Preservation is that preservation is not an end in itself, it is a means to an end (which is immediate, ongoing access-provision and usage), whereas Preservation is an end in itself.

Why is it so important to make it crystal clear that Eprints and OA are not for Preservation projects? that their primary motivation is not to ensure the longevity of digital contents (even though Eprints and OA do provide longevity, and do keep up with whatever developments occur in the means of long-term preservation of their contents)?

Because OA's target contents are 85% missing! The pressing problem of absent content cannot be its Preservation! Eighty-five percent of the 2.5 million articles published annually in the world's 24,000 journals are not being self-archived today (and, a fortiori, were not self-archived yesterday, or the month/year/decade before). What has been -- and continues to be -- lost, as a consequence of this, is not the contents in question (for they are being Preserved in their proprietary-product version, by their producers [publishers] along with their purchasers [libraries]).

What has been (and continues to be) lost for the 85% of annual OA target content that has not been (and is not being) self-archived, is access, usage, and impact. That is the true motivation for Eprints and OA self-archiving. And (listen carefully, because this is the gist of it!): that content will never be self-archived by its authors for the sake of Preservation, because it need not be: its Preservation is already in other hands than its authors (or its authors' institutions), as it always was, and for the foreseeable future will continue to be. The mission of authors and their institutions was not, is not, and should not have to be the Preservation of their own published journal article output [but see Note below**].

Nor, by the same token, is it the mission or motivation of authors' institutions to create Institutional Repositories (IRs) for the Preservation of their own published journal article output. If there is no better reason for creating OA IRs today than the Preservation of one's own journal article output, then there is no reason for institutions to create OA IRs today, and no reason for their authors to self-archive. This is a logical, empirical and practical fact, stated (recall, again) at a historical moment when 85% of OA target content is still missing, even though it is overdue, even though its self-archiving has been feasible for years, and even though its continuing absence entails that 85% of maximised research usage and impact (i.e., impact from usage by all would-be users rather than only those whose institutions can afford journal access) continues to be lost.

To wrongly identify the mission or motivation of Eprints or OA self-archiving with the need to Preserve digital contents is to provide yet another (strong) reason for authors not to self-archive. Because Preservation is simply no reason at all (for OA self-archiving).

And to subsume the urgent mission of finding a way to generate that missing 85% of OA target content under the murky mission of the generic Preservation of generic digital content is simply to miss the point of OA self-archiving altogether, and to imagine that it is merely yet another instance of Preservation-Archiving -- whose mission and motivation, to repeat, yet again, is not immediate, urgent, long-overdue content-provision, access-provision, and usage/impact-maximisation, but long-term content-Preservation, as an end in itself.

So please, let us reassure those who might be fussed about it, that the contents of OA IRs like Eprints can and will continue to be preserved, but that to be Preserved is not their purpose, nor the purpose of self-archiving: immediate and ongoing access-provision and usage/impact-maximisation is their purpose. And that purpose is currently not being met -- not because the OA contents are at risk of not being preserved today, but because (85% of) the OA contents are at a certainty of not being provided today.

The OA problem, in other words, is not Preservation tomorrow, but Provision today. Hitching today's Provision problem to tomorrow's Preservation problem is yet another recipe for prolonging the non-Provision of 85% of OA's target content.

What is needed for the provision of the missing 85% of OA's target content is author motivation; and the empirical findings on how OA enhances usage and impact go only part of the way toward engaging author motivation. The critical missing bit to ensure the provision of the missing content is institutional OA self-archiving mandates, not the plugging in of OA as merely another plank in the institution's generic Preservation platform.

I sense I am repeating myself -- but it appears to be needed, for the conflation of the Preservation-archiving mission and the OA access-provision mission just keeps recurring, deferring time, energy and motivation from OA access-provision, which is Eprints' raison d'etre.
[**Note: One last, somewhat subtler point, almost need not be stated, but it's probably better to make it explicit too, even though it is highly premature and highly hypothetical: If and when it should ever transpire -- and there is as yet no sign at all that it will -- that 100% OA via 100% self-archiving, having been neared or reached, should cause radical changes in the journal publishing system, forcing publishers to down-size into becoming only peer-review service-providers and certifiers, rather than also being the analog and digital product access-providers, as they are now, thereby forcing them to off-load access-provision and archiving onto their authors' institutions, then, and only if/when "then" ever comes, authors' institutions will inherit the primary-content Preservation mission, and not just the supplementary-content preservation mission.

But before that hypothetical contingency needs to be faced, there is still the very real, unsolved problem of getting that missing 85% of OA target content systematically self-archived. Let us not continue delaying that actuality by getting caught up in or deflected by hypothetical speculations.]
Stevan Harnad

Posted by Stevan Harnad in Self-Archiving Mandates at 13:19 | Comments (0) | Trackbacks (0)

(Page 1 of 2, totaling 11 entries) » next page

Entries from March 2006

Thursday, March 30. 2006

Formaldehyde and Function

Manual Evaluation of Robot Performance in Identifying Open Access Articles

Thursday, March 23. 2006

Online, Continuous, Metrics-Based Research Assessment

Wednesday, March 22. 2006

Optimizing MIT's Open Access Policy

Tuesday, March 21. 2006

Code vs. Content: Using/Revising Form vs. Using/Revising Findings

Monday, March 13. 2006

Proposed update of BOAI definition of OA: Immediate and Permanent

Wellcome Trust and the 6-month embargo

The Immediate-Deposit/Optional-Access (ID/OA) Mandate: Rationale and Model

Sunday, March 12. 2006

Optimizing Open Access Guidelines of Deutsche Forschungsgemeinschaft

Friday, March 3. 2006

preservation vs. Preservation

EnablingOpenScholarship (EOS)

Federal Research Public Access Act (FRPAA)

Alliance for Taxpayer Access (ATA)

Creative Commons License:

Quicksearch

Syndicate This Blog

Materials You Are Invited To Use To Promote OA Self-Archiving:

Archives

Calendar

Categories

Blog Administration

Statistics

Top Referrers

Syndicate This Blog