Open Access Archivangelism

Friday, September 12. 2008

Too Much Ado About PDF

Most OA self-archiving growth will be prospective, rather than retrospective initially, because it is the current and forward-going research articles that are most urgently needed for research progress, and that is what is being mandated by institutions and funders; the legacy corpus can and will follow thereafter.

Hence, insofar as the current and forward-going articles are concerned, the default option should be to deposit the author's final, peer-reviewed, revised, accepted draft (the postprint) in the author's Open Access Institutional Repository, not necessarily or even preferentially the publisher's PDF.

The author's postprint is the draft with the fewest publisher constraints (and any publisher endorsement of making the PDF OA automatically covers the postprint too).

And, as Alma Swan and Cliff Lynch have pointed out, the PDF is the least useful for data-mining.

And, as can never be pointed out often enough, the purpose of OA self-archiving is the enhancement of access, usage and impact of the research, not the digital preservation of the publisher's PDF! The postprint is a copy, not the original.

(For legacy deposits by authors who no longer have a digital draft of older articles, formatting the PDF, or scanning/OCR and reformatting, are obvious options.)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Rationale for OA at 12:55 | Comments (0) | Trackbacks (0)

Wednesday, September 10. 2008

Australian innovation report recommends Open Access to research outputs

(Thanks to Colin Steele for forwarding this and to Glen Newton for the excerpting.)

The Australian minister for Innovation, Industry, Science and Research, Kim Carr spoke about this report in a speech released yesterday. Full text here.

It is embodied in a series of recommendations aimed at unlocking public information and content, including the results of publicly funded research.

The review panel recommends making this material available under a creative commons licence through:

machine searchable repositories, especially for scientific papers and data

and the internet, where it would be freely available to the world.

...The arguments for stepping out first on open access are the same as the arguments for stepping out first on emissions trading – the more willing we are to show leadership on this, we more chance we have of persuading other countries to reciprocate.

This speech reflects a number of recommendations in the report:

Recommendation 7.7: Australia should establish a National Information Strategy to optimise the flow of information in the Australian economy. The fundamental aim of a National Information Strategy should be to: ·utilise the principles of targeted transparency and the development of auditable standards to maximise the flow of information in private markets about product quality; and ·maximise the flow of government generated information, research, and content for the benefit of users (including private sector resellers of information).

Recommendation 7.8: Australian governments should adopt international standards of open publishing as far as possible. Material released for public information by Australian governments should be released under a creative commons licence.

Recommendation 7.9: Funding models and institutional mandates should recognise the research and innovation role and contributions of cultural agencies and institutions responsible for information repositories, physical collections or creative content and fund them accordingly.

Recommendation 7.10: A specific strategy for ensuring the scientific knowledge produced in Australia is placed in machine searchable repositories be developed and implemented using public funding agencies and universities as drivers.

Recommendation 7.14: To the maximum extent practicable, information, research and content funded by Australian governments including national collections should be made freely available over the internet as part of the global public commons. This should be done whilst the Australian Government encourages other countries to reciprocate by making their own contributions to the global digital pubic commons.

As ROARMAP indicates, the world leader in Open Access, both in time and in absolute size, is indisputably the United Kingdom: A UK Parliamentary Select Committee was the world's first governmental recommendation to mandate OA, in 2004. However, the UK government, under pressure from the publishing lobby, did not accept its own committee's recommendation. Nevertheless, six of the seven RCUK research funding councils went on to mandate OA anyway. The UK now has a total of 18 university and funder OA mandates (including the world's first OA mandate at Southampton's School of Electronics and Computer Science in 2003).

Australia, however, adopted the world's first university-wide OA mandate in 2004, and with its current total of 7 university and funder mandates along with this vigorous governmental support from Minister Carr, Australia is the world's relative, if not absolute leader in OA, by size as well as timing. And it is about to consolidate that leadership with an international Open Access and Research Conference in Brisbane next week, convened by Tom Cochrane, the DVC who engineered the world's first university OA mandate.

By way of comparison, the US, the country with the world's largest research output, has only five OA mandates (though that includes one from the NIH, the world's biggest biomedical research funder, as well as Faculty mandates from Harvard and Stanford). Universities are the sleeping giants, and the council of the European Universities Association (EUA) has unanimously recommended the adoption of an OA mandate by its 791 member universities in 46 countries -- but that mandate has not been adopted yet (although Professor Bernard Rentier, Rector of University of Liege is working on it, with EurOpenScholar).

But Australia looks poised now to be the one that sets all the dominoes falling worldwide.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 18:25 | Comments (0) | Trackbacks (0)

Joseph Esposito's "Almost-OA": "Almost Pregnant"

Institutional Repositories (IRs) are for institutional research output (mostly their authors' final drafts of their published, peer-reviewed journal articles). IRs are not for institutional buy-in of the output of other institutions. (That would be an institutional library.) The way Open Access (OA) works is that an institution makes its own research output free for all online, in order to maximize its visibility, usage and impact. By symmetry, the institution's users also get access to the output of all other institutions' IRs, for free. No subscriptions, no fees, no consortia, no need for an institutional affiliation for anyone but the author of the work in the IR.

That’s OA. Almost-OA is when some of the IR material is still under a publisher embargo, so it is deposited as Closed Access instead of Open Access, and can be accessed using the IR’s almost-immediate “email eprint request” Button during the embargo. Almost-OA is not OA, but together with universal Immediate Deposit mandates, it will soon usher in universal OA.

In contrast, Joseph Esposito’s “Almost OA” is just a variant on institutional consortial licensing. It has no more to do with OA than being Almost Pregnant has to do with parity.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Definition of Open Access at 02:35 | Comments (0) | Trackbacks (0)

Tuesday, September 9. 2008

Open Access and Research Conference 2008: Brisbane 24-25 September

Open Access and Research Conference 2008

STAMFORD PLAZA HOTEL,
BRISBANE, QUEENSLAND
24-25 SEPTEMBER

"The way we create and disseminate knowledge has undergone profound change over the last ten years...

"...we have seen a worldwide move towards establishing frameworks in which we can optimise access to and reuse of research especially that which is publicly funded...

"This has been supported by the development of open access repositories, new publishing tools and models and more strategic management of copyright at the individual and institutional level.

"QUT along with many other institutions throughout the world has been a pioneer in putting in place the management practices and necessary infrastructure to promote access and innovation.

"We are proud to announce what we believe will be a truly landmark conference that will draw on experts from Australia and around the world speak on a range of topics such as evolving publishing models, repository management, e-Research, policy development, and legal and technical issues."

In addition to our line up of prominent Australian speakers, International Keynote Speakers include:

John Wilbanks, Executive Director of Science Commons

Dr Alma Swan, Founder of Key Perspectives: Consultants to the scholarly information industry

Dr Tony Hey, Corporate Vice President of the External Research Division of Microsoft Research

Professor Stevan Harnad, UQAM & School of Electronics and Computer Science, University of Southampton

Maarten Wilbers, Deputy Legal Counsel, European Organization for Nuclear Research (CERN)

Posted by Stevan Harnad in Research Assessment at 03:32 | Comments (0) | Trackbacks (0)

Research Evaluation, Metrics and Open Access in the Humanities: Dublin 18-20 September

Research Evaluation, Metrics and Open Access in the Humanities
Coimbra-Group Workshop
Trinity College Dublin
18-20 September 2008

-- Aimed at Arts and Humanities researchers, Deans of Research, Librarians, research group leaders and policy makers within the Coimbra-Group member universities and the Irish University sector...

-- To compare established and innovative methods and models of research evaluation and assess their appropriateness for the Arts and Humanities sector...

-- To assess the increasing impact of bibliometrical approaches and Open Access policies on the Arts and Humanities sector...

Posted by Stevan Harnad in Research Assessment at 03:26 | Comments (0) | Trackbacks (0)

Monday, September 8. 2008

OA Needs Open Evidence, Not Anonymous Innuendo

The testimony of "Ethan" regarding SJI's publishing practices could have been valuable -- if it had been provided non-anonymously. As it stands, it merely amounts to anonymous, nonspecific, unsubstantiated allegations. If those allegations against SJI are in reality true, then making them anonymously, as "Ethan" does, does more harm than good, because then they can be so easily discredited and dismissed, as being merely the anonymous, nonspecific, unsubstantiated allegations that they are. (If they are in reality false, then they are of course a depolorable smear.)

Richard Poynder is a distinguished and highly respected journalist and the de facto chronicler of the OA movement. I hope "Ethan" has contacted Richard, as he requested, giving him his real name, and the names of the SJI journal submissions that he refereed and recommended for rejection as having "zero scientific value." Richard can then fact-check (confidentially, without embarrassing the authors) whether or not any of those articles were published as such.

What “Ethan” should have done if he was, as he said, receiving articles of low quality to referee, in a “peer review” procedure of doubtful quality, was to resign as referee, request removal of his name from the list of referees — did “he”? and was his name removed? — and, if he felt strongly enough, offer to make his objective evidence available to those who may wish to investigate these publishing practices.

What is needed in order to expose shoddy publishing practices is objective, verifiable evidence and open answerability, not anonymous allegations (as in the entrails of Wikipedia, where pseudonymous bullying reigns supreme).

This is not, after all, whistle-blowing on the Mafia, that requires a witness protection program. If you offer to referee for a publisher with shoddy peer-review practices, you risk nothing if you provide what objective evidence you have of those practices.

I know that "publish or perish" has authors fearful of offending publishers by doing anything they think might reduce their chances of acceptance, and that referees often perform their gratis services out of the same superstitious worry; and I know that junior referees are worried about offending senior researchers if they are openly critical of their work, and that even peer colleagues and rivals are often leery of the consequences of openly dissing one another’s research.

Yet none of these bits of regrettable but understandable professional paranoia explain why "Ethan" felt the need to hide under a cloak of anonymity in providing objective evidence of dubious peer-review practices by a publisher and journals that hardly have the patina or clout of some of the more prestigious established publishers and journals.

Is it SJI's public threats of litigation, through postings like the one below, that have everyone so intimidated?

“Lies, fear and smear campaigns against SJI and other OA journals”

Surely the antidote against that sort of thing is open evidence, not anonymous innuendo. (Something better is needed by way of open evidence, however, than just contented testimonials elicited from accepted authors!)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Peer Review at 03:56 | Comments (0) | Trackbacks (0)

Thursday, September 4. 2008

Offloading Cognition onto Cognitive Technology

Dror, I. and Harnad, S. (2009) Offloading Cognition onto Cognitive Technology. In: Cognition Distributed: How Cognitive Technology Extends Our Minds, John Benjamins. (In Press)

Abstract: "Cognizing" (e.g., thinking, understanding, and knowing) is a mental state. Systems without mental states, such as cognitive technology, can sometimes contribute to human cognition, but that does not make them cognizers. Cognizers can offload some of their cognitive functions onto cognitive technology, thereby extending their performance capacity beyond the limits of their own brain power. Language itself is a form of cognitive technology that allows cognizers to offload some of their cognitive functions onto the brains of other cognizers. Language also extends cognizers' individual and joint performance powers, distributing the load through interactive and collaborative cognition. Reading, writing, print, telecommunications and computing further extend cognizers' capacities. And now the web, with its network of cognizers, digital databases and software agents, all accessible anytime, anywhere, has become our “Cognitive Commons,” in which distributed cognizers and cognitive technology can interoperate globally with a speed, scope and degree of interactivity inconceivable through local individual cognition alone. And as with language, the cognitive tool par excellence, such technological changes are not merely instrumental and quantitative: they can have profound effects on how we think and encode information, on how we communicate with one another, on our mental states, and on our very nature.

Posted by Stevan Harnad in Rationale for OA at 10:57 | Comments (0) | Trackbacks (0)

Wednesday, September 3. 2008

SHERPA/RoMEO: Publishers with Paid Options for Open Access

SUMMARY: SHERPA/RoMEO is listing the paid OA charges of non-Green publishers. On no account should any author or institution ever have to pay money to a publisher in order to be able to comply with a mandate to provide Open Access (OA). That would be a perverse distortion of the purpose of both OA and OA mandates; it would also profoundly discourage funders and institutions from mandating OA, and authors from complying with OA mandates.
SHERPA has an outstanding record for supporting and promoting OA, worldwide. The OA movement and the global research community are greatly in their debt. However, SHERPA alas also has a history of amplifying arbitrary, irrelevant and even absurd details and noise associated with publisher policies and practices, instead of focusing on what makes sense and is essential to the understanding and progress of OA.
I urge SHERPA to focus on what the research community needs to hear, understand and do in order to reach 100% OA as soon as possible -- not on advertising publisher options that are not only unnecessary but counterproductive to the growth of OA and OA mandates. Authors and funders who are foolish enough to squander their money on paying non-green publishers for OA (instead of just depositing their postprints anyway and relying on the IR's "email eprint request" Button during any embargo) can find out the prices for themselves. SHERPA/ROMEO is not a publishers' price catalogue.

The SHERPA/RoMEO site says:

"Where a publishers' standard policy does not allow an author to comply with their funding agency's mandate, paid open access options may enable an author to comply."

On no account should any author have to comply with any mandate to provide Open Access (OA) by having to pay money to a publisher.

That would be a grotesque distortion of the purpose of both OA and OA mandates.

It would also profoundly discourage funders and institutions from mandating OA, and authors from complying with OA mandates.

If a journal is not one of the 63% of journals that are already Green on immediate OA self-archiving then the right strategy for the author is to deposit the refereed final draft in their institutional repository anyway, immediately upon acceptance for publication.

Access to that deposit can then be set as Closed Access instead of Open Access during the publisher embargo, if the author wishes. The repository's semi-automatic "email eprint request" Button can then provide all would-be users with almost-OA during the embargo.

Most OA mandates tolerate an embargo of 6-12 months. Once immediate deposit is universally mandated by 100% of funders and institutions, that will provide at least 63% immediate-OA plus at most 37% almost-OA, immediately, for a universal total of 100% immediate-OA plus almost-OA.

After OA mandates are adopted universally, the increasingly palpable benefits of the resulting OA for research, researchers, and the tax-paying public will ensure that the rest of the dominos will inevitably fall of their own accord: Access embargoes will soon die their natural (and well-deserved) deaths, yielding 100% immediate-OA.

SHERPA has an outstanding record for supporting and promoting OA, worldwide. The OA movement and the global research community are greatly in their debt. However, SHERPA alas also has a history of amplifying arbitrary, irrelevant and even absurd details and noise associated with publisher policies and practices, instead of focusing on what makes sense and is essential to the understanding and progress of OA.

I urge SHERPA to focus on what the research community needs to hear, understand and do in order to reach 100% OA as soon as possible -- not on advertising publisher options that are not only unnecessary but counterproductive to the growth of OA and OA mandates.

Charles Oppenheim, U. Loughborough, replied:

"Stevan misunderstands the purpose of SHERPA/ROMEO. It is there to report publishers' terms and conditions, to help authors decide where to place their articles. To argue that it should not list those publishers that are not "green" is akin to asking an abstracting service not to record those articles that the editor happens not to agree with.

Some funders, such as Wellcome, encourage the applicant for funding to include the cost of paying a "gold" journal in their funding bid. If it is to perform a useful information function, SHERPA/ROMEO has to reflect current reality, not ideal future scenarios."

I can only disagree (profoundly) with my comrade-at-arms, Charles Oppenheim, on this important strategic point!

I certainly did not say that SHERPA/ROMEO should only list Green Publishers! It should list all publishers (and, more relevantly, all their individual journals).

But along with all the journals, SHERPA/ROMEO should only list and classify the journal policy details that are relevant to OA, OA mandates, and the growth of OA.

Those four relevant journal policy details are these:

(1) Does the journal endorse immediate OA self-archiving of the refereed postprint? If so, the journal is GREEN.

(2) Does the journal endorse the immediate OA self-archiving of the unrefereed preprint only? If so, the journal is PALE-GREEN.

(3) Is the journal neither GREEN nor PALE-GREEN? If so then the journal is GRAY.

(4) If the journal is PALE-GREEN or GRAY, does it endorse OA self-archiving after an embargo? If so, how long?

That's it. All the rest of the details that SHERPA/ROMEO is currently canonizing are irrelevant amplifications of noise that merely confuse instead of informing, clarifying and facilitating OA-relevant policy and decisions on the part of authors, institutions and funders.

Amongst the irrelevant and confusing idiosyncratic publisher details that SHERPA/ROMEO is currently amplifying (and there are many!), there are two that might be worth retaining as a footnote, as long as it is made clear that they are not fundamental for policy or practice, but merely details for two special cases:

(i) What version is endorsed for OA self-archiving: the author's final draft or the publisher's PDF?

(ii) Where does the journal endorse self-archiving: the author's institutional repository and/or central repositories?

The reason these details are inessential is that the default option in both cases is already known a priori:

(i) Self-archiving the author's final draft is the default option. A publisher that endorses self-archiving the publisher's PDF also authorizes, a fortiori, the self-archiving of the author's final draft. (IP pedants and pundits might have some fun thrashing this one back and forth, citing all sorts of formalisms and legalisms, but in the end, sense would prevail: Once the publisher has formally authorized making the published article OA, Pandora's box is open [sic], and residual matters concerning authors' prior versions or subsequent updates are all moot [as they should be].)

The default option of self-archiving the postprint is sufficient for OA, hence the PDF side-show is a needless distraction.

(ii) Self-archiving in the author's institutional repository is the default option. A publisher that endorses self-archiving in a central repository also endorses, a fortiori, self-archiving in the author's own institutional repository.

The default option of self-archiving in the institutional repository is sufficient for OA, hence the matter of central deposit is a needless distraction. (Where direct central deposit is mandated by a funder, this can and will be implemented by automatic SWORD-based export to central repositories, of either the metadata and full-text or merely the metadata and the link to the full-text.)

Hence (i) and (ii) are minor details that need only be consulted by those who, for some reason, are particularly concerned about the PDF, or those who need to comply with a funder mandate that (neelessly) specifies direct central deposit.

There is absolutely no call for SHERPA/ROMEO to advertise the price lists of GRAY publishers for paid OA! I can only repeat that that is grotesque. Let authors and funders who are foolish enough to squander their money on paying those non-GREEN publishers (instead of just relying on their tolerated embargo limits plus the Button) find out the prices for themselves. (SHERPA/ROMEO is not an abstracting service; nor is it a publishers' price catalogue!)

Peter Millington, SHERPA, replied:

PM: "It is a pity that Prof. Harnad is only interested in "default" and "sufficient" options, and not in the best options, or indeed the most appropriate options. While author's final post-refereed draft is sufficient and acceptable for open access and research purposes, it is not the best."

OA is the best for research purposes. We don't yet have it. And it's long overdue.

I'm not sure whose purposes the publisher's PDF is best for, but whoever they are, their purposes are getting in the way of what is best for research purposes.

PM: "The best is the published version (publisher's PDF if you will). At the very least, this is the authoritative version vis-à-vis page numbers for quoted extracts and the like."

This issue has been much discussed in these pages: OA is needed (urgently) for all those users who can't afford paid access to the publisher's PDF. What these would-be users lack is access to the text, not a means of quoting extracts. Extracts can be quoted by paragraph number. Pages are on the way out anyway. What is urgently needed is access to the text. Publishers are far more willing to endorse self-archiving of the author postprint than the publisher's proprietary PDF. Hence author postprint self-archiving is the default option (if maximal OA, now, is the goal).

PM: "Also, it significantly expedites deposition to be able to use the publisher's PDF rather than having to generate your own, with all the complications that that may entail."

Significantly expedites deposition of what, where, by whom? I have deposited nearly 300 of my papers in the Southampton ECS IR. It takes me 1 keystroke and 1 second to generate PDF from TeX or to generate PDF or HTML from RTF. What complications do you have in mind?

PM: "In my view, the publishers who permit the use of their PDFs deserve to be applauded for their far-sightedness. Other publishers should be encouraged to do likewise."

The publishers to be applauded are the ones that are Green on immediate OA deposit of the postprint, regardless of whether they specify the author's postprint or the publisher's PDF. That's the line separating who is and isn't on the side of the angels regarding OA. The rest is trivial and irrelevant.

PM: "SHERPA therefore makes no apologies for having published our 'good list'"

SHERPA ROMEO could do a far, far greater service in informing authors and institutions, and in promoting OA, if it at long last got rid of all its superfluous categories and colour codes (yellow/preprint, blue/postprint, green/preprint+postprint, white/neither, and now "good"/PDF) and simply published a clear list of all the journals that endorse postprint self-archiving, regardless of whether the postprint is author-draft or publisher PDF and regardless of whether the journal also happens to endorse unrefereed preprint self-archiving -- and call that GREEN.

That, after all, is what OA is all about, and for.

PM: "(Some may concur with Prof. Harnad in regarding the Paid OA list as the "bad list", but I couldn't possibly comment.)"

The only thing authors need to know about these journals is that they are GRAY (and perhaps also how long they embargo access-provision).

PM: "As for where material may be deposited, Prof. Harnad states that permission to deposit in institutional repositories should be the default, implying that this would be sufficient. However, as before, institutional repositories alone are not the best option. Surely the best policy must be to be able to deposit in any open access repository - institutional and/or disciplinary."

No, the best policy is to allow deposit in any OA repository and to explicitly prefer IR deposit wherever possible. That is the way to integrate institutional and funder OA mandates so as to generate a convergence and synergy that will systematically cover all of OA space quickly and completely.

PM: "In any case, SHERPA/RoMEO has no choice but to reflect/quote the terminology for repository types used in the publishers' open access policies, CTAs, and related documentation. These are often wanting in clarity and are not always fully thought through. If the publishers do better, it follows that SHERPA/RoMEO's data will also improve."

Wouldn't it be nice, though, if SHERPA/ROMEO could lead publishers toward clarity, rather than just following and amplifying their obscurity (and their often deliberate obscurantism)?

This is all said in the spirit of unabating appreciation for all that SHERPA does do for OA -- but with an equally unabating frustration at what SHERPA persists in not doing for (and sometimes unwittingly doing against) OA, even though it would be ever so easy to fix.

This continuing insistence upon amplifying incoherent publisher noise simply because it is there cannot be described as a service to OA. And SHERPA does have a choice: It can do better for the research community without waiting for publishers to improve.

Andria McGrath replied:

AM: "It may be foolishness on the part of the funders, but I'm afraid it is the case that ALL the UK medical funders do insist that articles reporting research funded by them are posted on UK PubMedCentral within 6 months."

I have a simple solution, both for individual authors and for institutions who are trying to comply with a funder mandate to self-archive centrally articles that are published by journals that only endorse institutional OA self-archiving:

(1) Deposit the (refereed) postprint institutionally, immediately upon acceptance for publication.

(2) If the journal is Green on immediate OA, make the deposit immediately OA (otherwise rely on the "email eprint request" Button).

(3) If setting access to the deposit to OA is embargoed, set the access to OA when the embargo is over.

(4) If a funder requires deposit in UK PubMedCentral, set up SWORD for your IR so it will export the deposit to UKPMC at the requisite time -- and then let UKPMC worry about access-setting for the UKPMC version.

The author has complied with the funder mandate by depositing in his IR immediately upon acceptance for publication, and by setting access to the IR deposit as OA at the end of the publisher embargo.

That's all there is to it. Funders cannot mandate any more of an author. And if the funder wants to pay publishers for the right to make the central UKPMC version OA, let them pay the publisher themselves.

The funder mandates are deposit mandates, not payment mandates. Comply by depositing institutionally, providing OA institutionally, and exporting the deposit to UKPMC. That's all there is to it.

AM: "I have just been going through Romeo trying to determine which of the major publishers allow this without the payment of article processing charges and there are very few. So far I have come up with BMJ Publishing, CUP, Company of Bioloigsts and Nat. Acad. Sci. that do allow this."

Fine. When those IR deposits are exported via SWORD to UKPMC, there will be no charge to be paid. For publishers other than those four, there may be; that is not the problem of the author or his institution. And anyone construing the funder mandates as implying that it is the problem of the author or his institution, and that the mandate entails any further expense to the author or his institution, is profoundly misconstruing the mandate, the rationale for the mandate, and the rationale for OA.

Andria, you will have to get used to the fact that steps have been taken without carefully thinking them through. The funder OA mandates were very timely and welcome, and extremely important historically. But some (by no means all!) of them were also vague, careless, and, to a great extent "monkey see, monkey do" (many taking their cue from NIH and the Wellcome Trust, who themselves had not thought it through carefully enough).

MRC simply adopted the wrong (because inchoate) mandate model. Other RCUK councils (such as, ARC, BBSRC, STFC) picked a better one. So did Europe's ERC, and now the EC, based on the EURAB model, which is the IDOA mandate model, the optimal one, and leads to none of these nonsensical consequences.

Good sense will eventually prevail, but until then, those who are trying to implement the existing mandates should not try to put themselves through impossible hoops -- and on no account should they lead their authors and institutions into grotesque and gratuitous expenses or constraints that were never the intention of either OA or OA mandates.

Just follow the sensible steps (1) - (4) above, and the rest will take care of itself as a matter of natural course.

AM: "As far as I can tell, Elsevier, Humana, Int Med Press, Wiley, Karger, Kluwer, Royal Soc and Springer do not allow self archiving in UK PMC by authors within 6 months, so that all authors funded by the medical charities are going to be forced into paying article processing charges to comply with their funders requirements if they publish in these publishers journals or in fully open access journals that make charges, like BMC."

Not only is it pure absurdity to imagine that the funder mandates were actually mandates for authors and their institutions to pay publishers for paid OA, but it is equally absurd to imagine that they were mandates for authors to publish only with publishers who endorse central self-archiving!

Every single one of the eight publishers you list is on the side of the angels as regards OA: They are all Green on immediate deposit in the author's institutional repository, and the immediate setting of access to that deposit as OA.

Did anyone really imagine that OA was about more than that? That it further required publishers to consent to deposit in central repositories, for someone's capricious reasons? (The saga is made even sillier by the fact that if the blinkered centralists had sensibly targeted universal IR deposit first, then the dominos would -- and will -- fall for central repositories soon enough anyway! But instead they are creating gratuitous obstacles for OA itself, by putting centrality itself before OA -- and for absolutely no good reason, since all OA IRs are fully interoperable and harvestable anyway.)

And don't even get me started on the fatuousness of having decided to copycat PMC with a UKPMC! As if there were another category of biomedical research, consisting of UK biomedical research, requiring a central repository of its own: "Let me see now, what is it that British researchers -- and British researchers alone -- have found discovered about AIDS." (I hope no one replies that "one can search across PMC and UKPMC jointly," because that is the whole point! Search is done across distributed contents, not by going to -- or requiring -- one particular locus-of-deposit. Think OAIster, Citeseer or Google Scholar, not UKPMC! At most, UKPMC could simply be a harvester of UK biomedical output, for actuarial purposes, wherever its physical locus might happen to be.)

AM: "In view of this I do find it useful to have the extra information that Romeo is adding, and I would welcome even more specific info about publisher's policies re PMC. If I have any of this wrong I would be very grateful if people would let me know."

I think you have a good deal of it very, very wrong -- but it's not your fault, and you are not alone. I just wonder whether we will persist in bumbling in this misdirection for a few more years, yet again, until we discover we have goofed, or we will manage -- mirabile dictu -- to rally the good sense to fix it in advance...

Your weary and fast-wizening archivangelist,

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 16:10 | Comments (0) | Trackbacks (0)

Friday, August 29. 2008

On Eggs and Citations

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

Failing to observe a platypus laying eggs is not a demonstration that the platypus does not lay eggs.

You have to actually observe the provenance, ab ovo, of those little newborn platypusses, if you want to demonstrate that they are not being engendered by egg-laying.

Failing to observe a significant OA citation Advantage within a year of publication (or a year and a half -- or longer, as the case may be) with randomized OA does not demonstrate that the many studies that do observe a significant OA citation Advantage with nonrandomized OA are simply reporting self-selection artifacts (i.e., selective provision of OA for the more highly citable articles.)

To demonstrate the latter, you first have to replicate the OA citation Advantage with nonrandomized OA (on the same or comparable sample) and then demonstrate that randomized OA (on the same or comparable sample) eliminates that OA citation Advantage (on the same or comparable sample).

Otherwise, you are simply comparing apples and oranges (or eggs and expectations, as the case may be) in reporting a failure to observe a significant OA citation Advantage in a first-year (or first 1.5-year) sample with randomized OA -- along with a failure to observe a significant OA citation Advantage for nonrandomized OA either, for the same sample (on the grounds that the nonrandomized OA subsample was too small):

The many reports of the nonrandomized OA Citation Advantage are based on samples that were sufficiently large, and on a sufficiently long time-scale (almost never as short as a year) to detect a significant OA Citation Advantage.

A failure to observe a significant effect with small, early samples, on short time-scales -- whether randomized or nonrandomized -- is simple that: a failure to observe a significant effect: Keep testing till the size and duration of your sample of randomized and nonrandomized OA is big enough to test your self-selection hypothesis (i.e., comparable with the other studies that have detected the effect).

Meanwhile, note that (as other studies have likewise found), although a year proved too short to observe a significant OA citation Advantage for randomized (or nonrandomized) OA, it did prove long enough to observe a significant OA download Advantage for randomized OA -- and that other studies have also reported that early download advantages correlate significantly with later significant citation advantages.

Just as mating more is likely to lead to more progeny for platypusses (by whatever route) than mating less, so accessing and downloading more is likely to lead to more citations for papers than accessing and downloading less.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Methodology at 20:18 | Comments (0) | Trackbacks (0)

Monday, August 25. 2008

Confirmation Bias and the Open Access Advantage: Some Methodological Suggestions for Davis's Citation Study

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr,T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

SUMMARY: Davis (2008) analyzes citations from 2004-2007 in 11 biomedical journals. For 1,600 of the 11,000 articles (15%), their authors paid the publisher to make them Open Access (OA). The outcome, confirming previous studies (on both paid and unpaid OA), is a significant OA citation Advantage, but a small one (21%, 4% of it correlated with other article variables such as number of authors, references and pages). The author infers that the size of the OA advantage in this biomedical sample has been shrinking annually from 2004-2007, but the data suggest the opposite. In order to draw valid conclusions from these data, the following five further analyses are necessary:
    (1) The current analysis is based only on author-choice (paid) OA. Free OA self-archiving needs to be taken into account too, for the same journals and years, rather than being counted as non-OA, as in the current analysis.
    (2) The proportion of OA articles per journal per year needs to be reported and taken into account.
    (3) Estimates of journal and article quality and citability in the form of the Journal Impact Factor and the relation between the size of the OA Advantage and journal as well as article “citation-bracket” need to be taken into account.
    (4) The sample-size for the highest-impact, largest-sample journal analyzed, PNAS, is restricted and is excluded from some of the analyses. An analysis of the full PNAS dataset is needed, for the entire 2004-2007 period.
    (5) The analysis of the interaction between OA and time, 2004-2007, is based on retrospective data from a June 2008 total cumulative citation count. The analysis needs to be redone taking into account the dates of both the cited articles and the citing articles, otherwise article-age effects and any other real-time effects from 2004-2008 are confounded.
Davis proposes that an author self-selection bias for providing OA to higher-quality articles (the Quality Bias, QB) is the primary cause of the observed OA Advantage, but this study does not test or show anything at all about the causal role of QB (or of any of the other potential causal factors, such as Accessibility Advantage, AA, Competitive Advantage, CA, Download Advantage, DA, Early Advantage, EA, and Quality Advantage, QA). The author also suggests that paid OA is not worth the cost, per extra citation. This is probably true, but with OA self-archiving, both the OA and the extra citations are free.

Comments on: Davis, P.M. (2008) Author-choice open access publishing in the biological and medical literature: a citation analysis. Journal of the American Society for Information Science and Technology (JASIST) (in press) http://arxiv.org/pdf/0808.2428v1

The Davis (2008) preprint is an analysis of the citations from years c. 2004-2007 in 11 biomedical journals: c. 11,000 articles, of which c. 1,600 (15%) were made Open Access (OA) through “Author Choice” (AC-OA): author chooses to pay publisher for OA). Author self-archiving (SA-OA) articles from the same journals was not measured.

The result was a significant OA citation advantage (21%) over time, of which 4% was correlated with variables other than OA and time (number of authors, pages, references; whether article is a Review and has a US co-author).

This outcome confirms the findings of numerous previous studies (some of them based on far larger samples of fields, journals, articles and time-intervals) of an OA citation advantage (ranging from 25%-250%) in all fields, across a 10-year range (Hitchcock 2008).

The preprint also states that the size of the OA advantage in this biomedical sample diminishes annually from 2004-2007. But the data seem to show the opposite: that as an article gets older, and its cumulative citations grow, its absolute and relative OA advantage grow too.

The preprint concludes, based on its estimate of the size of the OA citation Advantage, that AC-OA is not worth the cost, per extra citation. This is probably true -- but with SA-OA the OA and the extra citations can be had at no cost at all.

The paper is accepted for publication in JASIST. It is not clear whether the linked text is the unrefereed preprint, or the refereed, revised postprint. On the assumption that it is the unrefereed preprint, what follows is an extended peer commentary with recommendations on what should be done in revising it for publication.

(It is very possible, however, that some or all of these revisions were also recommended by the JASIST referees and that some of the changes have already been made in the published version.)

As it stands currently, this study (i) confirms a significant OA citation Advantage, (ii) shows that it grows cumulatively with article age and (iii) shows that it is correlated with several other variables that are correlated with citation counts.

Although the author argues that an author self-selection bias for preferentially providing OA to higher-quality articles (the Quality Bias, QB) is the primary causal factor underlying the observed OA Advantage, in fact this study does not test or show anything at all about the causal role of QB (or of any of the other potential causal factors underlying the OA Advantage, such as Accessibility Advantage, AA, Competitive Advantage, CA, Download Advantage, DA, Early Advantage, EA, and Quality Advantage, QA; Hajjem & Harnad 2007b).

The following 5 further analyses of the data are necessary. The size and pattern of the observed results, as well as their interpretations, could all be significantly altered (as well as deepened) by their outcome:

(1) The current analysis is based only on author-choice (paid) OA. Free author self-archiving OA needs to be taken into account too, for the same journals and years, rather than being counted as non-OA, as in the current analysis.

(2) The proportion of OA articles per journal per year needs to be reported and taken into account.

(3) Estimates of journal and article quality and citability in the form of the Journal Impact Factor (journal’s average citations) and the relation between the size of the OA Advantage and journal and article “citation-bracket” need to be taken into account.

(4) The sample-size for the highest-impact, largest-sample journal, PNAS, is restricted and is excluded from some of the analyses. A full analysis of the full PNAS dataset is needed, for the entire 2004-2007 period.

(5) The analysis of the interaction between OA and time, 2004-2007, is based on retrospective data from a June 2008 total cumulative citation count. The analysis needs to be redone taking into account the dates of both the cited articles and the citing articles, otherwise article-age effects and any other real-time effects from 2004-2008 are confounded.

Commentary on the text of the preprint:

“ABSTRACT… there is strong evidence to suggest that the open access advantage is declining by about 7% per year, from 32% in 2004 to 11% in 2007”

It is not clearly explained how these figures and their interpretation are derived, nor is it reported how many OA articles there were in each of these years. The figures appear to be based on a statistical interaction between OA and article-age in a multiple regression analysis for 9 of the 11 journals in the sample. (a) The data from PNAS, the largest and highest-impact journal, are excluded from this analysis. (b) The many variables included in the (full) multiple regression equation (across journals) omit one of the most obvious ones: journal impact factor. (c) OA articles that are self-archived rather than paid author-choice are not identified and included as OA, hence their citations are counted as being non-OA. (d) The OA/age interaction is not based on yearly citations after a fixed interval for each year, but on cumulative retrospective citations in June 2008.

The natural interpretation of Figure 1 accordingly seems to be the exact opposite of the one the author makes: Not that the size of the OA Advantage shrinks from 2004-2007, but that the size of the OA Advantage grows from 2007-2004 (as articles get older and their citations grow). Not only do cumulative citations grow for both OA and non-OA articles from year 2007 articles to year 2004 articles, but the cumulative OA advantage increases (by about 7% per year, even on the basis of this study’s rather slim and selective data and analyses).

This is quite natural, as not only do citations grow with time, but the OA Advantage -- barely detectable in the first year, being then based on the smallest sample and the fewest citations -- emerges with time.

“See Craig et al. [2007] for a critical review of the literature [on the OA citation advantage]”

Craig et al’s rather slanted 2007 review is the only reference to previous findings on the OA Advantage cited by the Davis preprint (Harnad 2007a). Craig et al. had attempted to reinterpret the many times replicated positive finding of an OA citation advantage, on the basis of 4 negative findings (Davis & Fromerth, 2007; Kurtz et al., 2005; Kurtz & Henneken, 2007; Moed, 2007), in maths, astronomy and condensed matter physics, respectively. Apart from Davis’s own prior study, these studies were based mainly on preprints that were made OA well before publication. The observed OA advantage consisted mostly of an Early Access Advantage for the OA prepublication preprint, plus an inferred Quality Bias (QB) on the part of authors towards preferentially providing OA to higher quality preprints (Harnad 2007b).

The Davis preprint does not cite any of the considerably larger number of studies that have reported large and consistent OA advantages for postprints, based on many more fields, some of them based on far larger samples and longer time intervals (Hitchcock 2008). Instead, Davis focuses rather single-mindedly on the hypothesis that most or all of the OA Advantage is the result for the self-selection bias (QB) toward preferentially making higher-quality (hence more citeable) articles OA:

“authors selectively choose which articles to promote freely… [and] highly cited authors disproportionately choose open access venues”

It is undoubtedly true that better authors are more likely to make their articles OA, and that authors in general are more likely to make their better articles OA. This Quality or “Self-Selection” Bias (QB) is one of the probable causes of the OA Advantage.

However, no study has shown that QB is the only cause of the OA Advantage, nor even that it is the biggest cause. Three of the studies cited (Kurtz et al., 2005; Kurtz & Henneken, 2007; Moed, 2007) showed that another causal factor is Early Access (EA: providing OA earlier results in more citations).

There are several other candidate causal factors in the OA Advantage, besides QB and EA (Hajjem & Harnad 2007b):

There is the Download (or Usage) Advantage (DA): OA articles are downloaded significantly more, and this early DA has also been shown to be predictive of a later citation advantage in Physics (Brody et al. 2006).

There is a Competitive Advantage (CA): OA articles are in competition with non-OA articles, and to the extent that OA articles are relatively more accessible than non-OA articles, they can be used and cited more. Both QB and CA, however, are temporary components of the OA advantage that will necessarily shrink to zero and disappear once all research is OA. EA and DA, in contrast, will continue to contribute to the OA advantage even after universal OA is reached, when all postprints are being made OA immediately upon publication, compared to pre-OA days (as Kurtz has shown for Astronomy, which has already reached universal post-publication OA).

There is an Accessibility Advantage (AA) for those users whose institutions do not have subscription access to the journal in which the article appeared. AA too (unlike CA) persists even after universal OA is reached: all articles then have AA's full benefit.

And there is at least one more important causal component in the OA Advantage, apart from AA, CA, DA and QB, and that is a Quality Advantage (QA), which has often been erroneously conflated with QB (Quality Bias):

Ever since Lawrence’s original study in 2001, the OA Advantage can be estmated in two different ways: (1) by comparing the average citations for OA and non-OA articles (log citation ratios within the same journal and year, or regression analyses like Davis’s) and (2) by comparing the proportion of OA articles in different “citation brackets” (0, 1, 2, 3-4, 5-8, 9-16, 17+ citations).

In method (2), the OA Advantage is observed in the form of an increase in the proportion of OA articles in the higher citation brackets. But this correlation can be explained in two ways. One is QB, which is that authors are more likely to make higher-quality articles OA. But it is also at least as plausible that higher-quality articles benefit more from OA! It is already known that the top c. 10-20% of articles receive c. 80-90% of all citations (Seglen’s 1992 “skewness of science”). It stands to reason, then, that when all articles are made OA, it is the top 20% of articles that are most likely to be cited more: Not all OA articles benefit from OA equally, because not all articles are of equally citable quality.

Hence both QB and QA are likely to be causal components in the OA Advantage, and the only way to tease them apart and estimate their individual contributions is to control for the QB effect by imposing the OA instead of allowing it to be determined by self-selection. We (Gargouri, Hajjem, Gingras, Carr & Harnad, in prep.) are completing such a study now, comparing mandated and unmandated OA; and Davis et al 2008 have just published another study on randomized OA for 11 journals:

“In the first controlled trial of open access publishing where articles were randomly assigned to either open access or subscription-access status, we recently reported that no citation advantage could be attributed to access status (Davis, Lewenstein, Simon, Booth, & Connolly, 2008)”

This randomized OA study by Davis et al. was very welcome and timely, but it had originally been announced to cover a 4-year period, from 2007-2010, whereas it was instead prematurely published in 2008, after only one year. No OA advantage at all was observed in that 1-year interval, and this too agrees with the many existing studies on the OA Advantage, some based on far larger samples of journals, articles and fields: Most of those studies (none of them randomized) likewise detected no OA citation advantage at all in the first year: It is simply too early. In most fields, citations take longer than a year to be made, published, ISI-indexed and measured, and to make any further differentials (such as the OA Advantage) measurable. (This is evident in Davis’s present preprint too, where the OA advantage is barely visible in the first year (2007).)

The only way the absence of a significant OA advantage in a sample with randomized OA can be used to demonstrate that the OA Advantage is only or mostly just a self-selection bias (QB) is by also demonstrating the presence of a significant OA advantage in the same (or comparable) sample with nonrandomized (i.e., self-selected) OA.

But Davis et al. did not do this control comparison (Harnad 2008b). Finding no OA Advantage with randomized OA after one year merely confirms the (widely observed) finding that one year is usually too early to detect any OA Advantage; but it shows nothing whatsoever about self-selection QB.

“we examine the citation performance of author-choice open access”

It is quite useful and interesting to examine citations for OA and non-OA articles where the OA is provided through (self-selected) “Author-Choice” (i.e., authors paying the publisher to make the article OA on the publisher’s website).

Most prior studies of the OA citation Advantage, however, are based on free self-archiving by authors on their personal, institutional or central websites. In the bigger studies, a robot trawls the web using ISI bibliographic metadata to find which articles are freely available on the web (Hajjem et al. 2005).

Hence a natural (indeed essential) control test that has been omitted from Davis’s current author-choice study – a test very much like the control test omitted from the Davis et al randomized OA study – is to identify the articles in the same sample that were made OA through author self-archiving. If those articles are identified and counted, that not only provides an estimate of the relative uptake of author-choice OA vs OA self-archiving in the same sample interval, but it allows a comparison of their respective OA Advantages. More important, it corrects the estimate of an OA Advantage based on author-choice OA alone: For, as Davis has currently done the analysis, any OA Advantage from OA self-archiving in this sample would in fact reduce the estimate of the OA Advantage based on author-choice OA (mistakenly counting as non-OA the articles and citation-counts for self-archived OA articles)

“METHODS… The uptake of the open access author-choice programs for these [11] journals ranged from 5% to 22% over the dates analyzed”

Davis’s preprint does not seem to provide the data – either for individual journals or for the combined totals – on the percentage of author-choice OA (henceforth AC-OA) by year, nor on the relation between the proportion uptake of AC-OA and the size of the OA Advantage, by year.

As Davis has been careful to do multiple regression analyses on many of the article-variables that might correlate with citations and OA (article age, number of authors, number of references, etc.), it seems odd not to take into account the relation between the size of the AC-OA Advantage and the degree of uptake of AC-OA, by year. The other missing information is the corresponding data for self-archiving OA (henceforth SA-OA).

“[For] All of the journals… all articles roll into free access after an initial period [restricted to subscription access only for 12 months (8 journals), 6 months (2 journals) or 24 months (1 journal)]”

(This is important in relation to the Early Access (EA) Advantage, which is the biggest contributor to the OA Advantage in the two cited studies by Kurtz on Astronomy. Astronomy has free access to the postprints of all articles in all astronomy journals immediately upon publication. Hence Astronomy has scope for an OA Advantage only through an EA Advantage, arising from the early posting of preprints before publication. The size of the OA Advantage in other fields -- in which (unlike in Astronomy) access to the postprint is restricted to subscribers-only for 6, 12, or 24 months -- would then be the equivalent of an estimate of an “EA Advantage” for those potential users who lack subscription access – i.e., the Accessibility Advantage, AA.)

“Cumulative article citations were retrieved on June 1, 2008. The age of the articles ranged from 18 to 57 months”

Most of the 11 journals were sampled till December 2007. That would mean that the 2007 OA Advantage was based on even less than one year from publication.

“STATISTICAL ANALYSIS… Because citation distributions are known to be heavily skewed (Seglen, 1992) and because some of the articles were not yet cited in our dataset, we followed the common practice of adding one citation to every article and then taking the natural log”

(How well did that correct the skewness? If it still was not normal, then citations might have to be dichotomized as a 0/1 variable, comparing, by citation-bracket slices, (1) 0 citations vs 1 or more citations, (2) 0 or 1 vs more than 1, (3) 2 or fewer vs. more than 2, (4) 3 or fewer vs. more than 3… etc.)

“For each journal, we ran a reduced [2 predictor] model [article age and OA] and a full [7 predictor] regression model [age, OA; log no. of authors, references, pages; Review; US author]”

Both analyses are, of course, a good idea to do, but why was Journal Impact Factor (JIF) not tested as one of the predictor variables in the cross-journal analyses (Hajjem & Harnad 2007a)? Surely JIF, too, correlates with citations: Indeed, the Davis study assumes as much, as it later uses JIF as the multiplier factor in calculating the cost per extra citation for author-choice OA (see below).

Analyses by journal JIF citation-bracket, for example, can provide estimates of QA (Quality Advantage) if the OA Advantage is bigger in the higher journal citation-brackets. (Davis’s study is preoccupied with the self-selection QB bias, which it does not and cannot test, but it fails to test other candidate contributors to the OA Advantage that it can test.)

(An important and often overlooked logical point should also be noted about the correlates of citations and the direction of causation: The many predictor variables in the multiple regression equations predict not only the OA citation Advantage; they also predict citation counts themselves. It does not necessarily follow from the fact that, say, longer articles are more likely to be cited that article length is therefore an artifact that must be factored out of citation counts in order to get a more valid estimate of how accurately citations measure quality. One possibility is that length is indeed an artifact. But the other possibility is that length is a valid causal factor in quality! If length is indeed an artifact, then longer articles are being cited more just because they are longer, rather than because they are better, and this length bias needs to be subtracted out of citation counts as measures of quality. But if the extra length is a causal contributor to what makes the better articles better, then subtracting out the length effect simply serves to make citation counts a blunter, not a sharper instrument for measuring quality. The same reasoning applies to some of the other correlates of citation counts, as well as their relation to the OA citation Advantage. Systematically removing them all, even if they are not artifactual, systematically divests citation counts of their potential power to predict quality. This is another reason why citation counts need to be systematically validated against other evaluative measures [Harnad 2008a].)

“Because we may lack the statistical power to detect small significant differences for individual journals, we also analyze our data on an aggregate level”

It is a reasonable, valid strategy, to analyze across journals. Yet this study still persists in drawing individual-journal level conclusions, despite having indicated (correctly) that its sample may be too small to have the power to detect individual-journal level differences (see below).

(On the other hand, it is not clear whether all the OA/non-OA citation comparisons were always within-journal, within-year, as they ought to be; no data are presented for the percentage of OA articles per year, per journal. OA/non-OA comparisons must always be within-journal/year comparisons, to be sure to compare like with like.)

“The first model includes all 11 journals, and the second omits the Proceedings of the National Academy of Sciences (PNAS), considering that it contributed nearly one-third (32%) of all articles in our dataset”

Is this a justification for excluding PNAS? Not only was the analysis done with and without PNAS, but, unlike all the other journals, whose data were all included, for the entire time-span, PNAS data were only included from the first and last six months.

Why? PNAS is a very high impact factor journal, with highly cited articles. A study of PNAS alone, with its much bigger sample size, would be instructive in itself – and would almost certainly yield a bigger OA Advantage than the one derived from averaging across all 11 journals (and reducing the PNAS sample size, or excluding PNAS altogether).

There can be a QB difference between PNAS and non-PNAS articles (and authors), to be sure, because PNAS publishes articles of higher quality. But a within-PNAS year-by-year comparison of OA and non-OA that yielded a bigger OA Advantage than a within-journal OA/non-OA comparison for lower-quality journals would also reflect the contribution of QA. (With these data in hand, the author should not be so focused on confirming his hypotheses: take the opportunity to falsify them too!)

“we are able to control for variables that are well-known to predict future citations [but] we cannot control for the quality of an article”

This is correct. One cannot control for the quality of an article; but in comparing within a journal/year, one can compare the size of the OA Advantage for higher and lower impact journals; if the advantage is higher for higher-impact journals, that favors QA over QB.

One can also take target OA and non-OA articles (within each citation bracket), and match the title words of each target article with other articles (in the same journal/year):

If one examines N-citation OA articles and N-citation non-OA articles, are their title-word-matched (non-OA) control articles equally likely to have N or more citations? Or are the word-matched control articles for N-citation OA articles less likely to have N or more citations than the controls for N-citation non-OA articles (which would imply that the OA has raised the OA article’s citation bracket)? And would this effect be greater in the higher citation brackets than in the lower ones (N = 1 to N = >16)?

If one is resourceful, there are ways to control, or at least triangulate on quality indirectly.

“spending a fee to make one’s article freely available from a publisher’s website may indicate there is something qualitatively different [about that article]”

Yes, but one could probably tell a Just-So story either way about the direction of that difference: paying for OA because one thinks one's article is better, or paying for OA because one thinks one's article worse! Moreover, this is AC-OA, which costs money; the stakes are different with SA-OA, which only costs a few keystrokes. But this analysis omitted to identify or measure SA-OA.

“RESULTS…The difference in citations between open access and subscription-based articles is small and non-significant for the majority of the journals under investigation”

(1) Compare the above with what is stated earlier: “Because we may lack the statistical power to detect small significant differences for individual journals, we also analyze our data on an aggregate level.”

(2) Davis found an OA Advantage across the entire sample of 11 journals, whereas the individual journal samples were too small. Why state this as if it were some sort of an empirical effect?

“where only time and open access status are the model predictors, five of the eleven journals show positive and significant open access effects.”

(That does not sound too bad, considering that the individual journal samples were small and hence lacked the statistical power to detect small significant differences, and that the PNAS sample was made deliberately small!)

“Analyzing all journals together, we report a small but significant increase in article citations of 21%.”

Whether that OA Advantage is small or big remains to be seen. The bigger published OA Advantages have been reported on the basis of bigger samples.

“Much of this citation increase can be explained by the influence of one journal, PNAS. When this journal is removed from the analysis, the citation difference reduces to 14%.”

This reasoning can appeal only if one has a confirmation bias: PNAS is also the journal with the biggest sample (of which only a fraction was used); and it is also the highest impact journal of the 11 sampled, hence the most likely to show benefits from a Quality Advantage (QA) that generates more citations for higher citation-bracket articles. If the objective had not been to demonstrate that there is little or no OA Advantage (and that what little there is is just due to QB), PNAS would have been analyzed more closely and fully, rather than being minimized and excluded.

“When other explanatory predictors of citations (number of authors, pages, section, etc.) are included in the full model, only two of the eleven journals show positive and significant open access effects. Analyzing all journals together, we estimate a 17% citation advantage, which reduces to 11% if we exclude PNAS.”

In other words partialling out 5 more correlated variables from this sample reduces the residual OA Advantage by 4%. And excluding the biggest, highest-quality journal’s data, reduces it still further.

If there were not this strong confirmation bent on the author’s part, the data would be treated in a rather different way: The fact that a journal with a bigger sample enhances the OA Advantage would be treated as a plus rather than a minus, suggesting that still bigger samples might have the power to detect still bigger OA Advantages. And the fact that PNAS is a higher quality journal would also be the basis for looking more closely at the role of the Quality Advantage (QA). (With less of a confirmation bent, OA Self-archiving, too, would have been controlled for, instead of being credited to non-OA.)

Instead, the awkward persistence of a significant OA Advantage even after partialling out the effects of so many correlated variables, despite restricting the size of the PNAS sample, and even after removing PNAS entirely from the analysis, has to be further explained away:

“The modest citation advantage for author-choice open access articles also appears to weaken over time. Figure 1 plots the predicted number of citations for the average article in our dataset. This difference is most pronounced for articles published in 2004 (a 32% advantage), and decreases by about 7% per year (Supplementary Table S2) until 2007 where we estimate only an 11% citation advantage.”

(The methodology is not clearly described. We are not shown the percent OA per journal per year, nor what the dates of the citing articles were, for each cited-article year. What is certain is that a 1-year-old 2007 article differs from a 4-year-old 2004 article not just in its total cumulative citations in June 2008, but in that the estimate of its citations per year is based on a much smaller sample, again reducing the power of the statistic: This analysis is not based on 2005 citations to 2004 articles, plus 2006 citations to 2005 articles, plus 2007 citations to 2006 articles, etc. It is based on cumulative 2004-2008 citations to 2004, 2005, 2006 etc. articles, reckoned in June 2008. 2007 articles are not only younger: they are also more recent. Hence it is not clear what the Age/OA interaction in Table S2 really means: Has (1) the OA advantage for articles really been shrinking across those 4 years, or are citation rates for younger articles simply noisier, because based on smaller citation spans, hence (2) the OA Advantage grows more detectable as articles get older?)

From what is described and depicted in Figure 1, the natural interpretation of the Age/OA interaction seems to be the latter: As we move from one-year-old articles (2007) toward four-year-old articles, three things are happening: non-OA citations are growing with time, OA citations are growing with time, and the OA/non-OA Advantage is emerging with time.

“[To] calculate… the estimated cost per citation [$400 - $9000]… we multiply the open access citation advantage for each journal (a multiplicative effect) by the impact factor of the journal… Considering [the] strong evidence of a decline of the citation advantage over time, the cost…would be much higher…”

Although these costs are probably overestimated (because the OA Advantage is underestimated, and there is no decline but rather an increase) the thrust of these figures is reasonable: It is not worth paying for AC-OA for the sake of the AC-OA Advantage: It makes far more sense to get the OA Advantage for free, through OA Self-Archiving.

Note, however, that the potentially informative journal impact factor (JIF) was omitted from the full-model multiple regression equation across journals (#6). It should be tested. So should the percentage OA for each journal/year. And after that the analysis should be redone separately for, say, the four successive JIF quartiles. If adding the JIF to the equation reduces the OA Advantage further, whereas without JIF the OA Advantage increases in each successive quartile, then that implies that a big factor in the OA Advantage is the Quality Advantage (QA).

“that we were able to explain some of the citation advantage by controlling for differences in article characteristics… strengthens the evidence that self-selection – not access – is the explanation for the citation advantage… more citable articles have a higher probability of being made freely accessible”

Self-selection (QB) is undoubtedly one of the factors in the OA Advantage, but this analysis has not estimated the size of its contribution, relative to many other factors (AA, CA, DA, EA, QA). It has simply shown that some of the same factors that influence citation counts, influence the OA citation Advantage too.

By failing to test and control for the Quality Advantage in particular (by not testing JIFs in the full regression equation, by not taking percentage OA per journal/year into account, by restricting the sample-size for the highest impact, largest-sample journal, PNAS, by overlooking OA self-archiving and crediting it to non-OA, by not testing citation-brackets of JIF quartiles), the article needlessly misses the opportunity to analyze the factors contributing to the OA Advantage far more rigorously.

“earlier studies [on the OA Advantage] may be showing an early-adopter effect…”

This is probably true. And early adopters also have a Competitive Advantage (CA). But with only about 20% OA being provided overall today, the CA is still there, except if it can be demonstrated – as Davis certainly has not demonstrated – that the c. 20% of articles that are being made OA today correspond sufficiently closely to that top 20% of articles that receive 80% of all citations. (Then the OA Advantage would indeed be largely QB.)

“authors who deposited their manuscripts in the arXiv tended to be more highly-cited than those who did not”

There is some circularity in this, but it is correct to say that this correlation is compatible with both QB and QA, and probably both are contributing factors. But none of the prior studies nor this one actually estimate their relative contributions (nor those of AA, CA, DA and EA).

“any relative citation advantage that was enjoyed by early adopters would disappear over time”

It is not that CA (Competitive Advantage) disappears simply because time elapses: CA only disappears if the competitors provide OA too! The same is true of QB (Quality Bias), which also disappears once everyone is providing OA. But at 20%, we are nowhere near 100% OA yet, hence there is plenty of scope for a competitive edge.

“If a citation advantage is the key motivation of authors to pay open access fees, then the cost/benefit of this decision can be quite expensive for some journals.”

This is certainly true, and would be true even if the OA citation Advantage were astronomically big – but the reason it is true is that authors need not pay AC-OA fees for OA at all: they can self-archive for free (and indeed are being increasingly mandated by their funders and institutions to do so).

“Randomized controlled trials provide a more rigorous methodology for measuring the effect of access independently of other confounding effects (Davis et al., 2008)… the differences we report in our study… have more likely explained the effect of self-selection (or self-promotion) than of open access per se.”

The syntax here makes it a little difficult to interpret, but if what is meant is that Davis et al’s prior study has shown that the OA Advantage found in the present study was more likely to be a result of QB than of QA, AA, CA, DA, or EA, then it has to be replied that that prior study showed nothing of the sort (Harnad 2008b). All it showed was that one cannot detect a significant OA Advantage at all one year after publication when OA is randomized. (The same is true when OA is not randomized.)

However, the prior Davis et al. study did find a significant DA (Download Advantage) for OA articles in the first year. And other studies have reported a significant correlation between early downloads and later citations (Brody et al. 2006).

So the prior Davis et al. study (1) confirmed the familiar failure to detect the OA Advantage in the first year, and (2) found a significant DA in the first year (probably predictive of a later OA citation Advantage). The present Davis study found (i) a significant OA Advantage, (ii) smallest in the first year (2007), much bigger by the fourth (2004).

“Retrospective analysis… our analysis is based on cumulative citations to articles taken at one point in time. Had we tracked the performance of our articles over time – a prospective approach – we would have stronger evidence to bolster our claim that the citation advantage is in decline. Still, we feel that cumulative citation data provides us with adequate inference.”

Actually, it would be possible, with a fuller analysis using the ISI database, to calculate not only the citation counts for each cited article, but the dates of the citing articles. So a “prospective” analysis can be done in retrospect. Without performing that analysis, however, the present study does not provide evidence of a decline in the OA Advantage with time, just evidence of an improved signal/noise ratio for measuring the OA Advantage with time. A “prospective” analysis, taking citing dates as well as cited dates into account, would be welcome (and is far more likely to show that the size of the OA Advantage is, if anything, growing, rather than confirming the author’s interpretation, unwarranted on the present data, that it is shrinking).

“all of the journals under investigation make their articles freely available after an initial period of time [hence] any [OA Advantage] would be during these initial months in which there exists an access differential between open access and subscription-access articles. We would expect therefore that the effect of open access would therefore be strongest in the earlier years of the life of the article and decline over time. In other words, we would expect our trend (Figure 1) to operate in the reverse direction.”

The reasoning here is a bit hard to follow, but the Kurtz studies that Davis cites show that in Astronomy, making preprints OA in the year or so before publication (after which all Astronomy postprints are OA) results in both “a strong EA effect and a strong [QB] effect.” But even in a fast-moving field like Astronomy, the effect is not immediate! There is no way to predict from the data for Astronomy how quickly an EA effect for nonsubscribers during the embargo year in Biomedicine should make itself felt in citations, but it is a safe bet that, as with citation latency itself, and the latency of the OA citation Advantage, the “EmA” (“Embargo Access”) counterpart of the EA effect in access-embargoed Biomedical journals will need a latency of a few years to become detectable. And since Davis’s age/OA interaction, based on static, cumulative, retrospective data, is just as readily interpretable as indicating that OA Advantages require time and sample-size growth in order to occur and be detected, the two patterns are perfectly compatible.

“we are at a loss to come up with alternative explanations to explain the monotonic decline in the citation advantage”

There is no monotonic decline to explain. Just (a) low power in initial years, (b) cumulative data not analysed to equate citing/cited year spans, (c) the failure to test for QA citation-bracket effects, and (d) the failure to reckon self-archiving OA into the OA Advantage (treating it instead as non-OA).

If this had been a JASIST referee report, I would have recommended performing several further analyses taking into account:

(1) self-archiving OA
(2) percentage OA per journal per year
(3) JIFs and citation-brackets
(4) the full PNAS dataset
(5) citing-article-date vs cited-article-date

and making the interpretation of the resultant findings more even-handed, rather than slanting toward the author’s preferred hypothesis that the OA Advantage is due solely or mostly to QB.

Full Disclosure: I am an OA advocate. And although I hope that I do not have a selective confirmation bias favoring QA, AA, CA, DA & EA, and against the Quality Bias (QB), I do think it is particularly important to ensure that QB is not given more weight than it has been empirically demonstrated to be able to bear.
    Davis writes:
"Free dissemination of the scientific literature may [1] speed up the transfer of knowledge to industry, [2] enable scientists in poor and developing countries to access more information, and [3] empower the general public. There are clearly many benefits to making one’s research findings freely available to the general public – a citation advantage may not be one of them."
It may not be; but if it is not, then it must be demonstrated rigorously that it is not, both because of the prima facie evidence from download and citations, and because of what is at stake:
    Currently, most researchers (c. 85%) are not yet providing OA to their research (Björk et al. 2008). Benefits [1]-[3] have clearly proven insufficient to motivate researchers to provide OA: only a small proportion of research articles have industrial applications [1] or general-public appeal [3], and most researchers are alas not motivated by charity to poor and developing countries [2]). Hence motivation to provide OA depends on other expected benefits.
    Kurtz has suggested that QB itself may provide the motivation: seeing that the better research is being made OA by the better researchers should motivate all researchers to do likewise.
    But on the assumption that the better researchers are making their better research OA for a reason, rather than just [1]-[3] (or just to imitate one another), it seems reasonable to ask what benefits their motivation is based on. And the enhancement of the uptake, usage, application and impact of their own research seems to be the obvious candidate. Otherwise, providing OA just for its own sake seems an irrational practice, especially on the part of the better researchers, for the better research.
    A rigorous estimate of the true causal contribution of QB to the observed OA Advantage is also important for research policy reasons:
    Although they have been cited in support of OA mandates by institutions and funders, industrial applications [1], charity [2] and public appeal [3] are not in themselves sufficient to motivate the adoption of such mandates, or compliance with them. Research impact itself would seem to be the natural rationale for maximizing research access.
    So if the observed OA Advantage really does not provide the benefits that would motivate researchers to provide OA, this has to be demonstrated directly and rigorously. Why would the better researchers make their better research OA if it conferred no benefits (other than [1]-[3])? Too much is at stake to accept QB as the default hypothesis. The natural default hypothesis on the basis of the prima facie evidence (more downloads and citations) is that increasing the access to research increases the usage and impact of research. The higher download and citation counts correlated with OA support this. The burden of proof hence falls on providing evidence to the contrary.

References

Björk, Bo-Christer; Roosr, Annikki; Lauri, Mari (2008) Global annual volume of peer reviewed scholarly articles and the share available via different open access options. In: Chan, L. & Mornati, S. (Eds) ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing Toronto, Canada 25-27 June 2008: pp. 178-186

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Association for Information Science and Technology (JASIST) 57(8): 1060-1072

Craig, I. D., Plume, A. M., McVeigh, M. E., Pringle, J., & Amin, M. (2007). Do Open Access Articles Have Greater Citation Impact? A critical review of the literature. Journal of Informetrics 1(3): 239-248

Davis, P.M. (2008) Author-choice open access publishing in the biological and medical literature: a citation analysis. Journal of the American Society for Information Science and Technology (JASIST) (in press)

Davis, P. M., & Fromerth, M. J. (2007). Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics 71(2): 203-215.

Davis, P. M., Lewenstein, B. V., Simon, D. H., Booth, J. G., & Connolly, M. J. L. (2008). Open access publishing, article downloads and citations: randomised trial. British Medical Journal 337: a586

Hajjem, C. and Harnad, S. (2007a) Citation Advantage For OA Self-Archiving Is Independent of Journal Impact Factor, Article Age, and Number of Co-Authors. Technical Report, Electronics and Computer Science, University of Southampton.

Hajjem, C. and Harnad, S. (2007b) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Technical Report, Electronics and Computer Science, University of Southampton

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) 39-47.

Harnad, S. (2007a) Craig et al.'s Review of Studies on the OA Citation Advantage. Open Access Archivangelism 248.

Harnad, S. (2007b) Where There's No Access Problem There's No Open Access Advantage Open Access Archivangelism 389

Harnad, S. (2008a) Validating Research Performance Metrics Against Peer Rankings. Ethics in Science and Environmental Politics 8 (11)

Harnad, S. (2008b) Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion British Medical Journal: Rapid Responses 337 (a568): 199775

Hitchcock, S. (2008) The effect of open access and downloads ('hits') on citation impact: a bibliography of studies

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M., Henneken, E., et al. (2005). The effect of use and access on citations. Information Processing and Management 41: 1395-1402

Kurtz, M. J., & Henneken, E. A. (2007). Open Access does not increase citations for research articles from The Astrophysical Journal: Harvard-Smithsonian Center for Astrophysics

Lawrence, S. (2001) Free online availability substantially increases a paper's impactNature 31 May 2001

Moed, H. F. (2007). The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section. Journal of the American Society for Information Science and Technology 58(13): 2047-2054

Seglen, P. O. (1992). The Skewness of Science. Journal of the American Society for Information Science 43(9): 628-638

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in OA Impact Advantage at 02:04 | Comments (0) | Trackbacks (0)

« previous page (Page 70 of 113, totaling 1129 entries) » next page

Open Access Archivangelism

by Stevan Harnad

Friday, September 12. 2008

Too Much Ado About PDF

Wednesday, September 10. 2008

Australian innovation report recommends Open Access to research outputs

Joseph Esposito's "Almost-OA": "Almost Pregnant"

Tuesday, September 9. 2008

Open Access and Research Conference 2008: Brisbane 24-25 September

Research Evaluation, Metrics and Open Access in the Humanities: Dublin 18-20 September

Monday, September 8. 2008

OA Needs Open Evidence, Not Anonymous Innuendo

Thursday, September 4. 2008

Offloading Cognition onto Cognitive Technology

Wednesday, September 3. 2008

SHERPA/RoMEO: Publishers with Paid Options for Open Access

Friday, August 29. 2008

On Eggs and Citations

Monday, August 25. 2008

Confirmation Bias and the Open Access Advantage: Some Methodological Suggestions for Davis's Citation Study

EnablingOpenScholarship (EOS)

Federal Research Public Access Act (FRPAA)

Alliance for Taxpayer Access (ATA)

Creative Commons License:

Quicksearch

Syndicate This Blog

Materials You Are Invited To Use To Promote OA Self-Archiving:

Archives

Calendar

Categories

Blog Administration

Statistics

Top Referrers

Syndicate This Blog