Open Access Archivangelism

Friday, June 16. 2006

Metrics-Based Assessment of Published, Peer-Reviewed Research

SUMMARY: Larry Hurtado (Divinity, Edinburgh) suggests that a metrics-based alternative to the panel-based UK Research Assessment Exercise (RAE) may not be appropriate for some disciplines because their research is not journal-article-based, and hence citation-impact metrics are not valid. I reply that all disciplines have research output, and its usage and impact can be objectively measured. For example, book citation-impact can be measured too, if authors deposit their books' reference lists in their institutional repositories. Moreover, the RAE is just one of the two components of the UK dual research funding system. The other, far larger component, continues to be individual research proposal funding, adjudicated by peer review. Metrics are merely a supplement to -- not a substitute for -- peer review, but published articles and books have already been peer reviewed, hence there is no need for RAE panels to try to repeat the exercise.

On Wed, 14 Jun 2006, Larry Hurtado, Department of Divinity, University of Edinburgh, wrote in the American Scientist Open Access Forum:

LH: "Stevan Harnad is totally in favour of a "metrics based" approach to judging research merit with a view toward funding decisions, and greets the news of such a shift from past/present RAE procedure with unalloyed joy."

No, metrics are definitely not meant to serve as the basis for all or most research funding decisions: research proposals, as noted, are assessed by peer review.

Metrics are intended for the other component in the UK dual funding system, in which, in addition to directly funded research, based on competitive peer review of research bids, there is also a smaller, secondary (but prestigious) top-slicing system, the Research Assessment Exercise (RAE). It is the RAE that needed to be converted to metrics from the absurd, wasteful and costly juggernaut that it used to be.

LH: "Well, hmmm. I'm not so sure (at least not yet). Perhaps there is more immediate reason for such joy in those disciplines that already rely heavily on a metrics approach to making decisions about researchers."

No discipline uses metrics systematically yet; moreover, many metrics are still to be designed and tested. However, the only thing "metrics" really means is: the objective measurement of quantifiable performance indicators. Surely all disciplines have measurable performance indicators. Surely it is not true of any discipline that the only way, or the best way, to assess all of its annual research output is by having each piece individually re-reviewed after it has already been peer-reviewed twice -- before execution, by a funding council's peer-reviewers as a research proposal, and after execution, by a journal's referees as a research publication.

LH: "In the sciences, and also now social sciences, there are citation-services that count publications and citations thereof in a given list of journals deemed the "canon" of publication venues for a given discipline. And in these disciplines journal articles are deemed the main (perhaps sole) mode of research publication. Ok. Maybe it'll work for these chaps."

First, with an Open Access database, there need be no separate "canon": articles in any of the world's 24,000 peer-reviewed journals and congresses can count -- though some will (rightly) count for more than others, based on the established and known quality standards and impact of the journal in which each appeared (this too can be given a metric weight). Alongside the weighted impact factor of the journal, there will be the citation counts for each article itself and for its author, the co-citations in and out, the download counts, the hub/authority weights, the endogamy/exogamy weights, etc.

All these metrics (and many more) will be derivable for all disciplines from an Open Access database (no longer just restricted to ISI's Web of Knowledge).

That includes, by the way, citations of books by journal articles -- and also citations of books and journal articles by books, because although most book authors may not wish to make their books' full-texts OA, they can and should certainly make their books' bibliographic metadata, including their bibliography of cited references, OA. Those book-impact metrics can then be added to the metric harvest, citation-linked, counted, and duly weighted, along with all the other metrics.
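As a rough illustration of how deposited reference lists could be citation-linked and counted, here is a minimal sketch. The record layout, the identifiers and the `citation_counts` helper are all hypothetical, invented for this example; a real harvester would first have to parse and match references in varying citing styles.

```python
from collections import Counter

# Hypothetical deposited records: each work's identifier, its type,
# and the identifiers of the works in its cited reference list.
deposits = {
    "book:A": {"type": "book", "cites": ["article:X", "book:B"]},
    "book:B": {"type": "book", "cites": ["article:X"]},
    "article:X": {"type": "article", "cites": ["book:B"]},
    "article:Y": {"type": "article", "cites": ["book:B", "book:A"]},
}


def citation_counts(records):
    """Count how many distinct deposited works cite each work."""
    counts = Counter()
    for work_id, record in records.items():
        # Count each citing work once per cited work; ignore self-citation.
        for cited in set(record["cites"]):
            if cited != work_id:
                counts[cited] += 1
    return counts


counts = citation_counts(deposits)
for work_id in sorted(deposits):
    print(work_id, counts[work_id])
```

Note that books and articles are treated identically here: once a book's reference list is deposited, its citations count in exactly the same way as an article's.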

There are even Closed-Access ways of self-archiving books' digital full-texts (such as Google Book Search) so they can be processed for semiometric analysis (endogamy/exogamy, content overlap, proximity, lineage, chronometric trends) by harvesters that do not make the full text available openly. All disciplines can benefit from this.

LH: "But I'd like to know how it will work in Humanities fields such as mine. Some questions, for Stevan or whomever. First, to my knowledge, there is no such citation-count service in place. So, will the govt now fund one to be set up for us? Or how will the metrics be compiled for us? I.e., there simply is no mechanism in place for doing "metrics" for Humanities disciplines."

All the government needs to do is to mandate the self-archiving of all UK research output in each researcher's own OAI-compliant institutional (or central) repository. (The US and the rest of Europe will shortly follow suit, once the prototype policy model is at long last adopted by a major player!) The resulting worldwide interoperable database will be the source of all the metric data, and a new generation of scientometric and semiometric harvesters and analysers will quickly be spawned to operate on it, to mine it to extract the rich new generation of metrics.

There is absolutely nothing exceptional about the humanities (as long as book bibliographies are self-archived too, alongside journal-article full-texts). Research uptake and usage is a generic indicator of research performance, and citations and downloads are generic indicators of research uptake and usage. The humanities are no different in this regard. Moreover, inasmuch as OA also enhances research uptake and usage itself, the humanities stand to benefit from OA, exactly like the other disciplines.

LH: "Second, for us, journal articles are only one, and usually not deemed the primary/preferred, mode of research publication. Books still count quite heavily. So, if we want to count citations, will some to-be-imagined citation-counting service/agency comb through all the books in my field as well as the journal articles to count how many of my publications get cited and how often? If not, then the "metrics" will be so heavily flawed as to be completely misleading and useless."

All you need to do is self-archive your books' metadata and cited reference lists and all your journal articles in your OAI-compliant institutional repository. The scientometric search engines -- like Citebase, CiteSeer, Google Scholar, and more to come -- will take care of all the rest. If you want to do even better, scan in, OCR and self-archive the legacy literature too (the journal articles plus the metadata and cited reference lists of books of yore). If you're worried about variations in reference citing styles: don't worry! Just get the digital texts in and algorithms can start sorting them out and improving themselves.

LH: "Third, in many sciences, esp. natural and medical sciences, research simply can't be conducted without significant external funding. But in many/most Humanities disciplines truly groundbreaking and highly influential research continues to be done without much external funding."

So what is your point? That the authors of unfunded research, uncoerced by any self-archiving mandate, will not self-archive? Don't worry. They will. They may not be the first ones, but they will follow soon afterwards, as the power and potential of self-archiving to measure as well as to accelerate and increase research impact and progress become more and more manifest.

LH: "(Moreover, no govt has yet seen fit to provide funding for the Humanities constituency of researchers commensurate with that available for Sciences. So, it's a good thing we don't have to depend on such funding!)"

Funding grumbles are a worthy topic, but they have nothing whatsoever to do with OA and the benefits of self-archiving, or metrics.

LH: "My point is that the "metrics" for the Humanities will have to be quite a bit different in what is counted, at the very least."

No doubt. And the metrics used, and their weights, will be adjusted accordingly. But metrics they will be. No exceptions there. And no regression back to either human re-evaluation or delphic oracles: Objective, countable performance indicators (for the bulk research output: of course for special prizes and honours individual human judgment will have to be re-invoked, in order to compare like with like, individually).

LH: "Fourth, I'm not convinced (again, not yet; but I'm open to persuasion) that counting things = research quality and impact. Example: A number of years ago, coming from a tenure meeting at my previous University I ran into a colleague in Sociology. He opined that it was unnecessary to labour over tenure, and that he needed only two pieces of information: number of publications and number of citations. I responded, "I have two words for you: Pons and Fleischman". Remember these guys? They were cited in Time and Newsweek and everywhere else for a season as discoverers of "cold fusion". And over the next couple of years, as some 50 or so labs tried unsuccessfully to replicate their alleged results, they must have been among the most frequently-cited guys in the business. And the net effect of all that citation was to discredit their work. So, citation = "impact". Well, maybe, but in this case "impact" = negative impact. So, are we really so sure of "metrics"?"

Not only do citations have to be weighted, as they can and will be, recursively, by the weight of their source (Proceedings of the Royal Society vs. The Daily Sun, citations from Nobel Laureates vs citations from uncited authors), but semiometric algorithms will even begin to have a go at sorting positive citations from negative ones, disinterested ones from endogamous ones, etc. Are you proposing to defer to individual expert opinion in some (many? most? all?) cases, rather than using a growing wealth and diversity of objective performance indicators? Do you really think it is harder to find individual cases of subjective opinion going wrong than objective metrics going wrong?
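To make "weighted recursively by the weight of their source" concrete, here is a minimal sketch using a PageRank-style iteration. This is a stand-in technique chosen for illustration, not a method specified in this post; the function name, the damping value and the toy citation graph are all assumptions. It does not attempt to separate positive from negative citations.

```python
def weighted_citation_scores(cites, damping=0.85, iterations=50):
    """PageRank-style scores: a citation counts for more when its
    source is itself highly scored. `cites` maps source -> cited works."""
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every work starts each round with a small baseline weight.
        new = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = cites.get(source, [])
            if targets:
                # A source splits its current weight among the works it cites.
                share = damping * score[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A work that cites nothing spreads its weight evenly.
                share = damping * score[source] / n
                for node in nodes:
                    new[node] += share
        score = new
    return score


# Toy graph (hypothetical): paper_a and paper_b each have exactly one
# citation, but paper_a's comes from a well-cited source.
cites = {
    "laureate": ["paper_a"],
    "paper_x": ["laureate"],
    "paper_y": ["laureate"],
    "uncited": ["paper_b"],
}
scores = weighted_citation_scores(cites)
print(scores["paper_a"] > scores["paper_b"])
```

A raw citation count would rate paper_a and paper_b equally; the recursive weighting ranks paper_a higher because its single citation comes from a source that is itself cited.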

LH: "Perhaps, however, Stevan can help me see the light, and join him in acclaiming the advent of metrics."

I suggest that the best way to see the light on the subject of Open Access Digitometrics is to start self-archiving and sampling the (few) existing digitometric engines, such as Citebase. You might also wish to have a look at the chapter I recommended (no need to buy the book: it's OA: Just click!):

Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open Research Web: A Preview of the Optimal and the Inevitable, in Jacobs, N., Eds. Open Access: Key Strategic, Technical and Economic Aspects, chapter 21. Chandos.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Research Assessment at 05:31

Thursday, June 15. 2006

FRPAA and paying publishers to self-archive

SUMMARY: Some publishers have suggested that because a 6-month embargo on Open Access self-archiving by authors is too long for researchers and too short for publishers, the FRPAA should instead pay publishers to provide the Open Access immediately. This is fine if the research funders have the extra cash to pay whatever price publishers are currently charging for this (it varies from under $500 to over $3000 today) or to impose a standardized cap on the price and pay that. But otherwise it makes more sense for authors to self-archive for themselves, at no cost, now, exactly as proposed by the FRPAA, and to allow the market to decide the price, if and when subscription revenues should ever prove unsustainable. There is no evidence at all of subscription revenue decline yet, as a consequence of self-archiving, even after 15 years in the fields where self-archiving has been practised the longest and effectively reached 100% years ago. The FRPAA should mandate that the deposit of all articles must be immediate (upon acceptance for publication), with only the Open-Access-setting (vs. Closed Access) open to embargo (capped at 6 months) from the 6% of journals that do not yet endorse immediate Open-Access-setting. Semi-automated email-eprint requests made possible by the institutional repository software can provide for the needs of the researchers during the embargo period for articles in those journals.

(From Peter Suber's Open Access News)
Springer's unexpected response to FRPAA

Peter Suber: "I've learned --and Jan Velterop has confirmed-- that Springer has sent a letter to Sen. Susan Collins, chair of the Senate committee considering FRPAA, raising an unusual objection to the six-month embargo allowed by the bill. The letter argues that six months is too short to satisfy publishers and too long to satisfy researchers. In its place, Springer proposes a policy that would require full-text open access immediately upon publication --provided that the policy makes clear that publishing in peer-reviewed journals is an inseparable part of research and therefore that the funds for doing so (article processing fees) will be available to researchers as a special overhead on their publicly-funded research grants. The letter proposes that the new policy might be phased in after a short grace period to give publishers a chance to modify their business models."

The Federal Research Public Access Act (FRPAA) proposes to mandate that all federally funded researchers must not only publish their research findings in journals (as they already must), but they must now also make all the peer-reviewed journal articles in which those findings are reported openly accessible (OA) to all the potential users of those findings -- by self-archiving them free for all on the web (within at most 6 months of publication).

A publisher (Springer) has now recommended to the sponsors of the FRPAA that because a 6-month embargo on self-archiving is too long for researchers and too short for publishers, the FRPAA should instead mandate immediate self-archiving and pay the publishers to do it in place of the authors. The recommendation does not mention the amount that the publishers should be paid, but currently publishers are charging between $500 and $3000 or more for making articles OA (Springer charges $3000).

I would like to make some comments on this suggestion. Please note that they contain some nested contingencies:

(1) If the federal funding agencies have the extra cash, and are willing to pay publishers whatever amount they ask today (or to impose a capped amount of their own), and if the FRPAA can be successfully passed as an immediate-OA mandate in this way (i.e., no embargo allowed), this would be a perfectly fine outcome -- acceptable to research and researchers as well as to publishers.

(2) If, however, the federal funding agencies do not have the extra cash to pay publishers the amount they ask today (or an acceptable capped amount), and/or if the FRPAA cannot be successfully enacted into law if burdened with a commitment to pay publishers the amount they ask today (or an acceptable capped amount) for OA, then the suggestion that FRPAA should be revised to do so is just another way to delay or doom the passage of the FRPAA, and should be ignored.

(3) The present version of the FRPAA does not propose to pay anyone anything: it merely mandates that federally funded research must be made OA by the fundee, by self-archiving it, within (at most) 6 months of publication, in the fundee's own institutional repository (or a central one).

(4) To date there is no evidence at all that self-archiving reduces publisher subscription revenues; and the two publishers whose authors have been self-archiving the longest and the most, the American Physical Society and the Institute of Physics, have both reported publicly that they have (4a) no detectable subscription declines and are (4b) unopposed to an immediate (no-embargo) OA self-archiving mandate.

(5) The objective, empirical way to test whether or not there is any truth to the hypothesis of some other publishers that self-archiving will reduce subscription revenue -- and the only way to find out how much and how fast it would reduce subscription revenue if ever it did so at all -- is to adopt the FRPAA mandate now (both it and variants of it have been debated, delayed and deferred for 3 years now, with no new information or evidence forthcoming) and then to monitor its outcome annually, making further adjustments only as and when there is evidence that they are necessary.

(6) It is quite true that a 6-month embargo is bad for research, which is conducted in order to be immediately used and applied once it has passed peer review, and that in many rapidly moving fields the very earliest "growth tip" of research is the most important of all. But if an immediate no-embargo OA mandate cannot yet be agreed upon, an interim way to minimize that damage to research is to require immediate deposit and to allow only the date at which access to the deposited full text is set to Open Access (OA) to be delayed (for no more than 6 months) where necessary (Closed Access until then).

(7) 94% of journals already endorse setting access immediately to OA.

(8) For the remaining 6% of articles set to Closed Access, the article's bibliographic metadata will still be visible and accessible to all immediately, and the self-archiving repository software provides a semi-automatic feature for individual would-be users to request -- and authors to provide -- an individual eprint of the full text by email, almost instantly.

(9) This immediate-deposit/delayed-OA-setting compromise is the preferable one if the federal funding agencies do not have the extra cash, or are unwilling to pay publishers whatever amount they ask today (or to impose a capped amount of their own) to provide instant OA.

(10) At the moment, institutional subscriptions are paying the costs of peer review. If/when subscription revenues were indeed ever to decline to unsustainable levels because of institutional cancellations, the institutional windfall savings from the cancellations would themselves be a natural candidate source for covering the peer-review costs for the institution's own research output, rather than any arbitrary amount requested from federal research funders today -- especially as subscription decline would first generate pressure toward publisher cost-cutting, downsizing and readjustment to the new reality of OA publishing, and hence a more realistic, market-driven figure for the true costs of peer review (which publishers merely manage, whereas researchers themselves actually perform the refereeing itself for free).

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 16:19

Tuesday, June 13. 2006

"CURES" trump publisher revenue risks: Public READS do not

SUMMARY: The publisher lobby can defeat the Federal Research Public Access Act (FRPAA) (to mandate self-archiving of all federally funded research so as to make it freely accessible online) if FRPAA is promoted merely or mainly as a means of providing public (student, practitioner, patient, general public) access to publicly funded research. Publishers can and will argue (and already are arguing) that the public does not really want or need access to most of this specialized peer-reviewed journal literature, across all fields (not just clinically relevant medicine), written by and for specialized researchers, and that it is hence not justified to put publishers' subscription revenues at potential risk by mandating that this entire literature must be made freely accessible online. Instead, publishers will propose special arrangements in which they themselves would make the tiny fraction of what they publish that is of potential public interest freely accessible online. The right response to this by FRPAA proponents is to make it very explicit that the primary purpose of the bill is not public READS, but "CURES" -- i.e., the public benefits that come from applying and building upon research findings in further research and practical applications, for the sake of which the research was publicly funded in the first place. And CURES come from researcher access and usage -- researchers applying and building upon current research in further research and applications -- not from public access and usage. Because no researcher can currently afford access to anywhere near all the research they might need to read and use, researcher self-archiving substantially accelerates and increases research usage and impact, which is the measure of speed and progress toward CURES. 
And substantially accelerating and increasing progress toward CURES -- unlike providing public READS -- does outweigh any hypothetical risk to publisher revenues (although there is as yet absolutely no evidence that self-archiving reduces subscription revenues). Moreover, with free online access (Open Access) to self-archived research, the public (students, practitioners, patients, tax-payers) get full access too, as a secondary benefit, but not because that is the primary benefit from or justification for mandating Open Access self-archiving.

ANON: "Your arguments are totally logical. However, a factor you are not taking into account: if researchers are focused on their research impact, politicians are focused on their own image and reelection potential. It is the politicians who need to vote in FRPAA."

And it is the publisher lobby that will be pressuring them not to. SPPP (Student/Practitioner/Patient/Public) access is a good intro, to get the politicians' and voters' attention, but then you need a follow-through that can hold up against the publisher lobby -- and SPPP-access has no follow-through when publishers inevitably say, as they will (and are already):

"You want to mandate that our business revenue should be put at risk for the sake of SPPP-access, yet there is no evidence that the SPPP reads (or has the slightest wish to read) most of the highly specialized research that we publish! Why not just make a side-deal that we make publicly accessible that tiny fraction of (mostly clinical-medical) research that is likely to be of SPPP interest, and leave the rest of it -- which is the overwhelming majority of it -- alone, rather than putting all of our revenues at risk for no objective reason?"

(And denigrate logic all you like, in the end, the pro-mandate argument has to make sense, otherwise the publisher lobby wins and the OA self-archiving mandate -- and the best interests of research and the public that funds it -- lose.)

The requisite follow-through is CURES, not SPPP-access. Students, practitioners, patients and the public do not produce CURES, researchers do. And the reason researcher usage and impact is so important is not because it produces money and prizes for researchers, but because it generates CURES. In fact, that is what research is funded for, not to produce reading matter for the SPPP.

("CURES" is of course over-simplified too, and medically biassed, but it will do, as long as it is put in scare-quotes or CAPs: more generally, it means applications of research, including technology; even more generally, it means pure research progress itself, which might eventually lead to applications; and when it comes to social science and especially the humanities, which rarely have any applications at all, it means the production of specialized scholarship, which we presumably fund because we think it is a social benefit to promote scholarship, not because the general public or even students actually need or wish to read the peer-reviewed journal articles reporting the research the public funds, written by specialists for specialists, but because the public wants to promote scholarly progress, which may eventually trickle down into education.)

ANON: "Is there evidence that FRPAA will result in the kind of citations that politicians care about - photo ops and positive pieces in the news, funding support and votes so that they can be re-elected?"

CURES produce photo-ops, and for researchers to produce CURES, researchers (not SPPP) need to have access to the ongoing research, in order to use it and build on it.

Moreover, the politicians are not just responsive to votes, as you know, but also to money and lobbying, especially from big business, and to what fosters or threatens business revenue flows. Yes, "public access to publicly funded research" sounds like a good vote-getter on the surface, even if it doesn't amount to much research the public would actually want to access. But the publishing lobby is another matter, and they are the ones to contend with now.

It's not the vote-getting power of the OA principle that has been blocking the RCUK policy for two years and that has watered down the NIH public access policy (http://publicaccess.nih.gov/policy.htm) to near-nothingness: it's the publisher lobby; and this time FRPAA has to come forearmed: If it tries to coast on the public-access-to-publicly-funded-research slogan alone, or primarily, it will be defeated, no matter how sexy it may sound as a vote-getter. (And, by the way, most individual citizens don't read research and couldn't care less about this issue, one way or the other.)

Publishers will float doomsday scenarios about ruinous risks to their ability to make ends meet if self-archiving is mandated (not based on any evidence, but sounding ominous just the same). These doomsday scenarios need a more convincing answer than that "we are doing it so the public can read the research it funds" -- because then the publishers will simply adduce the abundant evidence that the public is not reading most of the peer-reviewed research they publish, and would not and could not have the slightest interest in ever reading it. So the revenue-risk is completely unjustified.

Not so if the rationale is CURES rather than SPPP READS, for research progress and the possibility of cures is the very reason we fund research in the first place. CURES -- but not READS -- offset publishers' hypothetical doomsday scenario very effectively.

ANON: "To put it another way: is there research showing that politicians care about researcher-impact at all, never mind enough to stand up to the publisher anti-OA lobbying?"

Politicians care about CURES, and "cures" is the simple (simplistic) encapsulation of research uptake, usage, application, productivity and progress. And that in turn is something that can only come from researchers using and applying research, not from the public, reading research. And it is for CURES that the public is funding research in the first place, not for its own READing delectation.

So the right public issue politicians need to focus on is CURES, not SPPP-access; and CURES means research usage and impact, which comes from researcher-use, not from SPPP-reading.

ANON: "Arguments focussed on students, patients, and the public are much more likely to persuade politicians than arguments based exclusively on benefits for researchers. The two streams of arguments complement each other. It is not necessary, or desirable, to limit pro-OA arguments."

OA is not about benefits to researchers! It is about CURES. Researcher access means more progress and momentum toward CURES.

Moreover, it is now no longer just about persuading politicians but about resisting the publisher lobby, which is trying to dissuade politicians. Answers to their objections are needed too; and SPPP-access is not the answer, CURES is; and that means researcher-access, not SPPP-access. (Yet, let us not forget, SPPP-access can and will come too, with the OA territory: So it's fine to mention both benefits, but essential to make it clear that CURES is the primary rationale for mandating self-archiving, and READS merely a secondary benefit.)

ANON: "The politician who cares about patients but thinks the researcher-arguments are abstract, will support a patient-based OA argument. It is unlikely that a person with this viewpoint would support a research-only focused argument."

The focus is on CURES, not on abstract researcher-arguments: Everyone knows that CURES come from researchers, not from students, practitioners, patients or the general public. I think that is a concrete matter that politicians and voters are quite capable of understanding. And it has the virtue of trumping the publishers' arguments about hypothetical revenue risks: progress toward actual CURES (monitored in the form of research impact) trumps hypothetical revenue risks; SPPP-READS do not.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 03:57

Student/Practitioner/Patient/Public (SPPP) Access Comes With the OA Territory

SUMMARY: "Public Access to Publicly Funded Research" is a good way to draw voter and elected-official attention to the need for Open Access, but then it has to be made clear that research is publicly funded primarily for the sake of the benefits it brings to the public: so its findings can be used and built upon in the form of further research progress and applications ("CURES") and not primarily because the public has a great desire or need to read peer-reviewed research articles ("READS"), written by and for researchers (and research students). Hence it is primarily for the sake of making publicly funded research accessible to its intended users -- researchers -- that Open Access is so important and urgent for the public that funds it: because, right now, researchers do not have access to all or most of the research they need in order to maximize research progress and applications (CURES), to the benefit of the public that funded the research. Public access READs, however, come as a secondary side-benefit of Open Access anyway (for everyone, including students, practitioners and patients); yet the reason it is so important not to argue that public READs are the primary rationale for Open Access is that that rationale can be challenged (and perhaps even defeated) by the publisher lobby, arguing that it is unjustified to put their subscription revenues at potential risk by mandating author self-archiving (even though there exists no objective evidence that author self-archiving causes subscription decline) merely for the sake of public READS when the public (including students, practitioners and patients) in fact has little interest in reading peer-reviewed research journal articles apart from a minority of health-relevant articles, for which the publishers can instead make special arrangements to make them -- and only them -- Open Access. 
In contrast, the true primary rationale for Open Access, "CURES" (i.e., research progress and applications) for the public that funded the research, does over-ride hypothetical risks to publisher subscription revenue (for which no actual evidence even exists).

Below is a reply to an anonymized query on an often-confused issue concerning Open Access (OA), the rationale for providing OA, and the rationale for mandating the provision of OA (by mandating self-archiving, as the RCUK in the UK, the FRPAA in the US and the European Commission (EC) are each proposing to do):

(1) OA is about Open Access to research: about 2.5 million articles per year, published in about 24,000 peer-reviewed research journals and congress proceedings in all disciplines, from maths, physics and engineering to biology and medical sciences, to the social sciences and the humanities.

(2) The only ones who can provide access to these 2.5 million annual research articles are their authors, the researchers: either by publishing them in an OA journal, or by publishing them in a conventional journal and also self-archiving them in their institutional repository (or a central one).

(3) Author surveys have shown that although only about 15% of authors self-archive spontaneously, 95% will comply if mandated to self-archive by their research funders and/or their institutions. The half-dozen self-archiving mandates that have already been adopted (including those of the Wellcome Trust, CERN, and several universities) have since confirmed this high compliance rate.

(4) The FRPAA, RCUK and EU mandate proposals are facing stout opposition from the publisher lobby, even though 94% of journals have already given their green light for immediate author self-archiving.

(5) The two main bases for the publishers' objections are that OA is unnecessary and that mandating self-archiving would put their subscription revenues at risk.

(6) There is to date no evidence at all that self-archiving reduces publisher subscription revenues, but even if it were ever to do so, the question is whether the benefits (to research, researchers, and the public that funds them) outweigh the risks (to publishers).

(7) The chief evidence that the benefits (to research, researchers, and the public that funds them) outweigh the risks (to publishers) is that OA substantially accelerates and increases research usage and impact, hence research progress and productivity (as measured by download and citation counts): This is the primary motivation for mandating the provision of OA, through self-archiving: to maximize research usage and impact, hence research productivity and progress ("CURES").

(8) A side-benefit is that OA increases access and usage for practitioners, patients and the general public too -- in those fields in which there is practitioner, patient, and public (PPP) interest in the research articles. (It has to be noted, though, that there is as yet no systematic quantitative measure of this PPP interest and that PPP interest is almost certainly limited to only a small fraction of the annual 2.5 million articles published across all research fields.)

(9) Hence the primary rationale for mandating OA self-archiving has to be the objectively measurable and demonstrated benefits it provides for research, researchers, and the public that funds them -- in terms of usage and citations as measures of progress toward CURES -- rather than merely PPP interest.

(10) OA of course also provides access for PPP use, but the important strategic point to understand is that PPP-needs (being limited to only a fraction of the OA fields and being difficult as yet to measure and document) cannot be adduced as the primary reason for putting publisher subscription revenues at even hypothetical risk -- otherwise there is a strong actual risk that the proposals to mandate self-archiving will be successfully defeated by the publisher lobby. The rationale for OA is primarily to provide access to research by researchers for researchers -- all for the benefit of the public that funded the research. PPP access is merely a side-benefit of OA. It would come nowhere near justifying an OA mandate just on its own.

Here is my detailed reply to a well-meaning (anonymous) query concerning PPP interests:

ANON: "When I read your 8-point agenda I believe that the clinical faculty would feel that they were not being embraced in it."

I think you are not quite understanding the OA problem, hence its solution: The objective is to provide free online access (OA) for all would-be users (whether they be researchers or practitioners, patients and public [PPP]).

The problem, however, is that the providers of the research, namely, the researchers who wrote the research articles, are not yet providing OA to their articles spontaneously.

The solution is to mandate that they must provide OA, for the benefit of the public that funds their research -- by self-archiving their own final, refereed, accepted drafts of their own articles free online in institutional or central repositories.

In order to get that solution (mandate) adopted, it is necessary to persuade those who are in a position to mandate self-archiving -- namely the researchers' own funders and institutions -- to mandate it. In order to persuade them to mandate it, it is necessary to persuade them that there is a need to mandate OA -- especially because the publishers are trying to prevent self-archiving mandates, or, failing that, to embargo them, because they fear they could reduce their subscription revenues (even though there is no evidence of this, even after 15 years of self-archiving, some of it at or near 100% for years now in some subfields).

Now comes the critical point: To persuade researchers and their funders and employers that there is indeed a strong need to mandate self-archiving despite the publishers' objections that there is no need for OA and that it might put their subscription revenues at risk, you have to make it clear exactly what the need for OA is, and how and why it is to researchers' advantage to self-archive their research:

The chief need for OA is on the part of those who are in the position to use and apply the research, for the benefit of the public that funded it, namely, the researchers by and for whom the research articles were written. And the objective measure of their need is download and citation counts: It has been demonstrated that self-archiving accelerates and increases downloads and citations substantially (meaning that without it, many potential users are denied access). Citation counts mean salary and funding for researchers, and overheads for their institutions, and both citations and downloads mean a return on the funder's investment of tax-payer money in funding the research, in terms of research productivity, applications and progress ("CURES"), in all fields.

So the way to solve the problem of how to persuade researchers to provide OA is to persuade funders and institutions to mandate self-archiving. And the way to persuade them to mandate self-archiving is to persuade them that OA is to the advantage of research and researchers (and their institutions and funders and especially the tax-payers that fund the funders) because it both accelerates and increases research citations and downloads (i.e., research impact and progress: "CURES").

Downloads are not as yet being systematically measured and compiled (although they will be eventually), but citations are already being systematically measured and compiled -- and, moreover, they are correlated with downloads.

So the simple, straightforward argument for mandating self-archiving, the one that is immune to publishers' objections that OA is unneeded or that it might ruin their business, is that self-archiving is optimal for research progress itself ("CURES"), because it substantially increases research citations, which indicates that the research is being taken up, used, applied and built upon.

If we could add download counts to the argument, and downloads in particular by practitioners, patients and public (PPP), we would, but there are no such download counts yet, so we cannot add them directly and empirically to the usage/impact argument. It is not necessary, however, because free access for researchers also means free access for everyone else too, including PPP. So there is no need to adduce specific evidence that there is substantial PPP demand and need for access (especially because in most specialized fields there is unlikely to be!).

We cannot, however, say that the primary reason we need OA is because of PPP needs, because (1) we have no data on PPP use yet and (2) PPP use applies to only a small fraction of the research literature -- 2.5 million articles a year, across all fields, in 24,000 journals. Hence this is not a valid argument for OA self-archiving in general, and, if put up front as the main reason for seeking OA mandates, would lead to debate, delay and defeat after years of haggling, with publisher offers of "special deals," with the publishers making only a select subset of their articles OA -- those that might have some PPP interest -- rather than all articles, which would put all of the research journals needlessly at (hypothetical) risk, for no compelling reason.

That would be the PPP tail wagging the entire OA research dog: PPP needs are only a tiny (though important) subset of OA needs. And, more important, direct PPP access is definitely not the main way the public benefits from OA! Focussing primarily on PPP access is the wrong strategy for persuading researchers, their institutions and their funders of the need to mandate OA, even though PPP access does undeniably have superficial appeal with voters and politicians; in the end, on its own, or in the lead as the primary rationale for Open Access, PPP access would lead to debate, delay and defeat for a self-archiving mandate.

But using PPP access needs as the primary rationale for OA needs is not necessary. The solution is to put the irrefutable direct needs of researchers for research access (for the sake of the research and application benefits -- "CURES" -- it will provide for patients, practitioners and the public) first, and note that OA will also provide PPP access as a side-benefit wherever wanted or needed.

It is ever so important not to weaken the case for OA -- the case that must be put to the researchers and their institutions and funders, across all fields -- by giving primacy to access by patients and practitioners. They will get access anyway. But they are not the research providers: Researchers are; and most of them don't do clinically relevant research; and even those who do are rewarded for their research impact, and not yet for their practical impact. (They will be rewarded for the latter after OA prevails, but not before, so that cannot be used to induce them or their institutions and funders to self-archive: research impact can, and it gives everyone else access too.)

I hope you understand these issues of logic and practicality better now: Only a small fraction of research is PPP-relevant, so the need for PPP access cannot be made the principal argument for OA or OA will lose.

Now some comments:

ANON: "I don't think that folks understand this distinction well. You and I do but researchers=lab to the more social sciences. We have a large health science program here and our faculty have "divisions" (i.e. research faculty versus clinical faculty). It is from these clinical faculty I have extended my appreciation of the problems in the field. When I read your 8-point agenda I believe that the clinical faculty would feel that they were not being embraced in it."

If the clinical faculty publish research (i.e., if they are OA providers), they are embraced by it. If they merely use research, they are irrelevant to a mandate that addresses research providers. However, since OA means OA for everyone, clinicians (indeed, all of PPP) are embraced by its outcome, which is Open Access to all the research they need.

Please distinguish what concerns research providers from what concerns research users. The OA problem is that of getting the providers to go ahead and provide the OA (and the solution is to mandate providing it). And the users are the beneficiaries (whether researchers, practitioners, patients, or the public). (Moreover, the public benefits incomparably more from the CURES than from the READS.) Please do not conflate the problem of getting access (the user problem) with the problem of getting providers to provide OA (the mandate problem). The solution to the mandate problem is also the provider solution to the user access problem.

ANON: "As we try to go about courting our disciplines I think that the language is important when we cross over to the professional/social sciences. There are few, if any, practitioners of particle physics. But there are lots of nurses, social workers, educators, and so on who could use the research but they are challenged to get it.... the situation is really grim... once students leave the school and move to "disconnected" areas of which there are many)."

You are mixing up the user problem and the provider problem here: The point is that providers have to be mandated to provide OA. You are also mixing up the (minority) practitioner-relevant OA fields with the vast majority of practitioner-irrelevant OA fields. OA and OA mandates need to cover them all, and the research impact argument is the decisive and universal one, not the practitioner argument, which is a minority special case, and could be strategically manipulated by publishers with special side-deals.

By the way, students could be added to PPP too, making it SPPP, and the same argument applies to them: OA gives them access along with the territory, and eventually their usage will be measured and credited too, through download counts. Moreover, to the extent that students are or become researchers, their usage also translates into citations and more research (and "CURES").

ANON: "I think that all that needs to be added is something along the lines of "research-practitioners benefit [from OA] too" and this is particular important to "isolated", "international" and "less-resourced" communities."

It's fine to add SPPP needs to research needs in the overall rationale for OA wherever possible (though I think it is already covered by "all would-be users"). Eventually, Connotea-style tagging will help quantify SPPP need and its benefits, the way it is already quantified by research citations...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 03:22

Sunday, June 11. 2006

How to Counter All Opposition to the FRPAA Self-Archiving Mandate

SUMMARY: All objections to the FRPAA proposal to mandate OA self-archiving can be decisively answered: (1) Open access has been empirically demonstrated to benefit research, researchers and hence the public that funds the research, by substantially increasing research usage and impact. (2) There is no evidence to date that self-archiving has any negative effect on subscription revenue. (3) With an immediate-deposit/optional-access (ID/OA) mandate, deposit must be immediate (upon acceptance for publication), not delayed; only the access-setting (Open Access vs. Closed Access) can be delayed ("embargoed"). (4) In recognition of its benefits to research, 94% of journals already endorse immediate OA-setting; so the semi-automatic email-eprint request feature of the Institutional Repository software (allowing would-be users to email the author individually to request and receive the eprint by email) will only be needed for 6% of articles, to tide over any embargo interval. (5) OA is optimal for research and immediately reachable via self-archiving mandates right now; publishing models can and will adapt, if and when it should ever become necessary. (6) In response to attempts to delay and filibuster the adoption of the self-archiving mandate by calling for more "empirical studies to test for its likely impact": mandating self-archiving is itself the empirical test; the impact of the mandate can be reviewed annually to see what other effects it may be having -- apart from the positive effects evidence has already shown self-archiving to have. 
(7) The way to answer any suggestion that it is unfair to put publisher revenues at potential risk for the sake of general public access to a literature most of which none of the general public is ever likely to want to read is to note that OA is intended for the sake of the public benefits of the research that the public funds, which are maximized by making research maximally available to the users for whom it is mostly written, namely, researchers, so they can use and apply it in further research and applications, as intended, for the benefit of the public that funded it. (It will be publicly accessible to everyone else too, but only as a secondary benefit, not the primary rationale for OA, which is free access to publicly funded research, for researcher use, for public benefit.) (8) All evidence indicates that voluntarism, invitations, etc. simply do not work to generate self-archiving, whereas mandates do.

The AAP (and PSP and FASEB and STM and DC Principles Coalition) objections to the FRPAA proposal to mandate OA self-archiving (along with its counterpart proposals in Europe, the UK, Australia and elsewhere worldwide) are all completely predictable, have been aired many times before, and are empirically as well as logically so weak and flawed as to be decisively refutable.

But OA advocates cannot rest idle. Empirically and logically invalid arguments can nevertheless prevail if their proponents are (like the publishing lobby) well-funded and able to lobby widely and vigorously.

There are many more of us than there are in the publishing lobby, but the publishing lobby is fully united under its simple objective: to defeat self-archiving mandates, or, failing that, to make the embargo as long as possible.

OA advocates, in contrast, are not united, and our counter-arguments risk galloping off in dozens of different directions, many of them just as invalid and untenable as the publishers' arguments. So if I were the publisher lobby, I would try to divide and conquer, citing flawed pro-mandate or pro-OA or anti-publishing arguments as a camouflage, to disguise the weakness of the publishing lobby's own flawed arguments.

We managed to unify behind our Euroscience recommendation. If we could unify in our response to the anti-mandate lobby, making a strong, coherent, common front, and if we then recruited our respective research communities behind that common front (again, being very careful not to let anyone get carried away into weak, foolish arguments!) I am absolutely certain we can prevail over the publisher lobby, definitively, and see the self-archiving mandates through to adoption at last.

Our simple but highly rigorous 8-point stance is the following (and we can be confident enough of its validity to lay it bare in advance for any who are inclined to try to invalidate it):

(1) Open access has already been repeatedly and decisively demonstrated -- with quantitative empirical evidence -- to benefit research, researchers and the public that funds research: It both accelerates and increases research uptake, usage, citations, and hence progress, substantially, in all disciplines so far tested (including physical sciences, biological sciences, social sciences).

This is the key rationale for mandating OA self-archiving, because it is simply not possible for publishers to argue that protecting their current subscription revenue streams from undemonstrated, hypothetical risk outweighs the substantial demonstrated, actual benefits to research. (They know that well. Hence they will not and cannot try to push that argument. They will try to skirt it, by instead trying to exploit potential weaknesses in our own stance. This is why it is important to make our stance rigorous and unassailable by resolutely excluding as gratuitous and unnecessary all weak or controvertible arguments or rationales.)

(2) There exists zero evidence that self-archiving reduces subscriptions; and for physics, the longest-standing and most advanced in systematic self-archiving, there are actually published testimonials from the principal publishers, APS and IOP, to the effect that self-archiving has not generated any detectable subscription decline in 15 years of self-archiving (even in the subfields where it has long been practised at or near 100%), and that APS and IOP are actively facilitating author self-archiving rather than opposing it.

So although even evidence of subscription decline would not be a valid reason for denying research the benefits of self-archiving, there is not even any evidence of subscription decline. Hence here too, the publishing lobby will only be able to speculate and hypothesize to the contrary, evoking ever shriller doomsay prophecies, but not to adduce any supporting empirical data, because all evidence to date goes in the direction opposite to their predictions of catastrophe.

(3) The publishing lobby's most vulnerable strategic point, however -- and this is ever so important -- is precisely the matter of the embargoes they are so anxious to have (if they cannot succeed in blocking the mandate altogether). The immediate-deposit/optional-access mandate that we have specifically advocated immunises the mandate completely from embargo-haggling, because it is a deposit mandate, not an Open-Access-setting mandate: Deposit must be immediate (upon acceptance for publication), not delayed; only the access-setting (Open Access vs. Closed Access) can be delayed, with immediate OA-setting merely encouraged "where possible," but not mandated. This means that not even copyright arguments can be invoked against the mandate, and embargoes cannot delay deposit: they can only delay OA-setting.

The part we must keep clearly in mind, however, is that an immediate-deposit mandate is enough! There is no need to over-reach (by either holding out for an immediate-OA mandate or capitulating and allowing delayed deposit). An immediate (no-delay) deposit mandate will generate 100% OA as surely as night follows day. There is now and has all along been only one obstacle to 100% OA: getting the deposit keystrokes to be done. Once those are done, the benefits of OA itself will see to it that authors all soon choose to set access as OA.

And until then, the bibliographic metadata will be visible immediately webwide, and would-be users can use the semi-automatic email-eprint request feature of the Institutional Repository software to email the author individually to request and receive the eprint by email, just as they used to request reprints by mail in the paper era, but much more quickly. This will tide over research usage needs until Nature takes its course.

So what we must insist upon is an immediate -- no embargo, no exception -- deposit mandate (full text plus bibliographic metadata) together with encouragement to set access to the full text immediately as OA, but allowing the option of a Closed-Access delay period if necessary. On no account, however, should the delay be in the deposit itself -- just in the OA-setting.

(4) In addition, 94% of journals already endorse immediate OA-setting. So the email-eprint option will only be needed for 6% of articles, to tide over any embargo interval.

This need not be rubbed in the noses of publishers (it is for our own quiet satisfaction); but the fact that 94% of journals already endorse self-archiving can be used strategically to weaken publishers' arguments against mandating it. ["You (94%) give authors the green light to go ahead and self-archive, because you recognise that self-archiving is to the benefit of researchers and research, and then you try at the same time to prevent their institutions and funders from ensuring that researchers go ahead and reap those very benefits by mandating the self-archiving that generates them!" Making that contradiction explicit (affirming yet blocking the benefits of author self-archiving) will go a long way toward invalidating the weak and incoherent arguments publishers will be making against self-archiving and self-archiving mandates.]

I am absolutely certain that (1) - (4), clearly and resolutely put forward, and used to defeat every angle of the publishers' argument ("it will destroy peer review," "it will be expensive to the tax payer," "it will kill subscriptions," "it will destroy learned societies," "it's not needed: we have enough access already," "there will be multiple versions," etc. etc.), can be successful, even triumphant. However:

(5) We should definitely not allow ourselves to be drawn into publishers' counterfactual speculations about subscription revenue loss, for which there is zero evidence, by replying in kind, with counter-speculations of our own about the way publishing will change, evolve etc. Just stick to the facts: that OA is reachable via self-archiving right now and that OA is optimal for research. Everything else can and will adapt, if/when it should ever become necessary, but that is all merely hypothetical: The only sure thing now is that self-archiving is good for research, and hence it needs to be mandated, just as publishing itself is mandated.

(6) In response to attempts to delay and filibuster the adoption of the mandate by asking for more "empirical studies to test for its likely impact" the reply is crystal clear: Mandating self-archiving is itself the empirical study to test its impact; the policy can be reviewed annually to see what other effects it may be having -- apart from the beneficial effects we already know self-archiving has.

(7) One tricky point to watch out for is the "public access" argument: The rationale that OA is needed for the tax-payers who funded the research is a very shaky one. It may be a good vote-getter for a politician, but it definitely does not have the empirical, logical and practical force of the demonstrated research impact benefits of OA; nor does it need to. The way to rebut the publishers' (valid) claim that it is unfair to ask them to put their revenues at risk merely or mainly for the sake of general public access to a literature that almost none of the general public is ever likely to want to read (except in a few practical areas of medicine, etc.) is to firmly redirect the "public right" and "public good" argument toward the public benefits of the research that the public funds, which are maximized by making research maximally available to the users for whom it is mostly written, namely, researchers, so they can use and apply it in further research and applications, as intended, for the benefit of the public that funded it. (It will be publicly accessible too, but only as a secondary benefit, not the primary rationale for OA, which is free access to publicly funded research, for researcher use, for public benefit.)

(8) And last, we of course have all the evidence (e.g. from the failed NIH public access "invitation" and the many near-empty institutional repositories worldwide) that voluntarism, invitations, etc. simply do not work to generate self-archiving, whereas mandates (CERN, Wellcome Trust, QUT, Southampton, Minho, NIT, Zurich) do -- thereby confirming the outcome of the JISC international, interdisciplinary surveys, which found that 95% of researchers report they will comply with a self-archiving mandate from their employers or funders. Otherwise, only 15% self-archive spontaneously.

All eight of these points are simple, transparent, sound and cannot be invalidated: There are no viable counter-arguments, counterexamples or counter-evidence to any of them. So if they are rigorously and systematically deployed, the publisher lobby will fail to block the self-archiving mandate. If, however, we needlessly venture instead into any shakier areas (publishing reform, copyright reform, public "right to know"), it is we who will fail!

I am certain, from long experience, that no argument at all against a self-archiving mandate can be rationally sustained in the face of (1)-(8), clearly and rigorously applied. We have no weapon against irrationality, of course, or against arbitrariness or brute force. But inasmuch as reason, evidence and public good are concerned, the case for a self-archiving mandate is extremely strong and I would say irresistible (if we ourselves can resist weakening it, gratuitously, by invoking other, fuzzy or defeasible arguments, or by failing to invoke the eight rigorous points we have, clearly and explicitly!).

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 16:12

Saturday, June 10. 2006

Critique of Association of American Publishers' Critique of FRPAA Self-Archiving Mandate

SUMMARY: Research quality is ensured through peer review, which is performed by the research community for free. Publishers manage the peer review process; in exchange they get to sell subscriptions to the paper edition as well as to the online edition of their journals. Supplementing paid subscription access for those would-be users who cannot afford the publisher's version by self-archiving the author's version brings substantial positive effects to research, researchers and the tax-payers that fund them in terms of increased research usage and impact. Self-archiving has had no detectable negative effects on subscription revenues to date, even for the two publishers (American Physical Society and Institute of Physics) in the fields that have been practising self-archiving the longest and most (15+ years). The way to test whether the FRPAA self-archiving mandate will ever have any effect on subscriptions is to adopt it and monitor its effects yearly, not to keep denying and delaying its already demonstrated benefits to research on the grounds of hypothetical risks to publisher revenues. Research is not publicly funded and conducted in order to immunise publisher revenue flows against all risk but in order to maximize research productivity and progress.

The latest AAP/PSP critique of the latest US Public Access Bill (FRPAA) makes the same points (already rebutted two years ago) that they made in their prior critique of the NIH Public Access Proposal.

Peter Suber has already sounded the right overall note by way of reply in OA News (see his 10 detailed points, much the same as mine, below):

(a) There is zero evidence that mandating self-archiving reduces subscription revenue.

(b) But even if self-archiving were ever to reduce subscription revenue, surely what is in the best interests of publishers' current revenue streams should not over-ride what is in the best interests of research and of the public that funds it.

AAP/PSP: "The proposed legislation would require the majority of recipients of U.S. federal research agency funds to make their findings free within six months of publication. Publishers argue that the legislation, if passed, will seriously jeopardize the integrity of the scientific publishing process, and is a duplicative effort that places an unwarranted burden on research investigators."

AAP provides no evidence of how making research findings accessible for free to would-be users who cannot afford access would "seriously jeopardize the integrity of the scientific publishing process." AAP merely stipulates that it would.

Nor is it clear why AAP is speaking on behalf of researchers about "unwarranted burdens". Surely enhanced research usage and impact is not an unwarranted burden for research and researchers?

AAP/PSP: "According to the publishers, the provisions of S.2695 threaten to undermine the essential value of peer review by removing the publishers' incentive and ability to sustain investments in a range of scientific, technical, and medical publishing activities."

Translation: "Self-archive and I may not want to publish journals any more."

Peer review is done by researchers, for free. Whoever funds the management of peer review and the certification of its outcome is a journal publisher. There is no evidence that self-archiving reduces subscription revenue but even if there should ever be such evidence it certainly does not follow that research and researchers should renounce the demonstrated benefits of self-archiving. If/when some publishers should ever become dissatisfied with reduced subscription revenues, their journal titles can migrate to other publishers who are not dissatisfied, or to Open Access ("gold") Publishers.

Surely demonstrated benefits (increased research impact) for research, researchers and the public that funds them are not to be sacrificed in order to insulate publishers from an undemonstrated hypothetical risk to their current subscription revenues.

AAP/PSP: "The proposed legislation comes at a time when increased public access to government-funded research is already occurring in a voluntary and highly effective manner through a variety of publisher-initiated mechanisms and cooperative approaches."

"Highly effective" for whom? The fact is that many researchers cannot afford access to much needed research, and the proof of this is the fact that when subscription access is supplemented by author self-archiving, research usage and impact increase dramatically.

(Note that the issue is not primarily public access to research, but researcher access to research, in order to maximize the benefits of research to the public that funds it.)

AAP/PSP: "Americans have easy access to scientific and medical literature through public libraries, state universities, existing private-sector online database, as well as through their professional, academic, or business affiliations, low-cost online individual article sales, and innovative health literacy initiatives such as patientINFORM."

The primary objective of Open Access is to provide access to researchers, worldwide, for the sake of research uptake, usage, applications, and progress, by way of a return on the public's investment in the research. Researchers do not now have nearly as much access as they need, because no research institution can afford all or most of the journals in which the research appears. The demonstrated impact advantage of self-archived research is the direct evidence of the substantial access shortfall there is for research that is not self-archived.

Paid or library access is certainly not what OA is about or for. OA means online access, free for all would-be users.

AAP/PSP: "The Cornyn-Lieberman bill would create unnecessary costs for taxpayers"

This is complete nonsense. Self-archiving costs are negligibly small.

AAP/PSP: "[it would] place an unwarranted burden on research investigators"

Again complete nonsense. Self-archiving takes a few keystrokes:

Carr, L. and Harnad, S. (2005) Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving.

AAP/PSP: "[it would] expropriate the value-added investments made by scientific publishers -- many of them not-for-profit associations who depend on publishing income to support pursuit of their scholarly missions, including education and outreach for the next generation of U.S. scientists"

Nothing whatsoever is "expropriated": Publishers can continue to sell subscriptions and licenses for their paper and online editions, exactly as before. The author's self-archived final draft is not a substitute but a supplement, online only, for all would-be users who cannot afford the publisher's version. And so far there is no evidence whatsoever that self-archiving reduces subscription revenues at all, even in the areas that have been doing self-archiving the longest (15 years in high energy physics, even longer in computer science) and that have been at or near 100% self-archiving for years now.

Swan, A. (2005) Open access self-archiving: An Introduction. Technical Report, JISC Survey:

"[W]e asked the American Physical Society (APS) and the Institute of Physics Publishing Ltd (IOPP) what their experiences have been over the 14 years that arXiv has been in existence. How many subscriptions have been lost as a result of arXiv? Both societies said they could not identify any losses of subscriptions for this reason and that they do not view arXiv as a threat to their business (rather the opposite -- in fact the APS helped establish an arXiv mirror site at the Brookhaven National Laboratory [and shortly the IOP will host one too])."

Not-for-profit publishers (e.g. Learned Societies) do not differ in any way insofar as any of these considerations are concerned: There is abundant evidence that self-archiving increases research usage and impact and no evidence that it reduces subscription revenue. And research is not funded, conducted and published in order to generate revenue for publishers, let alone in order to guarantee their current revenue streams and insulate them from any risk. In particular, what has already been demonstrated to be in the best interests of research outweighs what has not even been demonstrated to have any negative effects on the interests of publishers.

AAP/PSP: "If enacted, S.2695 could well have the unintended consequence of compromising or destroying the independent system of peer review that ensures the integrity of the very research the U.S. Government is trying to support and disseminate."

Pure nonsense. See prior reply about peer review, done for free by researchers (the peer reviewers); publishers merely administer it, and for any publishers who may no longer wish to administer it, other publishers will be happy to do so in their place.

AAP/PSP: "publishers invest hundreds of millions of dollars each year in publishing and disseminating peer-reviewed journals. These investments ensure the quality of U.S. taxpayer-supported scientific research by subjecting all articles to a rigorous technical review by experts in specialized fields prior to publication and pay for the development of technological innovations that enable broad web dissemination."

Quality is ensured through peer review done by the research community itself; and the peers review for free. Publishers merely administer the peer review, and in exchange they get to charge for the paper edition as well as the online edition. There is no evidence whatsoever that self-archiving diminishes their revenues from any of this, and if/when it should ever do so, the solution is certainly not to refrain from self-archiving, and thereby deprive research of self-archiving's substantial benefits in terms of research uptake, usage, applications, impact and progress.

The solution -- if/when subscription cancellation pressure were ever to happen -- would be to cut costs and adapt, scaling down to the new, smaller but still essential niche of peer-review service provision that will remain for peer-reviewed research journals in the PostGutenberg age even if no one wants to pay for the paper edition or the publisher's official online version any more because the author's self-archived draft is enough. The solution is certainly not to deny research, researchers and the public that funds them the benefits of the research impact and progress that self-archiving brings them.

AAP/PSP: "Mandating that journal articles be made freely available on government websites so soon after their publication will be a powerful disincentive for publishers to continue these substantial investments."

At the moment, over 90% of journals have given immediate author self-archiving their green light. If some publishers are not happy with conferring this benefit on their authors' research, there are plenty of other publishers for their journal titles to migrate to (including the new breed of Open Access "gold" publishers).

AAP/PSP: "publishers are concerned that S.2695 would result in a significant loss of revenue from subscriptions, licensing, and individual article sales, thereby making it difficult for them to sustain and recoup the investments they make in support of scientific communication."

There is no evidence whatsoever that self-archiving has reduced subscription revenue in the very fields that have been doing it the longest and the most (see above). So this publisher concern is purely hypothetical; and the actual effects to date contradict the hypothesis.

But if/when there should ever be a subscription revenue decline, the remedy is to adapt, cut costs, drop inessentials, and downsize to the new PostGutenberg niche for peer-reviewed journal publishing. The remedy is certainly not to sacrifice research impact in order to sustain current publishing revenues instead of adapting to the new technological contingencies opened up by the newfound possibility of providing Open Access to all research.

AAP/PSP: "The proposed bill was introduced on the first anniversary of the National Institutes of Health's (NIH) adoption of its Public Access policy, which encourages the posting of journal articles based on NIH-funded research within 12 months of publication on its existing PubMedCentral database -- a policy that gained PSP/AAP member publisher support and yet remains in its early stages of government-led implementation. A departure from the NIH's voluntary approach, the Cornyn/Lieberman bill would mandate that 11 federal agencies create new systems and data repositories to enforce internet posting of government funded research within six months of publication. As the NIH's implementation of the policy has not yet progressed to the point where its impact can be assessed, publishers view the introduction of the Cornyn-Lieberman proposal as premature."

(1) The NIH policy can be and has been assessed, and it is a failure: The level of compliance with its non-mandatory "invitation" to self-archive is less than 4% after a year. The spontaneous self-archiving baseline worldwide and across disciplines is 15%!

(2) Meantime, self-archiving mandates (such as those of the Wellcome Trust, CERN, and several universities) have been tried, tested, and shown to be successful in generating high compliance rates, exactly as the JISC author surveys had reported they would be:

Swan, A. and Brown, S. (2005) Open access self-archiving: An author study. JISC Technical Report, Key Perspectives Inc:

"The vast majority of authors (81%) would willingly comply with a mandate from their employer or research funder to deposit copies of their articles in an institutional or subject-based repository. A further 13% would comply reluctantly; 5% would not comply with such a mandate."

(3) Research is funded, conducted and published in order to be taken up, used, and applied as soon as it has been validated by peer review. Research is not funded, conducted and published to be embargoed so as to guarantee publishers' current revenue streams.

AAP/PSP: "No evidentiary record exists, and no impact studies have been conducted, to document the long-term cost to tax payers of government agencies developing yet another system to promote public access."

Surely it is not the business of the Association of American Publishers to concern itself with the cost to tax payers of providing open access to government-funded research. But studies have indeed been done, across disciplines, and they have found that self-archived research has substantially higher research impact (25% - 250+%), and this translates into a substantially higher return on the tax payers' investment in research than what they are getting for their research money today.

Competitively speaking, it also means higher salaries and more research income for the early self-archivers. And all, as noted, at a negligibly tiny cost per paper in terms of either author keystrokes or distributed institutional self-archiving costs.

So it is a self-serving red herring for publishers (in reality fretting about their own current revenue streams) to portray this as a "tax payer" issue.

AAP/PSP: "Moreover, no consideration has been given to what the impact of this government mandate will be on publishers and scholarly societies ability to maintain their broad base of library and other customers worldwide and invest in independent peer review systems."

The purpose of research and research funding is not to ensure publisher revenue streams, but to conduct, use and apply research, to the benefit of the tax payers that fund it. Peer-reviewers (researchers) review their peers' research for free. Journal editors merely manage the peer review process, and the true costs of managing peer review can and will certainly be paid out of just a small portion of institutions' own annual windfall subscription cancellation savings -- if and when subscription revenues were ever to collapse catastrophically as a consequence of universal self-archiving.

But at the moment there is not even the slightest sign of a subscription decline: just speculations about doomsday scenarios, intended to hold self-archiving, with all its demonstrated benefits to research, researchers and tax-payers, at bay, so as to protect publishers' current revenue streams from a hypothetical risk.

Surely the rational thing to do is to mandate the self-archiving now, and then review its effects on publishers' revenues yearly, rather than to deny its certain benefits to research on the grounds of its hypothetical risks for publishers. (The delay has already been unconscionably long and wasteful of research impact and progress, and will be all the more embarrassing in historic hindsight.)

AAP/PSP: "Responsible major U.S. government policy revisions must be based on a solid, researched understanding of the long-range impact of any policy changes. This perspective is conspicuously absent from the proposed legislation, which would cause severe harm to the publishing community, scientific societies, and taxpayers."

The long-range effects should be investigated empirically. The positive effects of OA self-archiving for research, researchers and the tax-payers that fund them have already been empirically tested and found to be substantial. Meanwhile, there have been no detectable effects of self-archiving on subscription revenues at all so far, even for the two publishers (American Physical Society and Institute of Physics) in the fields that have been doing it the longest and most (15+ years).

The way to test the long-range effect of the FRPAA self-archiving mandate on subscriptions objectively and empirically is to adopt the mandate and monitor its effects annually, not to deny or keep delaying its already demonstrated positive effects on research impact on the basis of undemonstrated hypothetical negative effects on publisher revenues.

AAP/PSP: "publishers and scholarly societies urge that an independent study be conducted to measure the potential impact that any changes to the existing NIH policy or the adoption of the proposed Cornyn-Lieberman legislation would have on scientific quality, the peer review process, and the viability of numerous journals and societies--as well as the additional costs that would need to be shouldered by taxpayers."

To do the study in question amounts to adopting the self-archiving mandate and testing and reviewing its empirical outcome annually. All else is merely filibuster and bluster.

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 19:48

Saturday, May 27. 2006

Plugging the Loopholes in the Proposed FRPAA, RCUK and EU Self-Archiving Mandates

SUMMARY: The loophole in the proposed FRPAA, RCUK and European Commission self-archiving mandates is that publisher embargo policies can open-endedly determine when the research is deposited. Two small but crucial changes can correct this: (1) Mandate immediate deposit for all articles immediately upon acceptance for publication (rather than mandating deposit only after an interval determined by the publishers, or after a fixed 6-month embargo). (2) Only recommend (rather than mandate) immediately setting the access to each deposit as Open Access (OA): allow the option of setting access instead as Closed Access (CA) where deemed necessary. This separate "Dual Deposit/Release" (DD/R) policy requires that the full text and metadata be deposited immediately, not after a delay; and it requires that the metadata (only) be immediately set to Open Access, so they are visible and accessible worldwide immediately upon deposit; but it allows the option of not setting access to the full-text itself as OA immediately. It thereby transfers the force of any embargo/delay onto OA access-setting instead of leaving it on the deposit itself, which must be immediate. Nothing at all is lost, relative to the delayed-deposit mandates currently being proposed, but what is gained is systematic immediate deposit of the full texts -- plus the capability of providing individual access via user email-eprint requests to the author, semi-automatized by the Institutional Repository software.
ANON: "My primary aim is to develop a model OA legislation... I am consulting the FRPAA 2006 bill and other texts. In formal legislation, we generally avoid any loopholes or pitfalls that may have a counter-effect on the very objectives of the legislation."

Pre-emptively avoiding loopholes and pitfalls is indeed the optimal strategy. And it is precisely for that reason that instead of recommending that you emulate exactly the proposals of the FRPAA, the RCUK, or the European Commission, which all still have some needless weaknesses as well as loopholes, I recommend that you change two of their parameters to make your own policy recommendation the optimal one:

The two changes are small but absolutely crucial:

(1) Mandate immediate deposit for all accepted papers (rather than mandating deposit only after an interval determined by the publishers, or after a fixed 6-month embargo).

(2) Only recommend (rather than mandate) immediately setting the access to each deposit as Open Access (OA): allow the option of setting access instead as Closed Access (CA) where deemed necessary (i.e., where it is thought that immediately setting access as OA would contravene the author's copyright agreement with the publisher).

This separate Dual Deposit/Release (DD/R) policy requires that the full text and metadata be deposited immediately, not after a delay; and it requires that the metadata (only) be immediately set to Open Access, so they are visible and accessible worldwide immediately upon deposit; but it allows the option of not setting access to the full-text itself as OA immediately. In effect, it thereby transfers the force of any embargo/delay onto OA access-setting instead of onto the deposit itself, which (to repeat) must be immediate.

The US is probably the only country in the world that has enough collective weight to go even further, because it represents such a large proportion of the authorship of so many journals, and of the funding of so much published research: The US can, I think, with impunity put a cap (of six months, or even less) on the maximal allowable embargo period. The Wellcome Trust has already done this, telling their authors: "If your publisher does not agree to a cap of 6 months on the embargo, choose another publisher!" Wellcome, however, has the advantage, for this admirable and bold move, that they are a private funder. So all an author can do, if he does not like the Wellcome's terms, is to choose another funder.

But the US Federal funding agencies are governmental, so they can be lobbied, not only by publishers but also by researchers and their institutions, if they object to the terms: And researchers may well object to being told that they cannot publish in the journal of their choice if it does not agree to a 6-month embargo cap!

But the Dual Deposit/Release (DD/R) mandate removes even this possible obstacle to the adoption of the policy; for it does not require either switching publishers or contravening the terms of their copyright agreement. It allows embargoed access-setting but mandates immediate deposit (and then semi-automatic email-eprint requests in the repository software will take care of any gap-period in which the metadata are visible and accessible but the full-text is Closed Access).

I think this slight parametric change answers your remaining concerns.

ANON: "In the case of FRPAA 2006, one has to be dependent on the mercy of the journal publishers, presuming that 93% of them will immediately allow the researcher to deposit his research in OA. Please let us know how you have reached that figure."

That's 93% of journals, not publishers (as some publishers publish more journals than others). I have reached the figure on the basis of each journal's own official author self-archiving policy, as indexed, by journal as well as by policy, in the Romeo directory, for over 9000 journals, including all the principal ones.

But note that the Dual Deposit/Release (DD/R) policy would be immune to publisher objections and policies even if 100% of journals had had an embargo, because it does not mandate Open Access! It simply mandates immediate repository deposit, which is an internal record-keeping matter, and that is in no way the business of publishers or anyone else.

What gives the Dual Deposit/Release (DD/R) policy its power is that (a) it also mandates the immediate deposit of the bibliographic metadata (author, title, journal name, date, etc.), which no one can embargo, and (b) the webwide, uniform visibility and accessibility of the metadata immediately makes the existence of the article visible as well. Each individual would-be user who finds that the article is Closed Access can then use the semi-automatic eprint-request button we have designed for the GNU Eprints Institutional Repositories (since implemented for the Dspace repository software as well, and easily implementable in all other repository software) to request that the author email him the full-text.
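To make the mechanics concrete, here is a minimal sketch of the DD/R logic in Python. It is not the Eprints or Dspace implementation: the class and function names (Deposit, request_fulltext, approve_requests) and the example addresses are hypothetical, chosen only to illustrate that the metadata are always open, that the full text may start out Closed Access, and that a request on a Closed Access item is routed to the author for one-keystroke approval.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    OPEN = "open"      # anyone on the web can download the full text
    CLOSED = "closed"  # the full text is deposited but not publicly downloadable


@dataclass
class Deposit:
    """One accepted article in a hypothetical institutional repository."""
    metadata: dict                      # author, title, journal name, date, etc.
    author_email: str
    fulltext_access: Access = Access.CLOSED
    pending_requests: list = field(default_factory=list)

    @property
    def metadata_access(self) -> Access:
        # Under DD/R the metadata are always open; there is no setting to change.
        return Access.OPEN


def request_fulltext(deposit: Deposit, requester_email: str) -> str:
    """What a would-be user gets when asking for the full text."""
    if deposit.fulltext_access is Access.OPEN:
        return "download"
    # Closed Access: queue an eprint request for the author to approve.
    deposit.pending_requests.append(requester_email)
    return "request-sent-to-author"


def approve_requests(deposit: Deposit) -> list:
    """The author's single keystroke: release a copy to everyone waiting."""
    sent, deposit.pending_requests = deposit.pending_requests, []
    return sent


if __name__ == "__main__":
    paper = Deposit(
        metadata={"title": "Example article", "journal": "Example Journal"},
        author_email="author@example.org",
    )
    print(request_fulltext(paper, "reader@example.org"))
    print(approve_requests(paper))
```

Note that nothing in the sketch depends on any publisher policy: the deposit exists and the metadata are visible in every case, and only the value of fulltext_access varies.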

So whereas the dual mandatory-immediate-deposit plus optional-delayed-release policy does not mandate OA per se, it instead -- in exchange for immunity from all possible objections on the grounds of either copyright or author choice of journal -- requires immediate deposit in all cases, no exceptions or delays, and relies on (trivial) technology to fill any access-needs during any embargo-gap. (It is unlikely, by the way, that once the Dual Deposit/Release mandate is widely adopted, embargoes will survive for very long -- and in the meanwhile, all usage needs will be filled by the software technology, using the unique capabilities of the new medium).

If the adoption of the proposed mandates instead keeps being delayed and delayed, as it has been now for the past 3 years, for the very reasons you are here raising, everywhere (except the Wellcome Trust plus the six institutions that have already mandated self-archiving) -- and even if the mandates are adopted as delayed-deposit mandates (as the Wellcome policy is, with a cap of 6 months on the permissible delay) -- the result will be that embargoes will only become more deeply entrenched instead of being defused. The mandatory-immediate-deposit plus optional-delayed-release policy defuses them, because it fixes both the date of deposit and the practice of depositing for all authors and papers uniformly and optimally, allowing no exceptions.

It may be useful for you to bear in mind -- and also to inform your policy advisees -- that the real thing that has been holding back 100% OA for a dozen years now has not been copyright policy but keystrokes: Authors have not been depositing their articles in their institutional (or central) repositories. Two international, cross-disciplinary surveys for JISC by Key Perspectives have shown that most authors don't and won't self-archive spontaneously, but 95% of them will do so if/when mandated to do so by their funders and/or their institutions. And the actual experience of the (few) institutions that have already gone ahead and mandated self-archiving confirms the predictions of these JISC surveys.

The self-archiving mandate should be thought of as a mandate to perform those all-important keystrokes -- for record-keeping as well as metadata-exposing purposes -- immediately upon acceptance for publication, which is the natural date, the date when usage of the research can and should begin, and the date that provides a fixed, objective landmark for all papers.

(Swan et al. have written an excellent strategic analysis of the question of institutional versus central repositories. There is also excellent further policy guidance here [1] and [2].)

Once a "Keystroke Mandate" becomes widely implemented, 100% OA will not be far behind; but keep delaying the keystrokes because of other worries, or over-reaching, and you only keep delaying 100% OA.

The timing of the deposit is now independent of any publisher embargo periods, which instead apply only to the timing of the Open Access-setting for the (Closed Access) full text. The policy need make no mention of embargo periods, except to recommend setting access OA immediately or as soon as possible.
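The separation of the two clocks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (add_months, access_schedule) are hypothetical, and the embargo is assumed to be counted in whole months from the publication date, which is how the six-month periods discussed above are expressed.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def access_schedule(accepted: date, published: date, embargo_months: int = 0) -> dict:
    """Dates on which each part of a deposit is due or becomes visible."""
    return {
        "deposit_due": accepted,    # fixed by the mandate, never by the embargo
        "metadata_open": accepted,  # fixed by the mandate, never by the embargo
        # Only the access-setting of the full text can be affected by an embargo
        # (assumption: counted from the publication date).
        "fulltext_open_by": add_months(published, embargo_months),
    }
```

Whatever value embargo_months takes, the first two dates do not move; with no embargo the full text can be set to Open Access as soon as the article is published, or earlier if the author chooses.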

ANON: "Nevertheless, a public law can not be dependent on the assumptions, if this 93% of publishers tomorrow decide not to allow researchers to deposit in OA, the law is then helpless."

Not the law I have just described! It is completely unaffected. Please also see the self-archiving FAQ on this very question.

ANON: "In summary, there is a difficulty in constructing a law that guarantees 100% of OA even if the researcher/research-institute is willing to do so."

What is needed right now is not a law that guarantees 100% OA; what is needed is a mandate that guarantees 100% immediate deposit of the full-texts and 100% immediate visibility of the metadata. Nature will take care of the rest.

But delaying and delaying the adoption of the policy (or weakening it to ineffectuality by allowing fixed or open-ended deposit-delays in order to comply with every possible embargo) is a way of guaranteeing that 100% OA will remain a long, long way off!

The obstacle is keystrokes, and what is needed is a keystroke mandate, i.e., an immediate-deposit mandate, for internal record-keeping and external visibility purposes only. That is not an "OA law" but a law specifying the fulfilment conditions for receiving public funding for research: The resulting publications must be immediately deposited in an OAI Repository and must immediately make their metadata visible.

Once that law is in place, 100% OA will quickly ensue of its own accord. But keep delaying the law by agonising instead over what will allow immediate OA for the full text in all cases -- or what interval can be agreed upon for delayed deposit -- and 100% OA will be delayed for yet another needless decade.

ANON: "Moreover, the semi-automatic email is not open access as the term is being used in the legal texts or definition of OA. It depends on the will of the author/publishers et al and access is platform/ technology depended. A legislation has to be platform/ technology neutral/ independent."

You are quite right that immediate deposit, immediate access to the metadata, plus the email-eprint feature is not OA! But (1) it is almost OA, (2) it will rapidly usher in OA, and (3) it is infinitely more useful to research and researchers, now -- and for the very reasons that OA is so important and needed -- than continuing to delay and agonise over legislation that will somehow manage to formally accommodate publisher copyright agreements, and lobbies, and author choice, all at once, and in advance.

However, it is not true at all that the Dual Deposit/Release "keystroke" mandate depends on the will of the publishers, nor that access is platform/technology dependent. All of OA, and indeed the definition of OA, is dependent on some technology (the Internet and the Web) and in particular, on the OAI protocol for metadata-harvesting and interoperability. Hence all Open Access Repositories are OAI-compliant. That's easy, free, and all that's needed.
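For readers unfamiliar with the OAI protocol for metadata harvesting (OAI-PMH), a harvest is just an ordinary web request. The Python sketch below builds one such request; ListRecords, metadataPrefix, from and the oai_dc format are part of the protocol, whereas the function name and the repository address used in the example are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: str = "") -> str:
    """Build an OAI-PMH ListRecords request for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional: harvest only records added or changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository address, for illustration only.
    print(list_records_url("https://repository.example.org/oai",
                           from_date="2006-05-27"))
```

Any harvester can issue the same request to any compliant repository, whatever software the repository runs, which is the sense in which the mandate is platform-neutral.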

And the mandate is platform-neutral too: It just requires immediate deposit of the full-text and immediate webwide visibility of the metadata. The rest will take care of itself.

As to dependence on the author's will: All authors of all 2.5 million annual articles in the world's 24,000 peer-reviewed journals want to have as many users and citers of their research as they possibly can. That is why, in paper days, they would take the time, trouble and expense to mail paper reprints to individual reprint requesters. With the DD/R mandate, this is all reduced to one author keystroke, upon receiving the semi-automated eprint-request generated by the repository software. The only thing now standing in the way of that option today is the other N-1 keystrokes needed to deposit the full-text and the metadata into the repository. And that is what the DD/R "keystroke" mandate is for!

Researchers never sought royalty revenue for their articles, and they never sought to deny access. Mandate the requisite N-1 keystrokes and they will do the Nth one happily for each individual eprint request (and to re-set access from Closed Access to Open Access when the embargo expires or they tire of doing the individual keystrokes -- whichever comes first!).

ANON: "In some cases, especially in social sciences, researchers build their papers on their research experiences from various sources, not necessarily with that of current research funding. So, can government law bind them in a strict legal sense to make their publications OA?"

No. A law based on conditions of funding (whether funding of research grants or funding of research or academic institutions) can have no power over what it does not fund. But it need not. The effects of a self-archiving mandate on funded research will propagate to non-funded research rapidly as the beneficial effects of Open Access on research accessibility, impact and progress are increasingly felt across the entire world research community.

Right now, to repeat: the only thing holding us back from the feeling of those beneficial effects, and from their propagation across disciplines and around the world, is keystrokes -- not copyright: keystrokes. Hence an immediate-keystroke mandate is all that is needed.

The (proposed) law does not tell the fundee where to publish. (It is already required that funded findings be published ["publish or perish"]). The discipline and research community dictates where to publish (in the highest quality peer-reviewed journal whose content and quality standards it successfully meets). The DD/R mandate would add only that the fundee must deposit the publication (if it is an article rather than a book) in an OAI-compliant repository (preferably his own institutional IR) immediately upon acceptance for publication.

ANON: "If the journal publisher allows... the researcher to deposit the paper in [immediate] OA it is fine; otherwise the researcher is in danger of violating the copyright law of illegally distributing, reproducing, copying, extracting copyright material over the internet."

You are not talking above merely about depositing but about OA-depositing (i.e., depositing and setting access to the full-text immediately to OA rather than CA). So if thus making the full-text freely available online would violate the copyright agreement with the publisher, the only thing the FRPAA can do, if it wishes to mandate immediate OA-deposit, is to require the fundee not to sign such an agreement (as the Wellcome Trust requires, for agreements with embargoes exceeding 6 months).

But such a requirement would in turn be open to the author/institution objection that it constrains the author's free choice as to where to publish.

So the solution is to require deposit only (not OA-deposit), and merely to encourage OA-access setting whenever and as soon as possible.

There is no other current solution: only the endless pre-emptive debating about what the publisher would/should/could require and what the author would/should/could do -- in other words, the effective pre-emptive "embargo" on the very adoption of a self-archiving mandate that we have been stalled in now for three years! Continuing that debate simply invites more years of lobbying, delay, and accumulating access/impact loss.

What is needed is an immediate keystroke mandate, immune to debate or delay. (Then continue the debate while the keystroke mandate is having its natural effect.)

This ground has already been covered, over and over, many times before, in the past three years. There is no resolution, only inaction. It is time for action, and the Keystroke Mandate is the requisite action.

ANON: "So the solution to this problem cannot be within the technological systems of OA, [e.g., the email-eprint button of the IR software], but the OA Act would have to make exceptions for such cases. The FRPAA does not mention such exceptions - hence it needs to be rectified from this pitfall. A legal expert would be able to throw some light on it, I believe."

No, a (lawyer-dictated) inventory of provisos and exceptions will only make the mandate confused, confusing and ineffectual. Immediate deposit of full text and metadata. No exceptions. No delays. And as access-setting is not at issue, publishers, hence author journal-choice, are both out of the loop.

Words stand in our way: The purpose of a Keystroke Mandate is of course Open Access. But call it an "Open Access Mandate" instead and you are up against the inventory of provisos and exceptions (that will in any case not add up to 100% OA). It is the tail wagging the dog. Mandate the immediate keystrokes, now, and then go back to debating the provisos and exceptions.

Nor has technology anything to do with it, other than the online medium itself, that new medium on which the very concept and possibility of OA are predicated (Open Online Access). Read the opening words of the BOAI statement: "An old tradition and a new technology have converged to make possible an unprecedented public good."

OAI-compliance is highly desirable, and easily feasible too, but need not be mentioned in any law, if you think it will produce a paralysing technology-dependence debate. Same for the trivial eprint-request button, implemented in both of the major free software packages for OAI IR creation (EPrints and DSpace)...

Just mandate the immediate keystrokes and nature will take care of all the rest...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Self-Archiving Mandates at 23:15

Friday, May 26. 2006

The Epidemiology of OA

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

As Eysenbach’s very long and remarkably intemperate response to my prior response mainly repeats prior (answered) points, I will respond only to his very few substantive points:

I had asked: “Does [Eysenbach] seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect [the within-journal citation advantage for self-archived articles]?”

GE: “the answer is ‘absolutely’… If high-author papers are overrepresented in self-archived papers, then this confounder alone will contribute to having a greater number of citations… Only if one statistically controls for all these confounders (there are several of them - see PLoS paper), and one STILL sees an open access citation advantage, then (and only then) one has a SOLID, defendable study. ”

Here is Eysenbach’s list of confounders:

(1) number of authors: As Eysenbach says he is serious, we will now test this. We have the data. Eysenbach’s prediction is that partialling out the effect of the number of authors will make a dent in our huge, consistent citation advantage. Stay tuned...

(2) number of days since publication: This is relevant and feasible in a 1-year, 1-journal study like Eysenbach’s but neither relevant nor feasible for a sample of over a million articles ranging over 12 years, 12 disciplines, and hundreds of journals -- all showing exactly the same citation advantage for self-archived articles in every year and every discipline.

(3) article type: We are able to test this separately too (because we have ISI data on article type) but first let’s see whether partialling out author numbers makes a dent in our basic effect.

(4) country of the corresponding author: This is testable too, but first let’s see how the author-number ‘confounder’ pans out (we could look at the first-author's birth-sign too...).

(5) funding type: Data not available, and extremely far-fetched.

(6) subject area: Already tested and reported in our data, separately for 12 different disciplines: the self-archiving advantage is consistently present in all of them.

(7) submission track (PNAS has three different ways that authors can submit a paper): Not relevant to the journals we tested, which were all non-OA and pre-dated Open Choice.

(8) previous citation record of the first and last authors: This, as I noted, is -- along with the demonstration of how early the OA advantage emerges in PNAS -- a potentially interesting variable in the fine-tuning of the OA advantage, but our own studies are concerned with estimating the generality and size of the OA advantage, not with its fine-tuning.

(9) whether authors choosing the OA option in PNAS chose to do so for only their most important research (“they didn't”): Neither Eysenbach’s study nor ours can confirm causality or eliminate the possibility of self-selection bias.
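To make concrete what "partialling out" the number of authors in item (1) would involve, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records are invented toy values (not our data or Eysenbach's), the function names are hypothetical, and it uses simple stratification by author count rather than the multivariate regression Eysenbach used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (number_of_authors, is_self_archived, citations).
# High-author papers are deliberately over-represented among the OA articles.
ARTICLES = [
    (1, True, 4), (1, True, 6),
    (1, False, 2), (1, False, 4), (1, False, 3), (1, False, 3),
    (5, True, 9), (5, True, 11), (5, True, 10), (5, True, 10),
    (5, False, 7), (5, False, 9),
]


def crude_difference(articles):
    """Mean citations of OA articles minus non-OA articles, ignoring authors."""
    oa = [c for _, is_oa, c in articles if is_oa]
    noa = [c for _, is_oa, c in articles if not is_oa]
    return mean(oa) - mean(noa)


def stratified_difference(articles):
    """OA minus non-OA difference computed within each author-count stratum,
    then averaged with weights proportional to stratum size."""
    strata = defaultdict(lambda: {True: [], False: []})
    for n_authors, is_oa, citations in articles:
        strata[n_authors][is_oa].append(citations)

    weighted_sum = 0.0
    total = 0
    for groups in strata.values():
        # A stratum with only OA or only non-OA articles gives no comparison.
        if not groups[True] or not groups[False]:
            continue
        size = len(groups[True]) + len(groups[False])
        weighted_sum += size * (mean(groups[True]) - mean(groups[False]))
        total += size
    if total == 0:
        raise ValueError("no stratum contains both OA and non-OA articles")
    return weighted_sum / total
```

With these made-up values the crude difference is about 3.67 citations and the author-adjusted difference is 2.0: the confounder inflates the crude figure, but an advantage remains within each stratum. Whether the real data behave this way is exactly what the test has to show.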

GE: "the fact that we look at a immediate (gold-)OA article population in a longitudinal cohort study design takes care of the “arrow of causation” problem, because it makes sure that open access status comes first, then the citations are coming, not the other way round."

I'm afraid it's not quite that easy to take care of the "arrow of causation" problem, which is confounded (sic) with the problem of self-selection bias: For if authors are (contrary to their subjective reports) indeed self-selecting their better papers (or themselves!) for OA-gold (or for self-archiving) then that, and not the OA, could explain why their papers get more citations.

GE: “it is entirely possible that the articles in his sample (which he refers to as green-OA articles) were not “immediately” self-archived after publication, but 1 month, 6 months, or 12 months after original publication, therefore not really what Harnad refers to as green-OA, implying “immediate” deposition.”

This is actually a valid point of definition: OA should be defined as ‘immediate’ in order to rule out claims that delayed/embargoed access is Open Access. The point at which refereed research can and should begin to be used is when the final refereed draft is accepted for publication, and that is the point when it should be made freely accessible online. So a portion of the citation advantage for self-archived articles could well have come from self-archiving later than the publication date; technically speaking this should be called a ‘free access’ advantage, if we reserve the term "OA" for access that is free immediately. But surely nothing of substance rides on this: If there is a self-archiving advantage even for tardy self-archiving, that confirms, a fortiori, the self-archiving advantage of prompt (OA) self-archiving too!

GE: “I… made a conscientious decision to submit my paper to a gold-OA journal (PLoS) rather than publishing the study in an obscure scientrometrics journal and then self-archived [sic] it”

Actually, unless I am mistaken, I seem to recall correspondence from GE to the effect that it was first declined by Science (or was it Nature?), not a gold-OA journal, before being submitted to PLoS Biology...

GE: “The visibility of an article published in a properly promoted OA journal site will always be better than a paper that is published in a toll-access journal site, even if it is self-archived. This is exactly why my study shows an advantage of gold-OA over green-OA, this is also why I personally chose the gold route to publish this paper in PLoS, and not the green route”

Let us not confound a journal's profile/impact level with its OA/non-OA status.

The visibility (and no doubt also the citation impact) of an article will always be better when it is published in a high-profile, high-impact journal, whether it is OA (like PLoS) or non-OA (like Science or Nature), rather than an obscure scientometrics journal (or an obscure OA journal). Its visibility and impact will be higher if self-archived in either case (except perhaps if the journal is both high-profile and optional-OA, which is partly what Eysenbach’s study has shown).

GE: “the PLoS paper is the first study which contains an analysis of both gold and green (thus focuses on “OA itself”), whereas the rest of the studies is actually focused on ‘green’”.

Because most of the existing data for within-journal OA/non-OA comparisons comes from the millions of articles published in the thousands of non-gold journals indexed by ISI and not just the thousands of articles published in the few journals that are as yet (like PNAS) optional-gold...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 01:31

Wednesday, May 24. 2006

End of PLoS Exchange

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

PLoS seems to have concluded that it is not in their interest to host further public contributions to the Eysenbach debate from me -- and perhaps they are right (that it is not in their interest)...

From: plos AT plos.org
Date: May 19, 2006 3:13:07 PM EDT (CA)
To: Stevan Harnad harnad AT ecs.soton.ac.uk
Subject: e-Letter: Decision

Unfortunately we decided not to accept your e-Letter. Letters are published at the editors' discretion, and we publish only those that we believe will contribute substantially to the debate. Our editorial decisions about publishing letters are final, and are not open to appeal.

Below is appended the mercilessly compressed fragment that I had submitted to PLoS as a follow-up letter (responding to Eysenbach's PLoS letter responding to my PLoS letter responding to his PLoS article).

The full version of my reply of course appeared on AmSci and in my Archivangelism blog -- but, as Eysenbach's study showed, one gets still further visibility from appearing on the website of a high-profile, high-impact journal! My (valid) rebuttal to Eysenbach's suggestion that self-archiving is to OA publishing as handing out leaflets is to publishing in a newspaper was that we are talking about publications in the case of OA, not unpublished materials! But with a letter, it's more like handing out leaflets when it just appears in a Forum or a Blog, versus the website of a high-impact journal...

Never mind. The content is what matters, and time is on OA's side (even if it is much too dilatory!), because OA is (you've heard the song!): Optimal and Inevitable.

Confirming the Within-Journal OA Impact Advantage


Given the large within-journal OA citation impact advantages repeatedly found across all journals, disciplines and years in samples three orders of magnitude larger than Eysenbach's, it is not clear that controls for "multiple confounders" are needed to demonstrate the reality, magnitude and universality of the OA advantage. (This does not mean Eysenbach’s controls are not useful, just that they are not yet telling us much that we don't already know.)

Eysenbach (and PLoS) are focussed on gold-OA journals; most other OA impact studies are focused on OA itself. Only ~10% of journals are gold today. Few as yet offer authors "Open Choice" (allowing gold within-journal OA/NOA comparisons) and few authors are as yet choosing paid OA.

Regarding the “arrow of causation”: yes, “longitudinal cohort” data would demonstrate causation (for skeptics who think the OA advantage might be a self-selection bias) but Eysenbach's author self-reports certainly aren’t such data! Meanwhile: (a) the OA advantage does not diminish for younger articles; (b) OA increases downloads; (c) increased downloads in the first 6 months correlate with increased citations later; (d) unaffordability reduces access; (e) access is a necessary condition for citation.

About OA being a “continuum” or “spectrum”: Time is certainly a continuum, and access certainly admits of degrees (access may be easier/harder, narrower/wider, cheaper/dearer, longer/shorter, earlier/later, partial/full) -- but Open Access does not admit of degrees (any more than pregnancy does). OA is defined as: full-text online access, free for all.

Eysenbach likens self-archiving to “printing something on a flyer and handing it out to pedestrians on the street [instead of] publishing an article in a national newspaper." But it is published articles that are being self-archived.

NOA (Not OA): 1159 articles (86.2% cited at least once)
POA (Paid OA only): 176 (94.3%)
SOA (Self-Archived OA only): 121 (90.1%)
BOA (POA and SOA): 36 (97.2%)

In this PNAS sample, POA, SOA and BOA together, and POA alone, all have significantly more citations than NOA, but SOA alone ("stratified") does not; also, both POA and SOA increase citations, but POA does it more.
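The four group sizes listed above can be tallied directly. The following is a minimal Python sketch (the function names are hypothetical, and the "cited at least once" counts are only approximations reconstructed by rounding the reported percentages); it does not attempt the significance tests, which would need the per-article citation counts.

```python
# Group sizes and "cited at least once" percentages exactly as listed above.
GROUPS = {
    "NOA": (1159, 86.2),
    "POA": (176, 94.3),
    "SOA": (121, 90.1),
    "BOA": (36, 97.2),
}


def total_articles(groups):
    """Total number of articles across all four groups."""
    return sum(n for n, _ in groups.values())


def share(groups, names):
    """Fraction of the whole sample that falls in the named groups."""
    return sum(groups[name][0] for name in names) / total_articles(groups)


def approx_cited(groups):
    """Approximate number of articles cited at least once in each group,
    reconstructed by rounding size * percentage (not the original counts)."""
    return {name: round(n * pct / 100) for name, (n, pct) in groups.items()}
```

The groups sum to 1492 articles. Counting BOA in both, roughly 14% of the sample was paid OA (POA or BOA) and roughly 11% was self-archived (SOA or BOA), so the SOA-only comparison rests on a small group.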

Three possible hypotheses explaining the BOA>POA>>SOA>NOA outcome:

H1: The POA advantage might be a multiple-archiving effect, maximal for high-profile, 3-option (POA, SOA, NOA) journals like PNAS because POA articles are more visible than SOA. (POA + SOA = BOA highest of all: redundancy helps!) As Institutional Repositories fill, this extra advantage will disappear.

H2: The POA advantage might arise in part from self-selection because the decision to pay for POA is influenced by the author's sense of the potential importance (hence impact) of his article. (But I think self-selection quality-bias is just one of many contributors to the OA advantage itself, not the only one or the biggest.)

H3: The POA advantage might be either a small-sample chance result or a temporary side-effect of the 3-option journals in early days: a one-stop shopping advantage for PNAS articles, in a high-profile store, today. It needs to be tested for replicability and representativeness in larger samples of articles, journals, and time-bases.

The true measure of the SOA advantage today is not found in PNAS but in the far more populous and representative full spectrum of journals not yet offering POA. (I’d be delighted if those journals took Eysenbach’s findings as a reason for offering a POA option! But not at the expense of authors wrongly inferring that for the journals they currently publish in, SOA alone would not confer citation advantages at least as big as the ones we have been reporting.)

Harnad & Brody (2004)
Brody et al (2005)
Hajjem et al (2005)

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 13:52

Thursday, May 18. 2006

Confirming the Within-Journal OA Impact Advantage

Update Jan 1, 2010: See Gargouri, Y; C Hajjem, V Larivière, Y Gingras, L Carr, T Brody & S Harnad (2010) “Open Access, Whether Self-Selected or Mandated, Increases Citation Impact, Especially for Higher Quality Research”
Update Feb 8, 2010: See also "Open Access: Self-Selected, Mandated & Random; Answers & Questions"

Gunther Eysenbach (GE) (in a letter in PLoS Biology today) wrote:

GE: "The introduction of the article and two accompanying editorials [1, 2, 3] already answer Harnad's questions why author, editors, and reviewers were critical of the methodology employed in previous studies, which all only looked at "green OA" (self-archived/online-accessible papers)"

I didn't ask why the author and editors were critical of prior self-archiving (green OA) studies; I asked why they said such studies were "surprisingly hard to find" and why the two biggest and latest of them were not even taken into account:

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 56.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

And the reason all prior within-journal studies look only at "green OA" is that the majority of OA today is green; hence almost all OA/NOA impact comparisons are based on green OA (self-archiving) rather than on paid-OA (gold). To compare OA and NOA between rather than within journals would be to compare apples and oranges: See critique of ISI's between-journal OA/NOA comparisons in:

Brody, T. and Harnad, S. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-Lib Magazine 10(6).

GE: " (hint 1: "confounding") (hint 2: arrow of causation: are papers online because they are highly cited, or the other way round?)."

I am afraid I don't see Eysenbach's point here at all: What exactly does he think is being confounded in within-journal comparisons of self-archived versus non-self-archived articles? The paid-OA effect? But among OA articles today there is almost zero within-journal paid-OA, because so few journals offer it! (Hajjem et al.'s within-journal comparisons were based on over a million articles, across 12 years and hundreds of journals, in 12 disciplines! Eysenbach's were based on 1492 articles, in 6 months, in one journal.)

And is Eysenbach suggesting that his failure to find any significant difference among author self-reports -- about their own article's quality and its causal role in their decision about whether or not to pay for OA (or to self-archive) in his sample of 237 authors -- is an objective test of the arrow of causation? (I agree that Eysenbach's failure to find a difference fails to support the hypothesis of a self-selection bias, but surely that won't convince those who are minded to hold that hypothesis! I would welcome rigorous causal evidence against the self-selection hypothesis as much as Eysenbach would, but author self-reports are alas not that evidence!)

GE: " The statement in the PLoS editorial has to be seen against this background. None of the previous papers in the bibliography mentioned by Harnad employed a similar methodology, working with data from a "gold-OA" journal."

Yes, almost all prior studies on the OA impact advantage are based on green OA, not gold, but so what? It is Eysenbach (and PLoS) who are focussed on gold-OA journals; the rest of the studies are focussed on OA itself. Only about 10% of the planet's peer-reviewed journals are gold today, and most of those are 100% gold, hence allow no within-journal comparisons. Very few journals as yet offer authors the "Open Choice" (optional paid gold) that would allow gold within-journal OA/NOA comparisons; and few authors are as yet taking those journals up on it (about 15% in this PNAS sample), compared to the far larger number that are self-archiving (also 15%, as it happens, though that percentage too is still far too small!). The difference in article sample sizes is about three orders of magnitude (c. 1500 articles in Eysenbach's study versus 1.5 million in Hajjem et al.'s).

GE: " The correct method to control for problem 1 (multiple confounders) is multivariate regression analysis, not used in previous studies."

Correct. But with the large, consistent within-journal OA/NOA differences found across all journals, all disciplines and all years in samples three orders of magnitude larger than Eysenbach's, it is not at all clear that controls for those "multiple confounders" are necessary in order to demonstrate the reality, magnitude and universality of the OA advantage. That does not mean the controls are not useful, just that they are not yet telling us much that we don't already know.

GE: " Harnad's statement that "many [of the confounding variables] are peculiar to this particular... study" suggests that he might still not fully appreciate the issue of confounding. Does he suggest that in his samples there are no differences in these variables (for example, number of authors) between the groups? Did he even test for these? If he did, why was this not described in these previous studies?"

No, we did not test for "confounding effects" of number of authors: What confounding effects does Eysenbach expect from controlling for number of authors in a sample of over a million articles across a dozen disciplines and a dozen years all showing the very same, sizeable OA advantage? Does he seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect?

Not that Eysenbach's tentative findings on 1st-author/last-author differences in his one-journal sample of 1492 are not interesting; but those are merely minor differences in shading, compared to the whopping main effect, which is: substantially more citations (and downloads) for self-archived OA articles.

GE: " The correct method to address problem 2 (the "arrow of causation" problem) is to do a longitudinal (cohort) study, as opposed to a cross-sectional study. This ascertains that OA comes first and THEN the paper is cited highly, while previous cross-sectional studies in the area of "green OA" publishing (self-archiving) leave open what comes first -- impact or being online."

I agree completely that time-based studies are necessary to demonstrate causation, for those who think that the OA advantage might be based on self-selection bias (i.e., that high-impact studies tend to be preferentially self-archived, perhaps even after they have gained their high impact), but Eysenbach's author self-report data certainly don't constitute such a longitudinal cohort study! (Once there exist reliable deposit dates for self-archived articles, we will be able to do some time-based analyses on green OA too, but, frankly, by that time the outcome is likely to be a foregone conclusion.)

In the meanwhile, the fact that (a) the OA advantage does not diminish for younger articles (as one would expect if it were a post-hoc effect), that (b) OA increases downloads, and that (c) increased downloads in the first 6 months are correlated with increased citations later on -- plus the logic of the fact that (d) unaffordability reduces access and that (e) access is a necessary condition for citation -- all suggest that most of the scepticism about the SOA advantage is because of conflicting interests, not because of objective uncertainty.
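Point (c) is a correlation claim. As a minimal sketch of how a correlation between early downloads and later citations could be computed, here is a plain Pearson coefficient in Python. The two lists are invented illustrative numbers, not data from any of the studies cited here, and the names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-article values: downloads in the first 6 months,
# and citations accumulated later on.
EARLY_DOWNLOADS = [10, 20, 30, 40, 50]
LATER_CITATIONS = [1, 3, 2, 5, 4]
```

For these toy values `pearson_r(EARLY_DOWNLOADS, LATER_CITATIONS)` comes to 0.8. A correlation of this kind shows that early usage predicts later citation; on its own it does not settle the direction of causation.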

GE: " Harnad - who usually carefully distinguishes between "green" and "gold" OA publishing -- ignores that open access is a continuum, much as publishing is a continuum"

I'm afraid I have no idea what Eysenbach means about OA being a continuum: Time is certainly a continuum, and access certainly admits of degrees (access may be easier/harder, narrower/wider, cheaper/dearer, longer/shorter, earlier/later, partial/full) -- but Open Access does not admit of degrees (any more than pregnancy does). OA means immediate, permanent, full-text online access, free for all, now.

And, by the way, green OA is certainly not a lesser degree of gold OA!

For the innocent reader, puzzled as to why this would even be an issue:

Please recall that OA (gold) journals, whether total or optional gold, need authors (and those gold journals with the gold cost-recovery model need paying author/institutions). To attract authors, gold journals need to persuade them of the benefits of OA. So far so good. But there is another thing they have to persuade them of, implicitly or explicitly, and that is the benefits of gold OA over green OA. For if there are no benefits of gold over green, then surely it makes much more sense for authors to publish in their journal of choice, as they always did, and simply self-archive their own articles, rather than switching journals and/or paying for gold OA!

This theme alas keeps recurring, implicitly or explicitly, in the internecine green/gold squabbles, because green OA is indeed a rival to gold OA in gold OA journals' efforts to win over authors. This is regrettable, but a functional fact today, owing to the nature of OA and of the two means of providing it.

Is the effect symmetrical? Is gold OA likewise a rival to green OA? Here the answer is more complicated: No, an author who chooses gold OA (by publishing in an OA journal) is not at all a loss for green OA, because the article is nevertheless OA, and green OA's sole objective is 100% OA, as soon as possible, and nothing else. (Besides, a gold OA article too can be self-archived in the author's Institutional Repository if the author or institution wishes! All gold journals are, a fortiori, also green, in that they endorse author self-archiving.)

But there is a potential problem with gold from the standpoint of green. The problem is not with authors choosing gold. The problem is with gold publishers promoting gold as superior to green, or, worse, with gold publishers implying that green OA is not really OA, or not "fully" OA (along some imaginary OA "continuum").

"Free Access vs. Open Access" (thread started Aug 2003)

Why, you ask, would gold OA want to give the impression that green OA was not "really" OA or not "fully" OA? Because of the rivalry for authors that I just mentioned. The causal arrow is a one-way one insofar as competition for authors is concerned: green OA does not lose an author if that author publishes in a gold OA journal, whereas gold OA does lose an author if an author publishes in a green journal instead of a gold one. However, if gold portrays green as if it were not really or fully OA, and authors believe this, then it loses author momentum for green -- especially among that vast majority of authors who do not yet elect to publish gold. For there is today something still very paradoxical, indeed equivocal, about author behavior and motivation vis-a-vis OA:

Authors profess to want OA. Thirty-four thousand of them even signed the 2001 PLoS Open Letter threatening to boycott their journals if they did not provide (gold) OA (within 6 months of publication). (Most journals did not comply, and most authors did not follow through on their boycott threat: How could they? There were not enough suitable gold journals for them to switch to, and most authors clearly were not interested in switching journals, let alone paying for publication, then or now.)

Yet (and here comes the paradox): if those 34,000 signatories -- allegedly so desirous of OA as to be ready to boycott their journals if they did not provide it -- had simply gone on to self-archive all their papers, they would be well on the road to having the OA they allegedly desired so much! For the green road to 100% OA happens to be based on the (golden!) rule: Self-Archive Unto Others As You Would Have Them Self-Archive Unto You.

Why didn't (and don't) most authors do it (yet)? It is partly (let us state it quite frankly) straightforward foolishness and inconsistency on authors' part. They simply have not thought it through. This cannot be denied. Authors are in a state of self-induced "Zeno's Paralysis" regarding OA, from which FAQs have so far been powerless to free them -- so that it now looks as if self-archiving mandates from their institutions and/or their funders will be the only thing that can induce them to do what will give them what they so want and need.

But the confusion and inaction are partly also the fault of the promotional efforts of (well-meaning) OA advocates. Harold Varmus sent a mixed message with his 1999 "E-biomed" proposal (which led to PLoS, the PLoS Open Letter, PubMed Central, Biomed Central, and eventually the PLoS and BMC fleet of OA journals, including PLoS Biology). Was E-biomed a gold proposal, a green proposal, both, or neither? The fact is that it was an incoherent proposal -- a confused and confusing mish-mash of central self-archiving, publishing reform/replacement and rival publishing -- and although it has undeniably led to genuine and valuable progress toward (what was eventually baptized by BOAI as) OA, it has left a continuing legacy of confusion too.

And we are facing part of that legacy of confusion now, with PLoS thinking that the only way (or the best) to reach 100% OA is to publish and promote gold OA journals. That is why PLoS Biology agreed to referee the Eysenbach paper, which seemed to show that OA gold is the only one that increases citation impact, not green self-archiving, which is (when you come right down to it) not even "real" OA at all!

That is also why PLoS Biology editorialised that they found it "surprisingly hard to find" evidence -- "solid evidence" -- that OA articles are read and cited more. And that is why PLoS Biology was happy to make an exception and publish the Eysenbach study, even though scientometrics is not the subject matter of PLoS Biology; but (I'll warrant) PLoS Biology would not have been happy to advertise in its pages the fact that green OA self-archiving was enough to get articles read and cited more!

So green OA does have a bit of an uphill battle against gold OA and the subsidies and support it has received (because gold OA is an attractive and understandable idea, whereas green OA requires a few more mental steps to dope out -- though not many, as none of this is rocket science!).

But, to switch metaphors, the green road to 100% OA (sic) is far wider, faster and surer than the golden road. (Every article can be self-archived, today, and without their authors' having to renounce or switch journals, whereas most articles do not yet have a suitable OA journal to publish in today, even if their authors wished to switch journals, which most do not; and authors can be mandated to self-archive by their institutions and funders, but neither authors' choice of journals nor their publishers' choice of access-provision or cost-recovery model can be mandated by authors' institutions and funders.) Moreover, 100% OA really is beneficial to research and researchers; so the green road of self-archiving is bound to prevail, despite the extra obstacles. And the destination (100% OA) is exactly the same for both roads. (Indeed, I am pretty sure that even the fastest way to reach 100% gold OA -- i.e., not just 100% OA but also the conversion of all journals to gold -- is in fact to take the green road to 100% OA first.)

So gold is doing itself a disservice when it tries to devalue green. Read on:

GE: " and this study (and the priority claims in the editorial) was talking about the gold OA end of the spectrum."

Spectrum? Continuum? Degrees of OA?

GE: " Publishing in an open access journal is a fundamentally different process from putting a paper published in a toll-access journal on the Internet. In analogy, printing something on a flyer and handing it out to pedestrians on the street, and publishing an article in a national newspaper can both be called "publishing", but they remain fundamentally different processes, with differences in impact, reach, etc. A study looking at the impact of publishing a newspaper can not be replaced with a study looking at the impact of handing out a flyer to pedestrians, even though both are about "publishing"."

Oh dear! I have a feeling Eysenbach is going to tell us that making a published journal article accessible online free for all by self-archiving it is not OA after all, or not "full OA". If the journal doesn't do it for you, and/or you don't pay for it, it's not the real thing.

I wonder why Eysenbach would want to say that? Could it be because he is promoting an OA (gold) journal (his own)? Could that also have been the reason the PLoS editorial was so sanguine about Eysenbach's findings on the OA gold advantage, and so dismissive of any prior evidence of an OA green advantage?

GE: " Finally, Harnad says that "prior evidence derived from substantially larger and broader-based samples showing substantially the same outcome". I rebut with two points here[:] Regarding "larger samples" I think rigor and quality (leading to internal validity) is more important than quantity (or sample size)."

Even when all within-journal studies -- large and small, approximate and exact -- just keep producing exactly the same outcome, every time (OA increases impact)?

GE: " Going through the laborious effort to extract article and author characteristics for a limited number of articles (n = 1492) in order to control for these confounders provides scientifically stronger evidence than doing a crude, unadjusted analysis of a huge number of online accessible vs non-online accessible articles, leaving open many alternative explanations."

As I said, for those who doubt the causality and think the OA advantage is just a self-selection bias, Eysenbach's study will not convince them otherwise either. For those with eyes to see, the repeated demonstrations, in field after field, of exactly the same effect on incomparably larger samples will already have been demonstration enough. For those with eyes only for gold, evidence that green enhances citations will never be "solid evidence."

If Eysenbach and the editors had portrayed the latest PLoS findings as they should have, namely, as yet another confirmation of the OA impact advantage, with some new details about its fine-tuning, I would have had nothing but praise for it. But the actual self-interested spin and puffery that instead accompanied this work -- propagating the frankly false idea that this is the first "solid evidence" for the OA impact advantage, and, worse, that it implies that self-archiving itself does not deliver the OA impact advantage -- would have required not the lack of an ego, but the lack of any real fealty to OA itself to have been allowed to stand uncontested.

GE: " Secondly, contrary to what Harnad said, this study is NOT at all "showing substantially the same outcome". On the contrary, the effect of green-OA -- once controlled for confounders - was much less than what others have claimed in previous papers."

Let's be quite explicit about what, exactly, we are discussing here:

Eysenbach found that in a 6-month sample of 1492 articles in one 3-option journal (PNAS):

"While in the crude analysis self-archived papers had on average significantly more citations than non-self-archived papers (mean, 5.46 versus 4.66; Wilcoxon Z = 2.417; p = 0.02), these differences disappeared when stratified for journal OA status (p= 0.10 in the group of articles published originally as non-OA articles, and p = 0.25 in the group of articles published originally as OA).

"In a logistic regression model with backward elimination, which included original OA status and self-archiving OA status as separate independent variables as well as all potential confounders, self-archiving OA status did not remain a significant predictor for being cited. In a linear regression model, the influence of the covariate "article published originally as OA, without being self-archived" (beta = 0.250, p < 0.001) on citations remained stronger than self-archiving status (beta = 0.152, p = 0.02)."

To translate this into English (from an article with exceedingly user-unfriendly data-displays, by the way, making it next to impossible to extract and visualize results from the tables by inspection!): First, the numbers:

NOA (Not OA): 1159 articles, 86.2% cited at least once
POA (Paid OA only): 176 articles, 94.3% cited at least once
SOA (Self-Archived OA only): 121 articles, 90.1% cited at least once
BOA (POA and SOA): 36 articles, 97.2% cited at least once

The finding is that (in this PNAS sample, and with many other factors -- e.g., days since publication, number of authors, article type, country, funding, subject, etc. -- statistically isolated so as to be assessable independently): POA, SOA and BOA considered together, and POA considered alone, all have significantly more citations than NOA; but SOA considered alone ("stratified") does not. Also, if considered jointly (multiple regression), both POA and SOA increase citations, but POA is the stronger effect.
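As a rough arithmetic check on the figures listed above, here is a minimal Python sketch. It recovers approximate cited-article counts by rounding each group size times its quoted percentage (an assumption, since only the percentages are quoted here), pools the three OA groups, and computes a crude, unadjusted odds ratio against NOA. It is not Eysenbach's adjusted regression, and the variable and function names are purely illustrative.

```python
# Group sizes and "cited at least once" percentages, as quoted above.
groups = {
    "NOA": (1159, 86.2),
    "POA": (176, 94.3),
    "SOA": (121, 90.1),
    "BOA": (36, 97.2),
}


def cited_count(n, pct):
    """Approximate number of cited articles, recovered by rounding n * pct / 100."""
    return round(n * pct / 100)


def odds(cited, n):
    """Odds of being cited at least once: cited / not cited."""
    return cited / (n - cited)


counts = {name: cited_count(n, pct) for name, (n, pct) in groups.items()}
total_articles = sum(n for n, _ in groups.values())

# Pool the three OA groups (POA, SOA, BOA) and compare them with NOA.
oa_names = ("POA", "SOA", "BOA")
oa_n = sum(groups[name][0] for name in oa_names)
oa_cited = sum(counts[name] for name in oa_names)
noa_n, noa_cited = groups["NOA"][0], counts["NOA"]

# Crude (unadjusted) odds ratio: no confounders are controlled for here.
crude_odds_ratio = odds(oa_cited, oa_n) / odds(noa_cited, noa_n)

print(f"total articles: {total_articles}")
for name, (n, _) in groups.items():
    print(f"{name}: {counts[name]} of {n} cited at least once")
print(f"pooled OA: {oa_cited} of {oa_n} cited ({100 * oa_cited / oa_n:.1f}%)")
print(f"crude odds ratio, pooled OA vs NOA: {crude_odds_ratio:.2f}")
```

The four group sizes sum to the 1492 articles of the sample, which is a useful sanity check on the list. The pooled comparison only mirrors the "considered together" contrast in its crudest form; the stratified and regression results quoted from the paper cannot be reproduced from these four lines alone.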

Here are three simple hypotheses, in decreasing order of likelihood, as to why this small PNAS study may have found that the citation counts and their significance ordered themselves as they did: BOA>POA>>SOA>NOA

Hypothesis 1: The POA advantage might be unique to high-profile 3-option journals (POA, SOA, NOA) like PNAS (which are themselves a tiny minority among journals) and occurs because the POA articles are more visible than the SOA articles. (The POA + SOA = BOA articles do the best of all: redundancy enhances visibility.) So the POA authors do get something more for their money (but that something is not OA but high-profile POA in a high-profile journal) -- at least for the time being. This extra POA-over-SOA advantage will of course wash out as SOA and indexed, interoperable Institutional Repositories for self-archiving grow.

Hypothesis 2: The POA advantage might result at least in part from QB (self-selection Quality Bias) because the decision (by a self-selected 15% subset of PNAS authors) to pay for POA is influenced by the author's underlying sense of the potential importance (hence impact) of his article: Simply asking authors about how important they think their article is, and whether that influenced their decision to pick POA or SOA or NOA, and failing to detect any significant difference among the authors, does not settle this matter, and certainly not on the basis of such a small and special sample. (But I think QB is just one of many contributors to the OA citation advantage itself, and certainly not the only determinant or even the biggest one.)

Hypothesis 3: The POA advantage might be either a small-sample chance result or a temporary side-effect of the 3-option journals in early days: a one-stop shopping advantage for PNAS articles, in a high-profile store, today. It needs to be tested for replicability and representativeness in larger samples of articles, journals, and time-bases.

(Note that Lawrence's 2001 as well as Hajjem et al's 2005 finding had been that the proportion of OA articles increases in the higher citation ranges, being lowest among articles with 0-1 citations.)

Eysenbach finds that with logistic regression analysis separating the independent effects of POA, SOA and other correlates, SOA has no significant independent effect in his 1-year PNAS sample. Now let's test whether that replicates in larger samples, in terms of number of articles, journals, and time-base alike. (Failure to find a significant effect in a small sample is far less compelling than success in finding a significant effect in a small sample!)
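To make the parenthetical point concrete, here is a hedged Python sketch of approximate statistical power. It assumes, purely for illustration, that the true "cited at least once" proportions equal the percentages quoted above for SOA (90.1%, n = 121) and NOA (86.2%, n = 1159), and it uses a simple two-proportion z-test with a normal approximation. That test is a stand-in chosen for simplicity, not the analysis Eysenbach ran, and the tenfold-larger sample is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variance and ignores the
    negligible chance of rejecting in the direction opposite to the true effect.
    """
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p2) / standard_error
    return NormalDist().cdf(z_effect - z_critical)


# Illustrative assumption: true proportions equal the quoted SOA and NOA figures.
power_small = approx_power(0.901, 121, 0.862, 1159)

# Hypothetical sample ten times larger, with the same assumed proportions.
power_large = approx_power(0.901, 1210, 0.862, 11590)

print(f"approximate power at the quoted group sizes: {power_small:.2f}")
print(f"approximate power at ten times the sample:   {power_large:.2f}")
```

Under these assumptions, a real difference of this size would more often than not go undetected at the quoted group sizes, whereas a tenfold-larger sample would detect it almost every time. A non-significant result in the small sample is therefore weak evidence that no effect exists.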

GE: " Harnad, a self-confessed "archivangalist", co-creator of a self-archiving platform, and an outspoken advocate of self-archiving (speaking of vested interests) calls the finding that self-archived articles are... cited less often than [gold] OA articles from the same journal "controversial". In my mind, the finding that the impact of nonOA < greenOA < goldOA < green+goldOA is intuitive and logical: The level of citations correlates with the level of openness and accessibility."

I don't dispute that POA can add more citations, just as BOA can; maybe self-archiving in 10 different places will add still more. But what does this imply, right now, practically speaking? And, even more important, how likely is it that this sort of redundancy will continue to confer significant citation advantages once a critical mass of the literature is in interoperable Institutional Repositories (green SOA) rather than few and far between, as now? It is indeed intuitive and logical that the baseline 15% of the literature as a whole that is being spontaneously self-archived somewhere, somehow on the Web, across all fields, has somewhat less visibility right now than the 15% of PNAS articles that PNAS is making OA for those authors who pay for it (POA). That's a one-stop shopping advantage for PNAS articles, against PNAS articles, in a high-profile store, today.

But the true measure of the SOA advantage today (at its 15% spontaneous baseline) is surely not to be found in PNAS but in the statistically far more numerous, hence far more representative full-spectrum of journals that do not yet offer POA. (I would be delighted if those journals took the Eysenbach findings as a reason for offering a POA option! But not at the expense of authors drawing the absurd conclusion -- not at all entailed by Eysenbach's PNAS-specific results -- that in the journals they currently publish in, SOA alone would not confer citation advantages at least as big as the ones we have been reporting.)

Regarding my self-confessed sin of archivangelizing, however, I do protest that my first and only allegiance is to 100% OA, and I evangelize the green road (and promote the self-archiving software) only because it is so resoundingly obvious that it is the fastest and surest road to 100% OA. (If empirical -- or logical -- evidence were ever to come out showing the contrary, I assure you I too would join the gold rush!)

GE: " Sometimes our egos stand in the way of reaching a larger common goal, and I hope Harnad and other sceptics respond with good science rather than with polemics and politics to these findings."

Well, first, let us not get carried away: There's precious little science involved here (apart from the science we are trying to provide Open Access to). The call to self-archive in order to enhance access and impact is so obvious and trivial that, as I noted, the puzzle is only why anyone would even have imagined otherwise.

But when it comes to polemics and politics (and possibly also egos), it might have kept things more objective if the results of Eysenbach's small but welcome study confirming the OA impact advantage had not been hyped with editorial salvos such as:

"solid evidence to support or refute... that papers freely available in a journal will be more often read and cited than those behind a subscription barrier... has been surprisingly hard to find..."

Or even the heavily-hedged:

"As far as we are aware, no other study has compared OA and non-OA articles from the same journal and controlled for so many potentially confounding factors."

GE: " Unfortunately, in this area a lot more people have strong opinions and beliefs than those having the skills, time, and willingness to do rigorous research. I hope we will change this, and I reiterate a "call for papers" in that area [http://www.jmir.org/2006/2/e8/]"

May I echo that call, adding only that the rigorous research might perhaps be better placed in a journal specializing in scientometrics and in rigorously peer-reviewing it, rather than in The Journal of Medical Internet Research, or even PLoS Biology.

Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 56.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

I close with some replies to portions of another version of Eysenbach's response which appeared in his blog.
GE: " Harnad's point that the PLoS paper is about the "citation advantage of open access" and that there have been "previous papers about the citation advantage of open access" (mostly his own studies, mostly not published in peer-reviewed journals) is as meaningful as saying "this paper is about a cancer treatment, and there are previous papers about cancer treatments, so this one doesn't add anything"."

That's not what I said. I said this:

"[T]he only new knowledge from this small, journal-specific sample was (1) the welcome finding of how early the OA advantage can manifest itself, plus (2) some less clear findings about differences between first- and last-author OA practices, plus (3) a controversial finding that will most definitely need to be replicated on far larger samples in order to be credible: "The analysis revealed that self-archived articles are also cited less often than OA [sic] articles from the same journal.""

And I do think all of this is as far away from rigorous oncological research as it is from rocket science!

GE: " The statement made by the reviewers and editors of the PLoS paper that this is the first study looking at the citation advantage of an open access/hybrid journal remains correct until somebody can show me a reference where this has been done before."

But who ever contested that far more modest and circumspect statement (which was certainly not the one the accompanying PLoS editorial made)? This is indeed "the first study looking at the citation advantage of an open access/hybrid journal"; indeed, it's the first such study of PNAS. But it's certainly not the first study looking at the citation advantage of OA in general, or OA self-archiving in particular, and looking at it within journals -- within many journals, and many articles.

GE: " In analogy, a small carefully designed cohort study showing a relationship between smoking and cancer with 1500 patients, obtaining through questionnaires and interviews additional variables which could account for the association and controlling for these confounders and still coming to the conclusion that there is a relation between smoking and cancer is scientifically stronger evidence than a quick-and-dirty uncontrolled cross-sectional study showing an association between smoking and cancer, even if this is done in a population of millions."

Indeed it would. And I forgot to add to my list (4) that Eysenbach had tested the hypothesis that the OA citation advantage is merely the result of a self-selection bias by asking 247 authors whether it was, and they replied that it wasn't...

Stevan Harnad
American Scientist Open Access Forum

Posted by Stevan Harnad in Publishing Reform at 21:59
