Summary: The findings of Eric Archambault’s (2013) pilot study on the percentage of OA that is currently available are very timely, welcome and promising. The study finds that the percentage of articles published in 2008 that are OA in 2013 is between 42% and 48%. It does not estimate, however, when in that 5-year interval the articles were made OA. Hence the study cannot indicate what percentage of articles being published in 2013 is being made OA in 2013. Nor can it indicate what percentage of articles published before 2013 is OA in 2013. The only way to find that out is through a separate analysis of immediate Gold OA, delayed Gold OA, immediate Green OA, and delayed Green OA, by discipline, by publication year, and by OA year.
“This paper re-assesses OA availability in 2008”
For papers that were published in 2008 -- but when were those articles made OA?

8% of all articles published in 2008 (or 1/5 of the 42% that were OA) were made OA via Gold, which means they were made OA in 2008. But 34% (or 4/5 of the 42% that were OA) were either Green or hybrid Gold or delayed Gold [it is not at all clear why these were all conflated in the analysis]:

Of this 4/5 of what was made OA, the largest portion was likely Green. But when each article was made Green OA is unknown. Some were made Green OA immediately in 2008; some were delayed Green -- potentially up to any point in the interval between 2008 and the date the sampling was done.

For the proportion of the 4/5 that was delayed Gold OA (i.e., made OA by the subscription publisher after 6-12 months or longer), that delay has to be calculated. (Björk and Laakso found that the second largest portion of OA was delayed Gold. The names of the delayed Gold journals are known, and so are their delay periods.)

The portion of the 4/5 that was hybrid Gold was made Gold OA immediately in 2008 by the publisher, but hybrid Gold represents the smallest portion of the 4/5. The names of the hybrid Gold journals are known. Whether the articles were hybrid Gold or Green needs to be ascertained, and separate calculations need to be made.

Until all this is known, it is not known what proportion of 2008 articles was OA in 2008. The rest of the findings are not about OA at all, but about embargoed access after some indeterminate delay between 2008 and 2013 (5 years!).
“the tipping point for OA has been reached and… one can expect that, from the late 2000s onwards, the majority of published academic peer-reviewed journal articles were available for free to end-users. ”
But when? Which publication dates became accessible as of which OA dates? Otherwise this is not about OA (which means immediate online access) but about OA embargoes and delays. That is not what is meant by a “tipping point.”
“The paper presents the results for the pilot phase of a study that aims to estimate the proportion of peer-reviewed journal articles which are freely available, that is, OA for the last ten years (the pilot study is on OA availability in 2008). ”
This study seems to be on OA availability of 2008 articles, not OA availability in 2008. And availability somewhere within 10 years is not OA.
“An effective definition of OA for this study is the following: ‘OA, whether Green or Gold, is about giving people free access to peer-reviewed research journal articles. ”
(Free online access. But to be OA, the access must be immediate, not delayed, and permanent, not temporary.)
“OA is rarely free”
This is not quite the point: it is publication that is not free. Publication costs must be paid for -- either via subscriptions, subsidies or publication charges. Subscription fees cover publication costs by charging subscribers for access. Author publication fees cover publication costs by charging authors for publication.
OA is simply toll-free online access, irrespective of whether publication is paid for via subscriptions, subsidies, or publication fees.
“Thus, the term toll-access, to distinguish the non-OA literature, is avoided here. ”
Toll access is the correct term when access must be paid for. This should not be conflated with how publication costs are paid for.
“The core use of Ulrich in this project was to calibrate the proportion of papers from each of 22 disciplines used to present disaggregated statistics. ”
But were the disciplines weighted also by their proportion of total annual article output?
(Treating disciplines as equal had been one of our own mistakes, resulting in some discrepancies with Björk & Laakso; when disciplines are properly weighted, our figures agree more closely with Björk & Laakso’s. The remaining discrepancy is more challenging, and it concerns the uncertainty about when articles were made OA in the case of Green: the publication date is not enough, nor is the sampling date, unless it is in the same year! We are now conducting a study on publication date vs. OA date, to estimate the proportion of immediate Green, and the average latency of delayed Green, compared to immediate Gold and delayed Gold.)
“When selecting journals to be included for an article-level database such as Scopus, deciding whether to include a journal has a direct impact on production costs and partly because of this, database publishers tend to have a bias towards larger journals”
True, but WoS and SCOPUS also have quality criteria, whereas Ulrichs does not: Ulrichs can and does cover all.
“Despite a 50% increase in journal coverage, Scopus only has about 20% more articles. A sensitivity analysis was performed”
There is still the question of quality: including more journals probably means lowering average quality.
And there is also the question of discipline size: estimates of overall and average %OA cannot treat the 22 disciplines as equal if some publish many more articles than others.
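A minimal numerical sketch of why this matters (the discipline names, sizes and %OA figures below are hypothetical, not the study’s): an unweighted mean across disciplines and a mean weighted by article output can diverge noticeably.

```python
# Hypothetical illustration: unweighted vs. weighted %OA across disciplines.
# Discipline sizes and OA fractions are made up for the example.
disciplines = {
    # name: (articles published that year, fraction of them OA)
    "Biomedical Research": (300_000, 0.60),
    "Mathematics & Statistics": (30_000, 0.50),
    "Engineering": (400_000, 0.25),
}

# Unweighted mean: treats every discipline as equal.
unweighted = sum(oa for _, oa in disciplines.values()) / len(disciplines)

# Weighted mean: weights each discipline by its share of total article output.
total_articles = sum(n for n, _ in disciplines.values())
weighted = sum(n * oa for n, oa in disciplines.values()) / total_articles

print(f"Unweighted mean %OA: {unweighted:.1%}")  # 45.0%
print(f"Weighted mean %OA:   {weighted:.1%}")    # 40.4%
```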
“For gold articles, an estimate of the proportion of papers was made from the random sample by matching the journals that were known to be gold in 2008. ”
This solves the problem of the potential discrepancy between publication date and OA date for Gold OA -- but not for Green OA, of which there is about 3 times as much as Gold. Nor for delayed Gold OA.
“ [Articles] were selected by tossing the 100,000 a few more times using the rand() command in Excel, then proceeding to the selection of the required number of records. ”
This presumably provided the articles in the 22 disciplines, but were the disciplines sampled equally or in proportion to their size?
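Just to make the question concrete (all numbers below are hypothetical, not from the study): the two designs differ only in whether each discipline’s quota in the sample is equal or proportional to that discipline’s article output.

```python
# Hypothetical illustration: equal quotas per discipline vs. quotas
# proportional to each discipline's article output.
discipline_sizes = {"Discipline A": 400_000, "Discipline B": 300_000, "Discipline C": 30_000}
sample_size = 20_000
total = sum(discipline_sizes.values())

equal_quota = {d: sample_size // len(discipline_sizes) for d in discipline_sizes}
proportionate_quota = {d: round(sample_size * n / total) for d, n in discipline_sizes.items()}

print(equal_quota)          # every discipline gets 6,666 records
print(proportionate_quota)  # A: 10,959  B: 8,219  C: 822
```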
“A test was then conducted with 20,000 records being provided to the Steven Harnad team in Montreal. ”
But what was the test? To search for those 20,000 records on the web with our robot? And what was the date of this OA test (for articles published in 2008)?
“the team led by Harnad measured only 22% of OA in 2008 overall ‘out of the 12,500 journals indexed by Thomson Reuters using a robot that trawled the Web for OA full-texts’ (Gargouri et al., 2012) ”
Our own study measured %OA for articles published in 2008 and indexed by WoS, as sampled in 2011. The Archambault pilot study was conducted two years later, and on articles indexed by SCOPUS. More 2008 articles may have been made OA in those two additional years; and more articles are indexed by SCOPUS than by WoS.
It is also crucial to estimate both the %OA and the latency of the OA, in order to estimate the true annual %OA and its annual growth rate (for both Green and delayed Gold). And the estimate has to be weighted by discipline size if it is to be a global average over total articles, rather than over unweighted disciplines.
“a technique to measure the proportion of OA literature based on the Web of Science produces fairly low recall and seriously underestimates OA availability. ”
Agreed, if the objective is a measure of %OA based indiscriminately on total quantity of articles.
But the WoS/SCOPUS/Ulrichs differences could also be differences in quality -- and definitely differences in the degree to which researchers need access to the journals in question. WoS includes all the "must have" (“core”) journals, and then some; SCOPUS still more; and Ulrichs still more. So these four layers and their %OA should be analyzed and interpreted separately too.
“This extensive analysis therefore suggests that 48% of the literature published in 2008 may be available for free. ”
Yes, but when were those 2008 articles made available free?
“one can infer that OA availability very likely passed the tipping point in 2008 (or earlier) and that the majority of peer-reviewed/scholarly papers published in journals in that year are now available for free in one form or another to end-users. ”
It's not clear what a "tipping point" is (50%?). And a tipping point for what: OA? Or eventual delayed OA after an N-year embargo?
What the pilot study’s result shows is not that OA reached the 50% point for 2008 in 2008! It reached the 50% point for 2008 somewhere between 2008 and when the sampling was done!
What we need to know now is how fast the %OA per year -- for articles published in that same year (or the immediately preceding one) -- is approaching 50%.
“These results suggest that using Scopus and an improved harvester ‘to trawl the Web for OA full-texts’ could yield substantially more accurate results than the methods used by Björk et al. and Harnad et al. ”
But why bundle hybrid and delayed Gold with Green, immediate and delayed? They are not at all the same thing!
Hybrid Gold is immediate Gold.
“Embargo” is ambiguous – it can be "delayed Gold," provided by the publisher after an embargo, or it can be embargoed Green, provided by the author after an embargo.
These mean different things for OA and need to be calculated separately.
For hybrid Gold and delayed Gold, the OA dates can be known exactly. For Green they cannot. This is a crucial difference, yet Green is the biggest category.
“Pay-per-article OA, journals with embargo periods and journals allowing partial indexing following granting agencies’ OA policies are considered hybrid, and these data are bundled here with green OA (self-archiving). ”
"Journals with embargo periods" is ambiguous, because there are subscription journals that make their own articles free online after an embargo period (“delayed Gold”), and there are subscription journals that embargo how long before their authors can make their own papers Green OA.
And some authors do and some authors don't make their articles Green OA.
And some authors do, and some don't, comply with the Green embargoes.
And some authors are mandated to make their articles Green OA by their funders or institutions.
And allowable embargo lengths vary from mandate to mandate.
And mandates are growing with time.
What is needed is a separate analysis (by discipline, weighted) for Gold, hybrid Gold, delayed Gold and Green. And Green in particular needs to be separately analyzed for immediate Green and delayed Green.
Only such an analysis will give an estimate of the true extent and growth rate for immediate Gold, immediate Green, and delayed Gold and delayed Green (per 6-month increment, say), by discipline.
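As a sketch of what such a separate tabulation could look like (the record fields -- discipline, OA type, months from publication to OA -- are assumptions for illustration, not the pilot study’s actual data):

```python
# Sketch: tally OA categories per discipline, with delayed OA binned
# in 6-month increments of publication-to-OA latency. Toy data only.
from collections import Counter

def oa_category(record):
    """Label a record as immediate or delayed (6-month bins) for its OA type."""
    if record["oa_type"] is None:           # never made OA
        return "not OA"
    months = record["months_to_oa"]
    if months == 0:
        return f"{record['oa_type']} (immediate)"
    upper = 6 * ((months - 1) // 6 + 1)     # upper edge of the 6-month bin
    return f"{record['oa_type']} (delayed <= {upper} months)"

def shares_by_discipline(records):
    counts = Counter((r["discipline"], oa_category(r)) for r in records)
    totals = Counter(r["discipline"] for r in records)
    return {key: n / totals[key[0]] for key, n in counts.items()}

records = [  # hypothetical records
    {"discipline": "Biology", "oa_type": "Gold", "months_to_oa": 0},
    {"discipline": "Biology", "oa_type": "hybrid Gold", "months_to_oa": 0},
    {"discipline": "Biology", "oa_type": "delayed Gold", "months_to_oa": 12},
    {"discipline": "Biology", "oa_type": "Green", "months_to_oa": 14},
    {"discipline": "Biology", "oa_type": None, "months_to_oa": None},
]
for (discipline, category), share in sorted(shares_by_discipline(records).items()):
    print(discipline, category, f"{share:.0%}")
```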
“It seems that the tipping point has been passed (OA availability over 50%) in Biology, Biomedical Research, Mathematics & Statistics, and General Science & Technology”
Much as I wish it were so, I am afraid this is not yet true (or at least cannot be known on the basis of the results of this pilot study).
50% has only been reached for 2008 articles some time between 2008 and the time the study’s sample was collected. And the fields are of different sizes. And the dates are much surer for Gold and hybrid Gold than for delayed Gold and for Green, both immediate and delayed.
“many previous studies might have included disembargoed papers and pay-per-article OA, which is not the case here”
But both hybrid Gold and delayed Gold should be analyzed separately from Green, because their respective OA dates are knowable. And as such, the results should be added to those for pure Gold, to estimate overall Gold OA, immediate and delayed.
Green OA, though the bigger category, requires estimates of Green OA latency by field -- i.e., the average delay between publication date and OA date -- in order to estimate the percentage of immediate Green and of the various degrees of delayed Green.
“These data present the relative citation rate of OA publications overall, Gold OA and hybrid OA forms relative to publications in each discipline. ”
First, it has to be repeated that it is a mistake to lump Green together with hybrid and delayed Gold, for the reasons mentioned earlier (regarding date of publication and date of OA), but also because the Gold OA vs. non-OA citation comparison (for pure Gold as well as delayed Gold) is a between-journal comparison -- making it hard to equate for content and quality -- whereas the Green OA vs. non-OA citation comparison is a within-journal comparison (hence much more closely matched in content and quality). (Hybrid Gold, in contrast, does allow within-journal comparisons, but the sample is very small and might also be biased in other ways.)
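To illustrate the within-journal comparison concretely (field names and the toy figures are hypothetical): for each journal, the OA articles are compared only against the non-OA articles of that same journal, so that journal quality and content are held roughly constant.

```python
# Sketch: within-journal OA citation advantage, computed journal by journal.
# Between-journal (Gold journal vs. non-OA journal) comparisons would instead
# compare different journals, confounding the OA effect with journal quality.
from collections import defaultdict
from statistics import mean

def within_journal_ratios(articles):
    """Per journal: mean citations of OA articles / mean citations of non-OA articles."""
    groups = defaultdict(lambda: {"oa": [], "non_oa": []})
    for a in articles:
        groups[a["journal"]]["oa" if a["is_oa"] else "non_oa"].append(a["citations"])
    return {j: mean(g["oa"]) / mean(g["non_oa"])
            for j, g in groups.items()
            if g["oa"] and g["non_oa"]}      # journal must contain both kinds

articles = [  # hypothetical articles
    {"journal": "Journal 1", "is_oa": True,  "citations": 12},
    {"journal": "Journal 1", "is_oa": False, "citations": 8},
    {"journal": "Journal 2", "is_oa": True,  "citations": 3},
    {"journal": "Journal 2", "is_oa": False, "citations": 2},
]
print(within_journal_ratios(articles))  # {'Journal 1': 1.5, 'Journal 2': 1.5}
```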
“many Gold journals are younger and smaller”
Yes, but even more important, many Gold OA journals are not of the same quality as non-OA journals. Journals are hard to equate for quality. That is why within-journal comparisons are more informative than between-journal comparisons for the citation advantage.
“Gold journals might provide an avenue for less mainstream, more revolutionary science. ”
Or for junk science (as you note): This speculative sword can cut both ways; but today it's just speculation.
“the ARC [citation impact] is not scale-invariant, and larger journals have an advantage as this measure is not corrected sufficiently for journal size”
This is another reason the OA citation advantage is better estimated via within-journal comparisons rather than between-journal comparisons.
“the examination of OA availability per country”
Again, country-differences would be much more informative if clearly separated by Gold, Hybrid Gold, Delayed Gold and Green, as well as by levels of journal quality, from WoS core, to rest of WoS, to SCOPUS, to Ulrichs.
“Finding that the tipping point has been reached in open access is certainly an important discovery”
If only it were sure!
By the way, "tipping point" is a pop expression, and it does not particularly mean 50%. It means something like the point at which growth in a temporal process has become unstoppable in its trajectory toward 100%. This can occur well before 50%, or even after. Establishing it requires more than one-off total percentages: it needs year-to-year growth curves. What we have here is the 50% point for 2008 papers (in some fields), reached some time between 2008 and today!
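A minimal sketch of the kind of year-to-year growth curve meant here (the yearly figures below are placeholders, not data from the pilot study): the share of each year’s articles that were already OA within that same year, and how fast that share grows.

```python
# Placeholder figures: fraction of each year's articles that were OA
# within that same publication year (immediate OA). Not real data.
immediate_oa_share = {2005: 0.12, 2006: 0.15, 2007: 0.19, 2008: 0.24}

years = sorted(immediate_oa_share)
for prev, curr in zip(years, years[1:]):
    absolute = immediate_oa_share[curr] - immediate_oa_share[prev]
    relative = absolute / immediate_oa_share[prev]
    print(f"{prev} -> {curr}: +{absolute:.0%} (relative growth {relative:.0%})")
```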
“This means that aggressive publishers such as Springer are likely to gain a lot in the redesigned landscape”
It is not at all clear how this pilot study’s finding of the 50% point for free access to 2008 articles (via Gold, and even more via Green OA) has now become a message about "aggressive" publishers (presumably regarding some form of Gold OA). The finding is not primarily about Gold OA publishing!
“green OA only appears to move slowly, whereas Gold OA and hybrid toll before the process as opposed to toll after are in the fast lane”
It is even less clear how these results -- concerning articles published in 2008, made OA some time between 2008 and now, about one third of them Gold OA and about two thirds of them Green, with no year-by-year growth curves -- show that Green grows slowly and that Gold is in the fast lane.
(There has probably indeed been a growth spurt in Gold in the past few years, most of it because of one huge Gold mega-journal, PLOS ONE. But how do the results of the present study support any conclusion about the relative growth rates of Gold and Green? Especially given that Green growth depends on mandate growth, and Green mandates are indeed growing, with 20 major new US funding agencies mandating Green just this year (2013)!)
“The market power will shift tremendously from the tens of thousands of buyers that publishers’ sales staff nurtured to the millions of researchers that will now make the atomistic decision of how best to spend their publication budget”
Where do all these market conjectures come from, in a study that has simply shown that 50% of 2008 articles are freely accessible online 5 years later, partly via Gold, but even more via Green?
Archambault, Eric (2013) The Tipping Point - Open Access Comes of Age. ISSI 2013: Proceedings of the 14th International Society of Scientometrics and Informetrics Conference, Vienna, Austria, 15-19 July 2013.