
The embryology of the publication process

Last updated August 31 2000 10:09:17.

The scientific process of developing and publishing papers is as old as modern science. In the traditional model, an author produces a paper from their research and submits it to a journal, or journals, which decide whether the paper is appropriate. If the paper is accepted, it is then passed to a panel of peer reviewers, who review it and suggest improvements. These suggestions are returned to the author to implement. Once a final version is completed it is "published", to be forever set in stone upon the shelves of university libraries. Once published, the article may be read by peers who decide to cite it in their own papers, thereby generating "impact" for the author and the journal.

This embryological process can last a year or more, from the first report or pre-print article to the published, peer-reviewed post-print.

Into this world comes the digital archive, which allows authors to deposit their articles and have them read by the target audience within a few days, effectively "jumping the queue" of the print process. This gives us a "dual view" of the world of the research paper: one of the pre-print article, and one of the peer-reviewed post-print article. By comparing these two worlds we can see the time difference between the leading and trailing edges of research, that is, between the pre-print and the post-print.

We can analyse this time difference within the arXiv archive, by looking at the lifecycle of the article meta-data.

Along with each arXiv pre-print article there is a set of "meta-data" - textual information about the article. One element of this meta-data is the publication information for the article, i.e. in what journal issue the paper was published. If we assume the generalised behaviour that authors deposit their pre-print articles in the archive at a similar time to when they submit their articles to a journal, we can see retrospectively a gap between when papers without the publication information are being deposited, and when the papers receive their publication data.

As with all self-archiving, arXiv relies on authors to maintain their meta-data and papers. However, in the special case of the hep-* (High Energy Physics) areas of the archive, the meta-data is updated by the SLAC/SPIRES service (Stanford University Library), using data obtained directly from physics journals. This leads to a relatively high accuracy of meta-data within the HEP areas.

Using the meta-data from the papers in arXiv we can create a retrospective view of the state of articles in the archive, with the most recent articles having only just been deposited, and therefore without journal publication data.

The y-axis is the number of papers deposited in each month, for each type of meta-data present. The x-axis is the time scale, running from 03/2000 back to 07/1991 (the start of the archive).

We can see, when looking at these graphs, that the lines of papers with journal publication data and the lines of papers being deposited in the archive cross at an article age of approximately one year. This suggests that the typical gap between an article being submitted to a publisher and that article being published is around 12 months.
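The crossing-point reading described above can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the graphs: it assumes each article is available as a pair of deposit date and publication information (empty or None when no journal data is present), and it interprets the "crossing" as the most recent month in which papers with publication data outnumber those without. The names `monthly_counts`, `crossover_age_months` and the record layout are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(records):
    """Count deposits per (year, month), split by presence of publication data.

    `records` is an iterable of (deposit_date, journal_ref) pairs; this layout
    is an assumption made for the sketch, not the arXiv meta-data format.
    """
    with_ref = Counter()
    without_ref = Counter()
    for deposited, journal_ref in records:
        key = (deposited.year, deposited.month)
        if journal_ref:
            with_ref[key] += 1
        else:
            without_ref[key] += 1
    return with_ref, without_ref


def crossover_age_months(records, snapshot):
    """Age in months of the most recent deposit month in which papers with
    publication data outnumber those without, or None if there is none."""
    with_ref, without_ref = monthly_counts(records)
    # Walk backwards in time from the newest deposit month.
    for year, month in sorted(set(with_ref) | set(without_ref), reverse=True):
        if with_ref[(year, month)] > without_ref[(year, month)]:
            return (snapshot.year - year) * 12 + (snapshot.month - month)
    return None


# Toy data, invented purely to exercise the functions.
sample = [
    (date(2000, 3, 1), None),
    (date(2000, 3, 5), None),
    (date(1999, 9, 1), "Journal A"),
    (date(1999, 9, 2), None),
    (date(1999, 9, 3), None),
    (date(1999, 3, 1), "Journal A"),
    (date(1999, 3, 2), "Journal B"),
    (date(1999, 3, 3), None),
    (date(1998, 3, 1), "Journal A"),
]
print(crossover_age_months(sample, date(2000, 3, 31)))
```

With real data the monthly counts are noisy, so a smoothed comparison (for example over a three-month window) may give a more stable estimate than the single-month test used here.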