The effect of open access mandates on repository preservation policy

Steve Hitchcock

Preserv 2 Project, IAM Group, School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
Email: sh94r@ecs.soton.ac.uk

Preserv 2 is a JISC-funded project. Find out more about Preserv 2.
Version information: This is a DRAFT paper, 10 March 2009

For reference, the summary notes on which this report is based are available.
For analysis, a shorthand spreadsheet summary of these findings is available.


Summary: Previous surveys have found little or no evidence of institutional or repository preservation policy. For institutional repositories one area of policy-making that has achieved some prominence is open access mandates. For this follow-up survey we examined a registry of such policies (ROARMAP) for evidence of policies leading towards preservation based on the thesis that preservation policy is more likely to follow from high-level policy initiatives, and these initiatives are more likely to be found for repositories with OA mandates. This was not strongly borne out in practice. Just 28% of all policies posted on ROARMAP make any reference to preservation. ROARMAP includes mandate policies by research funders as well as institutions. Research funder policies are 2.4 times more likely to seek provision for preservation in return for the requirement to deposit papers than are mandate policies from institutions.

Introduction

Preservation begins with policy. Taking a longer-term view on managing digital data involves many decisions and costs, and these can't be assessed consistently or coherently without reference to, or guidance by, a properly formulated policy. But policy does not begin with preservation.

For digital repositories, readiness for preservation can be gauged by evaluating the extent of their policies and extrapolating the implications of these for preservation. In June 2006 the Preserv project examined the sites of over 40 selected repositories, and found none that had a public preservation policy. In a follow-up survey it asked these repositories if they had preservation policies: 21 replied and, indeed, none could point to a formal policy, although it was found that de facto preservation decisions were being taken on issues such as acceptable file formats.

The lack of policy examples identified by this survey was reinforced more recently at an institutional level by the Digital Preservation Policies Study (Beagrie et al. 2008). The starting premise for this study was: "The lack of preservation policies and as a result the lack of consideration of digital preservation issues in other institutional strategies is seen as a major stumbling block". In its findings the study referred to the incidence of institutional preservation policies as 'sporadic', 'rarely considered' and 'only a handful of examples to follow', confirming there is a serious gap in policy.

For institutional repositories one area of policy-making has achieved some prominence: open access mandates. The aim of these policies is to promote and mandate the deposit, mostly by authors, of published research papers in repositories where these works can be accessed freely and openly by all. These mandates are both advocated and assiduously recorded by Stevan Harnad in ROARMAP (Registry of Open Access Repository Material Archiving Policies), alongside the mandate policies of research funders. Juliet, from Sherpa, records the OA policies of research funders but not of institutional repositories, and where relevant indicates whether data archiving is required. On March 9, 2009, Juliet showed 12 of its 46 funders as having some sort of data archiving policy.

For this small follow-up survey we decided to examine ROARMAP for evidence of policies leading to preservation. Anticipating this survey, the plan for the Preserv 2 project cited some of the first mandate proposals or policies from four research funders: the European Commission, Research Councils UK, the Wellcome Trust and the National Institutes of Health (NIH) in the USA. Each expressed some concern over long-term preservation.

The simple thesis to be explored is that preservation policy is more likely to follow from high-level policy initiatives, and these initiatives are more likely to be found for repositories with OA mandates. The basis for the thesis is that an OA mandate places a demand on authors of papers, and reciprocation may be in the form of services or benefits for these authors. One such benefit, recognised by most authors who have posted digital content to non-managed Web sites, might be seen as longer-term data management, or preservation. Institutions can be expected to have the longevity, if not yet the commitment or the data management provision, to provide some assurance in this respect.

There are competing factors to consider here. OA papers, the target of the mandates, are by definition published in peer reviewed sources such as journals, and as such the preservation of the journal version of record could be considered the responsibility of the publisher rather than the repository. This has led to publishers making arrangements with specialist organisations such as Portico in the USA and the Koninklijke Bibliotheek in the Netherlands for the preservation of their digital journals. Digital journal site licences have led to institutional initiatives such as LOCKSS to ensure continued access to content. An alternative view is that repositories manage versions of OA papers, and that by requiring deposit for records-keeping, reporting, assessment and other purposes, the institution has a duty to manage the data it collects to protect the effectiveness of these processes. Both are viewpoints to be weighed in policy development.

Rather than evaluating the strength and formality, far less the effectiveness, of any clauses on preservation found in ROARMAP policies, as a starting point we simply looked for any reference to, or use of, the word 'preservation' or derivatives in the policy statement.
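The paper does not say whether this check was made by hand or automatically. As an illustration only, a stem search of this kind could be sketched as follows. The function name, the regular expression and the third sample string are assumptions introduced here; the first two sample strings are phrases quoted from funder policies in this paper.

```python
import re

# Hypothetical helper, not the survey's actual tooling: matches any word
# beginning with the stem "preserv" (preserve, preserved, preservation, ...),
# ignoring case.
PRESERV_STEM = re.compile(r"\bpreserv\w*", re.IGNORECASE)


def mentions_preservation(policy_text: str) -> bool:
    """Return True if the policy text contains the stem 'preserv-' anywhere."""
    return PRESERV_STEM.search(policy_text) is not None


samples = [
    "Suitable repositories should make provision for long-term preservation",
    "PubMed central seeks to preserve and maintain unrestricted access",
    # Made-up sentence with no preservation term, for contrast.
    "Authors are required to deposit their papers in the repository",
]

for text in samples:
    print(mentions_preservation(text), "-", text)
```

A match of this kind records only that the term is present. It says nothing about the strength or formality of any commitment, which is the limitation the survey accepts as a starting point.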

Results

At the time the survey was concluded, ROARMAP contained 88 policies deemed to be of substance. This was not all of the policies listed. ROARMAP allows records to be added by repository representatives, and some entries could be considered premature. A list of policies noting provisions influencing preservation can be found here. The factors noted were:
Link to policy
It was assumed that ROARMAP would include a summary of policies with pointers to full, official versions posted on the institutional or funder sites. After all, these are the accepted policies of these institutions and funders, and are aimed at researchers and authors at those institutions. One surprising feature of ROARMAP, therefore, is that a quarter of the policies, or policy summaries, have no links to the official site of the mandater. One explanation might be that submitters believed ROARMAP to be the authoritative source. There might also be the suspicion, however, that such policies may not yet be fully endorsed by the institutions in whose name they have been posted.

Reference to preservation
Policies, or summaries, that appear only on ROARMAP invariably make no mention of any term related to preservation. The effect of such policies on preservation must be assumed unknown until they are put into effect on the policy-maker's own site.

Of the remaining policies, those with accessible official statements, the smaller group, 25 of the 88 policies (28%), used the stem preserv- in some context, leaving 41/88 (47%) with no mention of preservation in the formal policy. If we were to discount policies that don't have an official site, the respective figures would be 38% and 62%.
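The two sets of percentages can be reproduced from the counts given. The sketch below assumes that the 22 policies not counted in either group (88 - 25 - 41) are the ones with no link to an official site, which is consistent with the 'quarter' reported above; the variable and function names are illustrative.

```python
# Counts taken from the text of this section.
total_policies = 88
with_preserv = 25      # official statement uses the stem preserv-
without_preserv = 41   # official statement has no mention of preservation

# Assumption: the remainder are the ROARMAP-only entries with no official link.
no_official_link = total_policies - with_preserv - without_preserv
linked_policies = with_preserv + without_preserv


def pct(part: int, whole: int) -> int:
    """Percentage of part in whole, rounded to the nearest whole number."""
    return round(100 * part / whole)


# Shares of all 88 policies.
print(pct(with_preserv, total_policies), pct(without_preserv, total_policies))
# Shares of the policies that have an official site.
print(pct(with_preserv, linked_policies), pct(without_preserv, linked_policies))
# Share of policies with no official link.
print(no_official_link, pct(no_official_link, total_policies))
```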

Institutional vs funder policies
Among the 88 policies, 47 were set by institutions and 41 by research funders. Of these, 8 institutional policies (17% of such policies) and 17 funder policies (41%) made reference to preservation. In other words, it appears that funder policies are 2.4 times more likely to consider preservation. This might not be too surprising. The four cited policies that motivated the study were all funder policies.
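The factor of 2.4 is the ratio of the two reference rates. A minimal check of the arithmetic, using only the counts given above (the variable names are illustrative):

```python
# Counts taken from the text of this section.
institutional_total, institutional_preserv = 47, 8
funder_total, funder_preserv = 41, 17

# Proportion of each group whose policy refers to preservation.
institutional_rate = institutional_preserv / institutional_total
funder_rate = funder_preserv / funder_total

# How many times more likely a funder policy is to refer to preservation.
ratio = funder_rate / institutional_rate

print(round(100 * institutional_rate), round(100 * funder_rate), round(ratio, 1))
```

The ratio compares rates rather than raw counts, so it is not affected by the slightly different numbers of institutional and funder policies.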

Locus of deposit
Institutional policies almost inevitably specify the institutional repository as the locus of deposit. This is not the case for funders, which usually have a subject-based research focus, and have thus tended to favour subject repositories, notably the PubMed Central repository in biomedical fields. One of the earliest mandates, by the Wellcome Trust, set up PubMed Central UK to serve as the locus of deposit for its policy. More recently funder policies have become more open, directing deposit to either institutional or subject repositories.

Of the 23 policies that specified both types of repository for deposit, 9 (39%) referred to preservation.
Of the 9 policies that allowed subject repositories only, 3 (33%) referred to preservation.
Of the 42 policies that specified institutional repositories only, 11 (26%) referred to preservation.

Again, the influence of the funder policies towards preservation is evident in these results.

Strength of the mandate
A key feature of a mandate policy is the strength of the demand for authors to act on its provisions. The types of words typically found here vary from (1) 'must', 'required' and 'mandatory' to (2) 'encourage', 'recommends' and 'requests'. It is believed that the choice of these terms affects the level of adoption of the policy, that is, the number of papers deposited and the proportion these represent of all papers produced by researchers at an institution, with the stronger words in (1) more likely to be successful than those in (2). Does the strength of the policy indicate it is more likely to be accompanied by a preservation clause?

We find that the weaker terms are used by 14 policies, of which three (21%) refer to preservation. This is slightly less than the overall finding that 28% of policies refer to preservation, suggesting that the more strongly worded policies are more likely to engage with preservation, although the difference is much smaller than that found between funder and institutional policies.

Geographical effect, by country or region
The countries or regions with the highest incidence of preservation references are pan-European (5/6, 83%) and Ireland (3/4, 75%). Inevitably, for countries with higher numbers of policies we find these tending more towards the overall average: the UK has 23 policies, with 6 (26%) referring to preservation, and the USA has 12, with 3 (25%). Again the most important factor is the relative number of funder to institutional policies. By definition pan-European policies are likely to be cross-institutional and are therefore counted predominantly as funder policies. All Irish policies to date are funder-based.

What preservation are repositories doing?
It is important to repeat that what was measured is simple reference to preservation in policies, by the presence of a stem term preserv-. This is far from doing preservation, or even a commitment to preservation. Rather it's an ideal, an idea or a principle. On inspection it can be seen that some policies are merely referring to other examples of preservation policies. For funder policies the reference typically concerns a requirement rather than a practice, or pointers to an exemplar (e.g. "Suitable repositories should make provision for long-term preservation", "PubMed central seeks to preserve and maintain unrestricted access"). One institutional repository provides a paragraph on the concept of trusted repositories, but at no point refers to that repository.

Conclusions

The opening thesis, that preservation policy is more likely to be found for repositories with OA mandates, is not strongly supported. Just 28% of all policies posted on ROARMAP make any reference to preservation. A quarter of policies on ROARMAP have no link to an official site and make no provision for preservation in summary form. Currently the most important factor determining if an OA mandate policy makes reference to preservation is whether it originates from a research funder or an institution, as the former are 2.4 times more likely to seek provision for preservation in return for the requirement to deposit papers.

The intent of OA mandates is to promote access, not preservation. There might be an expectation that such a demand for the deposit of content should be reciprocated by the mandater. The mandate takes the form 'you are required to deposit' or 'we encourage you to deposit', but then what? Reciprocation could take many forms, not just preservation. There is a suggestion, however, in the low incidence of references to preservation, especially by institutions, that mandates are currently one-way. The most effective mandates will be those that can show they offer something positive in return to depositors.