The effect of open access mandates on repository preservation
policy
Steve Hitchcock
Preserv 2 Project, IAM Group, School of
Electronics and Computer Science, University of Southampton, SO17 1BJ,
UK
Email: sh94r@ecs.soton.ac.uk
Preserv 2 is a JISC-funded project.
Version information: This is a DRAFT paper, 10 March 2009
For reference, the summary notes
on which this report is based are
available.
For analysis, a shorthand spreadsheet
summary of these findings is
available.
Summary: Previous surveys have
found little or no evidence of institutional or repository preservation
policy.
For institutional repositories one area of policy-making that
has achieved some prominence is open access mandates. For this
follow-up survey we examined a registry of such policies (ROARMAP) for
evidence of policies leading towards preservation based on the thesis
that
preservation policy is more likely to follow from high-level policy
initiatives, and these initiatives are more likely to be found for
repositories with OA mandates. This was not strongly borne out in
practice. Just 28% of all policies posted on ROARMAP make any reference
to preservation. ROARMAP includes mandate policies by research funders
as well as institutions. Research funder policies are 2.4 times more
likely to seek provision for preservation in return for the requirement
to deposit papers than are mandate policies from institutions.
Introduction
Preservation begins with policy. Taking a longer-term view on
managing digital data involves many decisions and costs, and these
can't be assessed consistently or coherently without reference to, or
guidance by, a
properly formulated policy. But policy does not begin with preservation.
For digital repositories, readiness for preservation can be
gauged by evaluating the extent of their policies, and
extrapolating from these to the implications for preservation. In June 2006 the
Preserv project examined the sites of over 40 selected repositories,
and found none that had a public preservation policy. In a follow-up
survey
it asked these repositories if they had preservation policies:
21 replied and, indeed, none could point to a formal policy, although
it was found that de facto
preservation decisions were being taken on issues such as acceptable
file formats.
The lack of policy examples identified by this survey was reinforced
more recently at an institutional level by the Digital
Preservation Policies Study (Beagrie et al. 2008): The starting
premise for this study was "The lack of preservation policies and as a
result the lack of consideration of digital preservation issues in
other institutional strategies is seen as a major stumbling block". In
its findings the study referred to
the incidence of institutional preservation policies as 'sporadic',
'rarely considered' and 'only a handful of examples to follow',
confirming there is a serious gap in policy.
For institutional repositories one area of policy-making has achieved
some prominence: open access mandates. The aim of these
policies is to promote and mandate the deposit, mostly by authors, of
published research papers in repositories where these works can be
accessed freely and openly by all. These mandates are both advocated
and
assiduously recorded by Stevan Harnad in ROARMAP
(Registry of Open Access Repository Material Archiving Policies),
alongside the mandate policies of research funders. Juliet, from Sherpa,
records the OA policies of research funders but not of institutional
repositories, and where relevant indicates whether data archiving is
required. On March 9, 2009, 12 funders (from 46) were found to have
some
sort of data archiving policy according to Juliet.
For this small follow-up survey we decided to examine ROARMAP for
evidence of policies leading to preservation. Anticipating this survey,
the plan for the Preserv 2 project cited some of the first mandate
proposals or policies from
four research funders: the European Commission, Research Councils UK,
the Wellcome Trust and the National Institutes of Health (NIH) in the
USA. Each expressed
some concern over long-term preservation.
The simple thesis to be explored is that preservation policy is more
likely to follow from high-level policy initiatives, and these
initiatives are more likely to be found for repositories with OA
mandates. The basis for the thesis is that an OA mandate places a
demand on authors of papers, and reciprocation may be in the form of
services or benefits for these authors. One such benefit, recognised by
most authors who have posted digital content to non-managed Web sites,
might be seen as longer-term data management, or preservation.
Institutions can be expected to have the longevity, if not yet the
commitment or the data
management provision, to provide some assurance in this respect.
There are competing factors to consider here. OA papers, the target of the
mandates, are by definition published in peer reviewed sources such as
journals, and as such the preservation of the journal version of record
could be considered
the responsibility of the publisher rather than the repository. This
has led to publishers making arrangements with specialist organisations
such as Portico in the USA and the Koninklijke Bibliotheek in the
Netherlands for the preservation of their digital journals. Digital
journal site licences have led to institutional initiatives such as
LOCKSS to ensure continued access to content. An
alternative view is that repositories manage versions of OA papers, and
that by requiring deposit for records-keeping, reporting, assessment
and other
purposes, the institution has a duty to manage the data it collects to
protect the effectiveness of these processes. Both are viewpoints to be
weighed in policy development.
Rather than evaluating the
strength and formality, far less the effectiveness, of any clauses on
preservation found in ROARMAP policies, as a starting point we simply
looked for any
reference to, or use of, the word 'preservation' or derivatives in the
policy statement.
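As an illustration only, the stem check described above could be automated roughly as follows. This is a minimal sketch and not necessarily the procedure used in the survey, which is not described in more detail here. The function names and the third sample sentence are hypothetical; the first two sample sentences are quoted from funder policies cited later in this report.

```python
import re
from typing import List

# Assumption: a "derivative" of 'preservation' is any word beginning with
# the stem 'preserv' (preserve, preserved, preserving, preservation, ...).
PRESERV_STEM = re.compile(r"\bpreserv\w*", re.IGNORECASE)


def preservation_terms(policy_text: str) -> List[str]:
    """Return every word in the text that starts with 'preserv', lower-cased."""
    return [match.group(0).lower() for match in PRESERV_STEM.finditer(policy_text)]


def mentions_preservation(policy_text: str) -> bool:
    """True if the policy text uses the stem at least once, in any context."""
    return bool(preservation_terms(policy_text))


samples = [
    "Suitable repositories should make provision for long-term preservation",
    "PubMed central seeks to preserve and maintain unrestricted access",
    "Authors must deposit their papers in the institutional repository",  # hypothetical
]
for text in samples:
    print(mentions_preservation(text), preservation_terms(text))
```

A match of this kind records only that the word appears in the statement; it says nothing about the strength or formality of any commitment.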
Results
At the time the survey was concluded,
ROARMAP contained 88 policies
deemed to be of substance. This was not all of the policies
listed.
ROARMAP allows records to be added by repository representatives, and
some entries could be considered premature. A list of policies noting
provisions influencing preservation is available. The factors
noted were:
- the nature of the funder or institution
- the strength of the wording of the mandate policy, notably the
word used to instruct submission of papers
- the locus of the mandate, typically an institutional or subject
repository, such as PubMed Central or arXiv, or both
- use of the stem preserv- in the policy
- the availability of a link to the full policy, the location of
the policy and the 'officialness' of the linked copy
Link to policy
It was assumed that ROARMAP would include a summary of policies with
pointers to full, official versions posted on the institutional or
funder sites.
After all, these are the accepted policies of these institutions and
funders, and are aimed at researchers and authors at those
institutions.
One
surprising feature of ROARMAP, therefore, is that a quarter of the
policies, or policy summaries, have no links to the official site of
the mandater.
One
explanation might be that submitters believed ROARMAP to be the
authoritative source. There might also be the suspicion, however, that
such policies may not yet be fully endorsed by the institutions in
whose name they have been posted.
Reference to preservation
Where the only instance of a policy, or summary, is on ROARMAP, there is
invariably no mention of any term related to preservation. The effect of
such policies on preservation must be assumed unknown until they are
put into effect by the policy-making site.
Of the remaining policies, those with accessible statements, the smaller
group, 25/88 (28%), used the stem preserv- in some context, leaving
41/88 (47%) with no mention of preservation in the formal
policy. If we discount the policies that have no official
site, the respective figures are 25/66 (38%) and 41/66 (62%).
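The arithmetic behind these figures can be checked with a short sketch. The counts 88, 25 and 41 come from the text; the count of policies without an official site is not stated directly and is inferred here as the remainder, which is an assumption. The helper name pct is hypothetical.

```python
def pct(part: int, whole: int) -> int:
    """Share of `part` in `whole` as a percentage rounded to a whole number."""
    return round(100 * part / whole)


TOTAL = 88          # policies of substance on ROARMAP
WITH_STEM = 25      # accessible statement, uses the stem preserv-
WITHOUT_STEM = 41   # accessible statement, no mention of preservation

# Inferred, not stated as a count in the text: the remainder are the
# policies with no link to an official site (about a quarter).
no_official_site = TOTAL - WITH_STEM - WITHOUT_STEM
with_official_site = WITH_STEM + WITHOUT_STEM

print(no_official_site, pct(no_official_site, TOTAL))      # 22 25
print(pct(WITH_STEM, TOTAL), pct(WITHOUT_STEM, TOTAL))     # 28 47
print(pct(WITH_STEM, with_official_site),
      pct(WITHOUT_STEM, with_official_site))               # 38 62
```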
Institutional vs funder policies
Among the 88 policies, 47 were set by institutions, 41 by research
funders. Of these, 8 institutional policies (i.e. 17% of such policies)
made reference to preservation, and 17 funder policies (41%). In other
words, it appears that funder
policies are 2.4 times more likely to consider preservation.
This might not be too surprising. The four cited policies that
motivated the study were all funder policies.
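The factor of 2.4 is the ratio of the two proportions. A minimal sketch of the calculation, using only the counts given above (the function and variable names are hypothetical):

```python
def share(mentions: int, total: int) -> float:
    """Fraction of policies in a group that refer to preservation."""
    return mentions / total


# Consistency checks against the totals reported in the text.
assert 47 + 41 == 88   # institutional + funder policies
assert 8 + 17 == 25    # policies using the stem preserv-

institutional = share(8, 47)    # institutional policies
funder = share(17, 41)          # research funder policies
ratio = funder / institutional  # relative likelihood, funder vs institution

print(f"{institutional:.0%} {funder:.0%} {ratio:.1f}")  # 17% 41% 2.4
```

The ratio is computed from the unrounded proportions; dividing the rounded percentages (41 by 17) gives the same value to one decimal place.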
Locus of deposit
Institutional policies almost inevitably specify the institutional
repository as the locus of deposit. This is not the case for funders,
which usually have a subject-based research focus, and have thus tended
to favour subject repositories, notably the PubMed Central repository
in biomedical fields. One of the earliest mandates, by the Wellcome
Trust, set up PubMed Central UK to serve as the locus of deposit for
its policy. More recently funder policies have become more open,
directing deposit to either institutional or subject repositories.
- Of the 23 policies that specified both types of repository for
deposit, 9 (39%) referred to preservation in the policy.
- Nine policies allowed subject repositories only, with 3 (33%)
mentioning preservation.
- 42 IR-only policies included 11 (26%) with reference to preservation.
Again, the influence of the funder policies towards preservation is
evident in these results.
Strength of the mandate
A key feature of a mandate policy is the strength of the demand for
authors to act on its provisions. The words typically found
here vary from (1) 'must', 'required' and 'mandatory' to (2)
'encourage', 'recommends' and 'requests'. It is believed that the
choice of term affects the level of adoption of the policy,
that is, the number of papers deposited and the proportion these
represent of all papers produced by researchers at an institution,
with the stronger words in (1) more likely to be successful than those
in (2). Does the strength of the policy indicate it is more likely to
be accompanied by a preservation clause?
We find that the weaker terms are
used by 14 policies, of which three (21%) refer to preservation.
This is slightly less than the overall finding that 28% of policies
refer to preservation, suggesting that the more strongly-worded
policies are more likely to engage with preservation, although the
difference in the effect is much smaller than that found between funder
and institutional policies.
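For illustration, the two groups of instructing words could be told apart mechanically along the following lines. This is a rough sketch under stated assumptions, not the method used for the survey: it matches only the six exact words quoted above (so inflected forms such as 'encourages' are missed), and the rule that a stronger term takes precedence when both kinds appear is an assumption. The function name is hypothetical.

```python
import re

# The two groups of instructing words quoted in the text.
STRONG_TERMS = {"must", "required", "mandatory"}
WEAK_TERMS = {"encourage", "recommends", "requests"}


def mandate_strength(policy_text: str) -> str:
    """Classify a policy as 'strong', 'weak' or 'unclassified' by its wording."""
    words = set(re.findall(r"[a-z]+", policy_text.lower()))
    # Assumption: if both kinds of term appear, the stronger one decides.
    if words & STRONG_TERMS:
        return "strong"
    if words & WEAK_TERMS:
        return "weak"
    return "unclassified"


print(mandate_strength("you are required to deposit"))   # strong
print(mandate_strength("we encourage you to deposit"))   # weak
```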
Geographical effect, by country or
region
The countries or regions with the highest incidence of
preservation references are pan-European (5/6, 83%) and Ireland (3/4,
75%). Inevitably, countries with larger numbers of policies
tend more towards the overall average: the UK has 23 policies,
with 6 (26%) referring to preservation, and the USA has 12, with 3 (25%).
Again the most important factor is the relative number of funder to
institutional policies. By definition pan-European policies are likely
to be cross-institutional and are therefore counted predominantly as funder
policies. All Irish policies to date are funder-based.
What preservation are repositories
doing?
It is important to repeat that what was measured is simple reference to
preservation in policies, by the presence of a stem term preserv-. This
is far from doing preservation, or even a commitment to preservation.
Rather it's an ideal, an idea or a principle. On inspection it can be
seen that some policies are merely referring to other examples of
preservation policies. For funder policies the reference typically
concerns a requirement rather than a practice, or pointers to an
exemplar (e.g. "Suitable repositories should make provision for
long-term preservation", "PubMed central seeks to preserve
and maintain unrestricted access"). One institutional repository
provides a paragraph on the concept of trusted repositories, but at no
point refers to that repository.
Conclusions
The opening thesis, that preservation policy is more likely to be found
for repositories with OA mandates, is not strongly supported. Just 28%
of all policies posted on ROARMAP make any reference to preservation. A
quarter of policies on ROARMAP have no link to an official
site and make no provision for preservation in summary form.
Currently the most important factor determining if an OA mandate policy
makes reference to preservation is whether it originates from a
research funder or an institution, as the former are 2.4 times more
likely to seek provision for preservation in return for the requirement
to deposit papers.
The intent of OA mandates is to promote access, not preservation. There
might be an expectation that such a demand for the deposit of content
should be reciprocated by the mandater. The mandate takes the form 'you
are required to deposit' or 'we encourage you to deposit', but then
what? Again, this could take many forms, not just preservation. There
is a suggestion, however, in the low incidence of references to
preservation, especially by institutions, that mandates are currently
one-way. The most effective mandates will be those that can show they
offer something positive in return to depositors.