Preserv

       
Project Partners

Oxford University Library Services
ECS, University of Southampton
The National Archives
Project Advisors
The British Library
Funded By
JISC

PRESERV 2 is funded by JISC within its capital programme in response to the September 06 call (Circular 04/06), Repositories and Preservation strand

PRESERV was originally funded by JISC within the 4/04 programme Supporting Digital Preservation and Asset Management in Institutions, theme 3: Institutional repository infrastructure development

MORE INFORMATION?

EMAIL: Steve Hitchcock, Project Manager

TEL: +44 (0)23 8059 3256
FAX: +44 (0)23 8059 2865

PRESERV Project,
IAM (Intelligence, Agents, Multimedia) Group,
Department of Electronics & Computer Science,
University of Southampton,
Highfield,
Southampton
SO17 1BJ, UK
Preservation Services
       

Preservation - Analyse

The analysis stage of active preservation has been a main focus of both Preserv1 and Preserv2, which worked with The National Archives (UK) and Oxford University to develop and integrate tools and registries that assist digital preservation. This section asks questions about our digital objects, the properties of those objects and the tools available to manipulate them.

What is the type of the file? Is the file valid?

To get an accurate indication of file format, Preserv believes that the file extension (the part of the name after the ".") should not be relied upon. Sometimes it is simply not present. Instead, proven tools should be used to analyse the contents of the file to determine its type. We can also verify that the contents conform to the format specification, if there is one. Examples include parsing XML-based files, or compiling source code such as that used in LaTeX documents.
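
The idea of looking at contents rather than names can be illustrated with a minimal Python sketch. It is not how DROID works internally: the small signature table and the function names `sniff_format` and `is_well_formed_xml` are our own assumptions for illustration, and a real tool would draw on the much richer PRONOM signature file. The XML check tests well-formedness only, not conformance to any particular schema.

```python
import xml.etree.ElementTree as ET

# A few widely known leading byte signatures ("magic numbers").
# Illustrative only: this table is an assumption of the sketch,
# not the PRONOM signature file used by DROID.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format from the first bytes of the content, ignoring the file name."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the content parses as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False
```

A file named `report.doc` whose contents begin with `%PDF-` would be reported as PDF here, which is the behaviour we want when the extension is missing or wrong.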

In both Preserv1 and Preserv2 we use a tool, DROID, produced by The National Archives (UK). This tool, which can be downloaded and run locally, uses a signature file available via the PRONOM registry to classify files and provide specific details about each one. Each file is classified with a PRONOM unique identifier (PUID), which can be used to obtain further information about the format from the PRONOM registry, as we explain later.

Preserv1 used DROID to classify the files in many repositories indexed by the Registry of Open Access Repositories (ROAR), presenting profiles for the repositories that could be classified in this way. An example Preserv profile from ROAR is shown below.

Preserv profile from ROAR

Although this approach provides a good breakdown of the file formats in the target repository, it has some problems that we wanted to address in Preserv2. ROAR relies on harvesting tools to access remote repository content. This introduces bandwidth issues when downloading data, especially for large files. To limit the bandwidth used by ROAR and by the repository, any files over 2MB were not downloaded and so could not be classified. It is more desirable to control this process within the repository software to ensure successful and complete classification. Doing so also goes some way to solving the 'no files found' problem that dominates the illustration above.
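
The effect of a harvesting size cap on a format profile can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ROAR implementation: we assume the size limit quoted above means two megabytes (2 * 1024 * 1024 bytes), and the function `build_profile`, the label strings and the sample data are all hypothetical.

```python
from collections import Counter

# Assumed interpretation of the harvesting cap: two megabytes.
SIZE_LIMIT = 2 * 1024 * 1024


def build_profile(files, size_limit=None):
    """Tally formats for a list of (size_in_bytes, format_label) pairs.

    format_label stands for whatever a classifier would report if it
    were given the file. Files above size_limit are never downloaded,
    so they are counted as unclassified instead.
    """
    profile = Counter()
    for size, label in files:
        if size_limit is not None and size > size_limit:
            profile["unclassified"] += 1
        else:
            profile[label] += 1
    return profile


# Hypothetical sample data: two PDFs (one large) and one HTML file.
sample = [(500_000, "PDF"), (3_000_000, "PDF"), (10_000, "HTML")]
capped = build_profile(sample, size_limit=SIZE_LIMIT)
complete = build_profile(sample)
```

With the cap applied, the large PDF is counted as unclassified; without it, as when classification runs inside the repository, every file contributes to the profile.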

By linking DROID more closely with the repository software (as Preserv2 has been doing) we can ensure successful completion without limiting file size. The next illustration shows the interface to an EPrints repository where DROID has been run in the background and has classified all of the files in this repository. If it hadn't completed successfully this would be indicated in the risk scores box.

EPrints Format Classification Screen

Because EPrints software is modular, this page is implemented as a screen plug-in that reads a single extra database record per file and displays a summary of this information. Some EPrints-based functionality has been added to make the page readable, but the classification itself comes from an import plug-in, which parses the XML output of a vanilla DROID installation and populates the extra database field.
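
The import step can be pictured with a short Python sketch. Note the assumptions: the real plug-in is part of EPrints, and the XML layout below (element and attribute names, and the sample identifiers) is invented for illustration and is not DROID's actual output schema. The sketch only shows the general shape of the task: parse the classifier's XML report and keep one record per file.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification report. The element and attribute names
# are assumptions made for this sketch, not DROID's real output format.
SAMPLE_XML = """\
<classification>
  <file name="thesis.pdf" status="identified" puid="fmt/18"/>
  <file name="data.bin" status="unidentified"/>
</classification>
"""


def import_classification(xml_text):
    """Return one record per file, keyed by file name."""
    records = {}
    root = ET.fromstring(xml_text)
    for node in root.iter("file"):
        records[node.get("name")] = {
            "status": node.get("status"),
            # puid is None when the file could not be identified
            "puid": node.get("puid"),
        }
    return records


records = import_classification(SAMPLE_XML)
```

A summary screen then only needs to read these per-file records, which keeps the display logic independent of how the classifier was run.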

DROID need not be tied to the repository software, however, and can instead be used as part of a 'smart storage' approach.

This page produced and maintained by the PRESERV Project. Contact us