Preserv

       
Latest...
Preserv 2 final report 'candid and realistic'
The final report from the Preserv 2 project has been described by the JISC programme manager responsible for funding the project, Neil Grindley, as "candid and realistic about the ..."
Project Partners

Oxford University Library Services
ECS, University of Southampton
The National Archives
Project Advisors
The British Library
Funded By
JISC

PRESERV 2 is funded by JISC within its capital programme in response to the September 06 call (Circular 04/06), Repositories and Preservation strand

PRESERV was originally funded by JISC within the 4/04 programme Supporting Digital Preservation and Asset Management in Institutions, theme 3: Institutional repository infrastructure development

MORE INFORMATION?

EMAIL: Steve Hitchcock, Project Manager

TEL: +44 (0)23 8059 3256
FAX: +44 (0)23 8059 2865

PRESERV Project,
IAM (Intelligence, Agents, Multimedia) Group,
Department of Electronics & Computer Science,
University of Southampton,
Highfield,
Southampton
SO17 1BJ, UK


Preservation Services
       

Testing format classification

Classification test results are available for a 'typical' repository (pie chart) and for 12 individual IRs (example format classification for one repository shown below).

[Figure: sample format classification test results from one repository]

[Figure: pie chart of format breakdown for 1000-file 'typical' repository]

Summary of test results (wiki page)
  • The Preserv2 EPrints toolkit was found to be fully working (after some minor bug fixes).
  • Using DROID, 146 files from a full dataset of 2144 files could not be classified, a classification rate of about 93.2%. When wrongly classified files are also counted as failures, this rate comes down to 92.75%.
  • Simple examination of the file extension will classify an estimated 99.8% of the files correctly, provided the extension itself is correct (this was not tested for all files, only for the fringe cases). However, the extension provides no information about the file version.
  • PRONOM lacks a complete set of MIME-types for its format data. This matters for files whose classification may have changed between inspections, since such changes could indicate files at risk and prompt investigation. Comparing the PRONOM-ROAR classification with that by Preserv 2 suggests that over a quarter of the files in the 1000-item 'typical' repository dataset have changed classification. When MIME-types are applied, however, 256 files match by MIME-type and only 40 files changed classification.
  • The Preserv2-EPrints-Toolkit version of DROID is able to classify more objects than the PRONOM-ROAR classification, cutting down on the number of unknown files, primarily by virtue of deploying a more recent version of the DROID signature file (v13 against v12 used in PRONOM-ROAR).
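The arithmetic and the MIME-type comparison described in the results above can be sketched in a few lines of Python. This is only an illustration, not the Preserv2 toolkit code: the function names, the format labels and the small MIME lookup table are all hypothetical, and a real table would have to be built from PRONOM data (which, as noted, is incomplete for MIME-types).

```python
from typing import Dict, Optional


def classification_rate(total: int, unclassified: int, misclassified: int = 0) -> float:
    """Return the percentage of files counted as successfully classified."""
    if total <= 0:
        raise ValueError("total must be a positive file count")
    return 100.0 * (total - unclassified - misclassified) / total


# Hypothetical format labels mapped to MIME-types, for illustration only.
# A real table would be derived from PRONOM format records.
MIME_BY_FORMAT: Dict[str, str] = {
    "PDF 1.3": "application/pdf",
    "PDF 1.4": "application/pdf",
    "HTML 4.0": "text/html",
    "JPEG 1.01": "image/jpeg",
}


def compare_classification(
    old: Optional[str],
    new: Optional[str],
    mime_by_format: Dict[str, str] = MIME_BY_FORMAT,
) -> str:
    """Compare two classifications of the same file from different inspections.

    Returns "unchanged" if the format labels are identical, "mime-match" if the
    labels differ but map to the same known MIME-type (for example a version
    change), and "changed" otherwise.
    """
    if old == new:
        return "unchanged"
    old_mime = mime_by_format.get(old) if old else None
    new_mime = mime_by_format.get(new) if new else None
    if old_mime is not None and old_mime == new_mime:
        return "mime-match"
    return "changed"


if __name__ == "__main__":
    # Figures from the DROID test: 146 unclassified files out of 2144.
    print(f"{classification_rate(2144, 146):.2f}%")
    print(compare_classification("PDF 1.3", "PDF 1.4"))
    print(compare_classification("PDF 1.4", "HTML 4.0"))
```

Under this scheme, files that land in the "mime-match" group would be the ones whose apparent change is a difference of version or label rather than of format, leaving only the "changed" group as candidates for investigation.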
Conclusions from testing in full (wiki page)

See also Preserv's earlier, Web-registry based approach to format classification
Paper PRONOM-ROAR: Adding Format Profiles to a Repository Registry to Inform Preservation Services, International Journal of Digital Curation, Vol. 2, No. 2, November 2007
Explains why and how the National Archives' format identification tool (PRONOM) was applied to a repository registry (ROAR) to provide format profiles for the content in hundreds of repositories.
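As a rough illustration of what a format profile is, the sketch below tallies the formats identified for the files in one repository. It assumes the identification step (for example by DROID) has already produced one format label per file, with None for unidentified files; the function name and the sample labels are hypothetical and are not taken from the PRONOM-ROAR implementation.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def format_profile(identified: Iterable[Optional[str]]) -> List[Tuple[str, int]]:
    """Count files per identified format, most common first.

    Files with no identification (None or empty string) are grouped
    under the label "unknown".
    """
    counts = Counter(label if label else "unknown" for label in identified)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical identification results for a handful of files.
    sample = ["PDF 1.4", "PDF 1.4", "HTML 4.0", None, "PDF 1.3", "PDF 1.4"]
    for label, count in format_profile(sample):
        print(f"{label}: {count}")
```

A profile of this kind is what a pie chart of format breakdown summarises, and comparing profiles taken at different times is one way to notice a growing share of unknown files.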

This page produced and maintained by the PRESERV Project. Contact us