Storage controllers: choosing between local disk and the network 'cloud'
Different types of storage services are available to repositories, from local
disks to the distributed 'cloud', offering choices of scale, bandwidth
and cost. As data volumes and data types grow, repositories are likely to
choose a combination of services, or 'hybrid' storage, rather than adopt
a single storage approach. Where several storage options are in use, the
copying and transfer of content from the repository to the chosen
locations has to be managed.
Preserv has developed a
storage controller for EPrints software.
Extended abstract
From
the Desktop to the Cloud: Leveraging Hybrid Storage Architectures in
your Repository, updated April 2009, accepted for
Open Repositories Conference 2009 (OR09),
Atlanta, May
Introduces the EPrints storage controller, which allows repositories
using this software to integrate with emerging network, storage and
cloud services. For a less technical approach, see the furniture
removal analogy in this
blog
commentary (February 2009) to help you understand these
cloud-storage controller developments.
The EPrints storage controller has been
successfully
tested for storing content in Amazon S3/CloudFront, and will be
available from
EPrints version 3.2 (availability tba).
Two more storage plug-ins are available so far for the
storage controller: the local storage plug-in that also supports the
legacy local disk layout, and a plug-in for the Sun STK5800 server.
Find out
how to write a storage plug-in.
Presentation
From open
storage to smart storage: enabling EPrints repository preservation
(slides),
Sun Preservation and Archiving Special Interest Group
(PASIG) meeting, May 2008
First description of the storage controller for EPrints repository
software. Supports a pluggable storage layer for repositories,
providing the ability to store objects in different locations based on a set of rules, e.g. using
metadata or type. For example, a generated thumbnail could be stored locally while the original image is stored in Amazon S3 and in a local archival server. Another example would be storing files of a certain size or classification offsite and sending these to a tape queue for backup.
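The rule-based routing described above can be illustrated with a minimal sketch. This is not the EPrints storage controller or its plug-in API (EPrints itself is written in Perl); every class, field and backend name below is a hypothetical stand-in, and the in-memory backends simply take the place of real services such as a local disk, Amazon S3 or an archival server. It assumes that rules are checked in order and that the first matching rule decides where an object is stored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StoredObject:
    """A file plus the metadata that rules inspect (hypothetical fields)."""
    name: str
    kind: str  # e.g. "thumbnail" or "original"
    data: bytes = b""


class MemoryBackend:
    """Stand-in for a storage plug-in; keeps objects in a dict."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.objects: Dict[str, bytes] = {}

    def store(self, obj: StoredObject) -> None:
        self.objects[obj.name] = obj.data


@dataclass
class Rule:
    """If `matches` returns True, the object goes to all listed backends."""
    matches: Callable[[StoredObject], bool]
    backends: List[str]


class StorageController:
    """Routes each object to the backends named by the first matching rule."""

    def __init__(self, backends: Dict[str, MemoryBackend],
                 rules: List[Rule], default: List[str]) -> None:
        self.backends = backends
        self.rules = rules
        self.default = default

    def store(self, obj: StoredObject) -> List[str]:
        targets = self.default
        for rule in self.rules:
            if rule.matches(obj):
                targets = rule.backends
                break
        for label in targets:
            self.backends[label].store(obj)
        return list(targets)


# Example from the text: thumbnails stay local, originals go to
# S3 and a local archival server (labels are illustrative only).
backends = {label: MemoryBackend(label) for label in ("local", "s3", "archive")}
rules = [
    Rule(lambda o: o.kind == "thumbnail", ["local"]),
    Rule(lambda o: o.kind == "original", ["s3", "archive"]),
]
controller = StorageController(backends, rules, default=["local"])

thumb_targets = controller.store(StoredObject("img_thumb.png", "thumbnail", b"t"))
original_targets = controller.store(StoredObject("img.tiff", "original", b"o"))
```

Keeping the rules separate from the backends means that a new storage service only needs a new backend class and a rule that names it; the code that deposits content does not change.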
Listing Fedora Commons (the repository software) as a storage service in the slide above led to some confusion initially. Using one repository platform (EPrints) as a front-end to another (Fedora) offers intriguing possibilities, particularly in this case, where the two have complementary strengths in terms of interfaces and data management. Keeping two repositories in sync when both are trying to perform similar operations could be tricky, but it is possible when dealing with input of new items.
Alternative storage controllers
As part of Fedora Commons, the
Akubra project is implementing a plug-in based storage abstraction.
DuraSpace, a joint DSpace/Fedora project, seeks to offer a commercial service to mediate between the respective repository software platforms and storage services:
"'DuraSpace' (is) a new web-based
service that will allow institutions to easily distribute content to
multiple storage providers, both 'cloud-based' and institution-based.
The idea behind DuraSpace is to provide a trusted, value-added service
layer to augment the capabilities of generic storage providers by making
stored digital content more durable, manageable, accessible and
sharable." Full
DSpace press release.
For a perspective on early progress with DuraSpace from a Preserv team leader, see this
blog entry (25 February 2009) applauding the concept of durability for repository content and storage flexibility, but cautious about offering too many new services: "Let's do it in the cloud - but lets work really hard at articulating the benefits that the cloud end user will enjoy and stop relying on general talk about value-added services. I think researchers/end-users will forgive us for not having finished implementing something yet, but they won't forgive us for a lack of imagination."