EPrints Honeycomb
From PreservWiki
We know the Honeycomb metadata manager is not the best. However, the idea here is to set in stone the minimal amount of metadata needed for a working multi-repository Honeycomb, one that can be queried easily enough that an entire repository can be rebuilt from the data in the Honeycomb.
Honeycomb built in metadata
- system.object_hash_alg : sha1
- system.object_size : 28298
- system.object_layoutMapId : 3793
- I believe this is the distribution map of the data bits across the nodes.
- system.object_hash : 0795cb987ccdc145d4c34722e88b2972672d5ec0
- system.object_ctime : 1219419767074
- Submission timestamp (with milliseconds)
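The system.object_ctime value can be turned into a readable date. The following is a minimal sketch, assuming the value counts milliseconds since the Unix epoch (the page only says it is a timestamp with milliseconds); the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def ctime_to_datetime(ctime_ms: int) -> datetime:
    """Convert a millisecond timestamp to a timezone-aware UTC datetime.

    Assumption: system.object_ctime counts milliseconds since 1970-01-01 UTC.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds are added exactly, so no float rounding occurs.
    return epoch + timedelta(milliseconds=ctime_ms)


print(ctime_to_datetime(1219419767074).isoformat())
# prints 2008-08-22T15:42:47.074000+00:00
```

If the assumption holds, the example object above was submitted on 22 August 2008.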
EPrints Required Metadata
- dc.isPartOf
- Often the repository ID. This is sufficient to identify which repository the data originated from.
- dc.identifier
- The ID of the object. Again, this is often just a URL from EPrints, but it is a sufficient identifier.
- dc.format
- The format or MIME type of the object.
- dc.conformsTo
- Namespace of the schema this object conforms to, used for identification of specific object types.
- eprints.revision
- The revision number of this object. Two objects may share the same dc.identifier, but no two objects can ever share both the same identifier and the same revision number.
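As an illustration of the required fields listed above, here is a minimal sketch of a check that a metadata record is complete before it is stored. The function name and the example values are hypothetical. The 512 limit is taken from the length attribute in the schema on this page, and treating it as a character count is an assumption.

```python
# Required fields and their Python types (string or long in the schema).
REQUIRED_FIELDS = {
    "dc.isPartOf": str,
    "dc.identifier": str,
    "dc.format": str,
    "dc.conformsTo": str,
    "eprints.revision": int,
}
MAX_STRING_LENGTH = 512  # from length="512" in the schema; assumed to be characters


def validate_metadata(record: dict) -> list:
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing {name}")
            continue
        value = record[name]
        if not isinstance(value, expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
        elif expected_type is str and len(value) > MAX_STRING_LENGTH:
            problems.append(f"{name} exceeds {MAX_STRING_LENGTH} characters")
    return problems


# Hypothetical record; every value here is invented for illustration.
example = {
    "dc.isPartOf": "http://repository.example.org",
    "dc.identifier": "http://repository.example.org/id/eprint/42",
    "dc.format": "text/xml",
    "dc.conformsTo": "http://eprints.org/ep2/data/2.0",
    "eprints.revision": 3,
}
print(validate_metadata(example))  # prints []
```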
Performing a Rebuild
- Find all files that conform to the EPXML schema (matched by dc.conformsTo) and whose dc.isPartOf is your repository ID.
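The selection step above can be sketched as a filter over metadata records that have already been fetched from the Honeycomb. This is only an illustration: it does not call any Honeycomb API, the function name and sample values are hypothetical, and keeping only the highest eprints.revision per dc.identifier is an assumption about what a rebuild would want rather than something this page specifies.

```python
def select_for_rebuild(records, repository_id, epxml_namespace):
    """Pick the records needed to rebuild one repository.

    Keeps records whose dc.conformsTo and dc.isPartOf match, then
    (assumption) keeps only the highest eprints.revision per dc.identifier.
    """
    latest = {}
    for record in records:
        if record.get("dc.conformsTo") != epxml_namespace:
            continue
        if record.get("dc.isPartOf") != repository_id:
            continue
        key = record["dc.identifier"]
        current = latest.get(key)
        if current is None or record["eprints.revision"] > current["eprints.revision"]:
            latest[key] = record
    # Sort by identifier so the result is deterministic.
    return sorted(latest.values(), key=lambda r: r["dc.identifier"])


# Hypothetical sample data.
EPXML = "http://eprints.org/ep2/data/2.0"  # assumed EPXML namespace value
records = [
    {"dc.isPartOf": "repoA", "dc.identifier": "eprint/1", "dc.conformsTo": EPXML, "eprints.revision": 1},
    {"dc.isPartOf": "repoA", "dc.identifier": "eprint/1", "dc.conformsTo": EPXML, "eprints.revision": 2},
    {"dc.isPartOf": "repoB", "dc.identifier": "eprint/1", "dc.conformsTo": EPXML, "eprints.revision": 5},
    {"dc.isPartOf": "repoA", "dc.identifier": "doc/7", "dc.conformsTo": "other", "eprints.revision": 1},
]
selected = select_for_rebuild(records, "repoA", EPXML)
print([(r["dc.identifier"], r["eprints.revision"]) for r in selected])
# prints [('eprint/1', 2)]
```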
The Schema
This is the current schema in the Honeycomb. NOTE: The field lengths are limited; every string field is capped at a length of 512.
<metadataConfig>
  <schema>
    <namespace name="eprints" writable="false" extensible="false">
      <field name="revision" type="long" queryable="true" />
    </namespace>
    <namespace name="dc" writable="false" extensible="false">
      <field name="conformsTo" type="string" length="512" queryable="true" />
      <field name="identifier" type="string" length="512" queryable="true" />
      <field name="isPartOf" type="string" length="512" queryable="true" />
      <field name="format" type="string" length="512" queryable="true" />
    </namespace>
  </schema>
  <fsViews>
    <fsView name="byDCisPartOf" filename="${dc.isPartOf}.${dc.identifier}" namespace="eprints" fsattrs="true" readonly="true">
      <attribute name="dc.isPartOf"/>
      <attribute name="dc.identifier"/>
    </fsView>
    <fsView name="byST5800SystemId" filename="${system.object_id}" fsattrs="false" readonly="true">
      <attribute name="system.object_id" unset=""/>
      <attribute name="system.object_ctime" unset=""/>
    </fsView>
  </fsViews>
  <tables>
    <_table name="dc" >
      <column name="dc.isPartOf"/>
      <column name="dc.identifier"/>
      <column name="dc.conformsTo"/>
      <column name="dc.format"/>
    </_table>
    <_table name="eprints" >
      <column name="eprints.revision"/>
    </_table>
  </tables>
</metadataConfig>
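The byDCisPartOf view in the schema names files with the pattern ${dc.isPartOf}.${dc.identifier}. The sketch below shows what such a pattern would expand to, assuming each placeholder is replaced literally by the attribute value. How the Honeycomb actually expands or escapes these values is not described on this page, and the function name and sample values are hypothetical.

```python
import re


def expand_view_filename(pattern: str, metadata: dict) -> str:
    """Replace each ${name} placeholder with the matching metadata value.

    Assumption: plain literal substitution, with no escaping of the values.
    """
    def lookup(match):
        return str(metadata[match.group(1)])

    return re.sub(r"\$\{([^}]+)\}", lookup, pattern)


# Hypothetical metadata values.
metadata = {"dc.isPartOf": "repoA", "dc.identifier": "eprint-42"}
print(expand_view_filename("${dc.isPartOf}.${dc.identifier}", metadata))
# prints repoA.eprint-42
```

A real dc.identifier is often a URL, so in practice the characters it contains (such as slashes) may matter for a filename; this sketch does not attempt to handle that.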