Metadata Importer

From EChase

Some initial design ideas

<Martin> So, have you had any thoughts on how we can map things like the author field from Alinari into the actor table, which is then linked to the attribution table and thus to the information_carrier?

<Adrian>
1. I would populate the actor table first, as if it were a controlled list, and assign each actor an id.
2. Iterate over the Alinari records; for each one, create an information carrier record (probably in memory) and assign it an id.
3. Process each piece of data in turn, i.e. extract the title from the Alinari data and populate the title field in the information_carrier table, then do the same for description, etc.
4. Then move down the hierarchy and process the sub-tables. For example, create an attribution event and populate as much of it as you can: extract the author name, look up its id in the actor table, and reference it.

Basically, I would iterate at the root record/info_carrier level and then traverse down the hierarchy. You probably need to keep the record you are processing in memory as a context object, so you can get hold of its id when required.
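The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual importer: the class names (ActorTable, InformationCarrier, Attribution, HierarchicalImporter), the in-memory maps standing in for database tables, and the source field keys "title", "description" and "author" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the actor table: name -> id. */
class ActorTable {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private int nextId = 1;

    /** Registers an actor if it is new and returns its id. */
    int register(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = nextId++;
            idsByName.put(name, id);
        }
        return id;
    }

    /** Returns the id for a name, or null if the actor is unknown. */
    Integer lookup(String name) {
        return idsByName.get(name);
    }
}

/** Hypothetical attribution event linking a carrier to an actor. */
class Attribution {
    final int informationCarrierId;
    final int actorId;

    Attribution(int informationCarrierId, int actorId) {
        this.informationCarrierId = informationCarrierId;
        this.actorId = actorId;
    }
}

/** The context object: the root record currently being built. */
class InformationCarrier {
    final int id;
    String title;
    String description;
    final List<Attribution> attributions = new ArrayList<>();

    InformationCarrier(int id) {
        this.id = id;
    }
}

class HierarchicalImporter {
    private final ActorTable actors = new ActorTable();
    private int nextCarrierId = 1;

    List<InformationCarrier> importAll(List<Map<String, String>> records) {
        // Step 1: populate the actor table first, like a controlled list.
        for (Map<String, String> record : records) {
            String author = record.get("author");
            if (author != null) {
                actors.register(author);
            }
        }
        List<InformationCarrier> result = new ArrayList<>();
        for (Map<String, String> record : records) {
            // Step 2: create the root record and keep it as the context.
            InformationCarrier current = new InformationCarrier(nextCarrierId++);
            // Step 3: process the simple fields in turn.
            current.title = record.get("title");
            current.description = record.get("description");
            // Step 4: move down the hierarchy; the context supplies the id.
            String author = record.get("author");
            Integer actorId = (author == null) ? null : actors.lookup(author);
            if (actorId != null) {
                current.attributions.add(new Attribution(current.id, actorId));
            }
            result.add(current);
        }
        return result;
    }
}
```

Two records that share an author end up with two information carriers whose attributions point at the same actor id, which is the effect of treating the actor table as a controlled list.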

I think we need some manager/controlling code that can keep the context of the current operations

We should design it with some general interfaces, so we can write pluggable Java classes to map new metadata as we expand the schema.
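One possible shape for such an interface is sketched below. The names MetadataMapper, TitleMapper and MapperRegistry are hypothetical, and representing a target row as a Map of column name to value is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plug-in contract: one implementation per source field. */
interface MetadataMapper {
    /** Name of the source field this mapper handles. */
    String sourceField();

    /** Writes the mapped value(s) into the target row. */
    void map(String sourceValue, Map<String, String> targetRow);
}

/** Example plug-in: copies the source "title" into the target "title" column, trimmed. */
class TitleMapper implements MetadataMapper {
    @Override
    public String sourceField() {
        return "title";
    }

    @Override
    public void map(String sourceValue, Map<String, String> targetRow) {
        targetRow.put("title", sourceValue.trim());
    }
}

/** Holds the registered plug-ins and applies them to one source record. */
class MapperRegistry {
    private final List<MetadataMapper> mappers = new ArrayList<>();

    void register(MetadataMapper mapper) {
        mappers.add(mapper);
    }

    Map<String, String> mapRecord(Map<String, String> sourceRecord) {
        Map<String, String> targetRow = new LinkedHashMap<>();
        for (MetadataMapper mapper : mappers) {
            String value = sourceRecord.get(mapper.sourceField());
            if (value != null) {
                mapper.map(value, targetRow);
            }
        }
        // Source fields without a registered mapper are simply not copied.
        return targetRow;
    }
}
```

Supporting a new piece of metadata would then mean writing one more MetadataMapper and registering it, without touching the controlling code.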

Also, we should possibly consider writing the importer with an event-based model, i.e. each time you process a piece of data you fire an event, e.g. a processingText() event, or processingDate(), or processingDescription(), etc. The indexer could subscribe to these events and index the data as the records are imported.

E.g. there may be an event processingText(String text, int information_carrier_id) which the indexer subscribes to. When the event is fired, the indexer could tokenise and index the text and mark it against the information_carrier_id in the index.
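A minimal sketch of that event flow is given below. Only the processingText(String, int) signature comes from the notes above; ImportListener, ImportEventBus and ToyIndexer are hypothetical names, and the "index" is just an in-memory map from token to carrier ids, not a real search index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical listener contract for importer events. */
interface ImportListener {
    void processingText(String text, int informationCarrierId);
}

/** The importer would call fireProcessingText() each time it handles a text field. */
class ImportEventBus {
    private final List<ImportListener> listeners = new ArrayList<>();

    void subscribe(ImportListener listener) {
        listeners.add(listener);
    }

    void fireProcessingText(String text, int informationCarrierId) {
        for (ImportListener listener : listeners) {
            listener.processingText(text, informationCarrierId);
        }
    }
}

/** Toy indexer: tokenises the text and records which carriers contain each token. */
class ToyIndexer implements ImportListener {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    @Override
    public void processingText(String text, int informationCarrierId) {
        // Split on anything that is not a letter or digit, so accented letters stay in tokens.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(informationCarrierId);
        }
    }

    /** Returns the ids of all carriers whose text contained the token. */
    Set<Integer> carriersFor(String token) {
        return postings.getOrDefault(token.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The importer never needs to know the indexer exists: it only fires events, so other subscribers (logging, validation) could be attached the same way.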

In the system architecture diagram you will see the inputs to the importer (schema description and translation template). The idea is that the schema description describes how, for example, the Alinari data is structured. The translation template should describe what processing is necessary to translate the data and put it into the unified metadata database.

e.g.
process column "place", map it against the location/place thesaurus
process column "title", translate it from Italian to English
etc.
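As a rough illustration of how such a template might be expressed in code, the sketch below attaches one processing step to each column. Everything here is an assumption: the TranslationTemplate and TemplateSteps names, the use of plain maps as a stand-in thesaurus and as a stand-in for Italian-to-English translation, and the decision to pass unlisted columns through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical translation template: column name -> processing step. */
class TranslationTemplate {
    private final Map<String, UnaryOperator<String>> rules = new LinkedHashMap<>();

    /** Declares the step to run for one column; returns this for chaining. */
    TranslationTemplate processColumn(String column, UnaryOperator<String> step) {
        rules.put(column, step);
        return this;
    }

    /** Applies the template to one source row; columns without a rule pass through. */
    Map<String, String> apply(Map<String, String> sourceRow) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sourceRow.entrySet()) {
            UnaryOperator<String> step = rules.get(entry.getKey());
            String value = entry.getValue();
            out.put(entry.getKey(), step == null ? value : step.apply(value));
        }
        return out;
    }
}

/** Hypothetical step factories; a real thesaurus or translator would replace the maps. */
class TemplateSteps {
    /** Looks the value up in a table and keeps the original value on a miss. */
    static UnaryOperator<String> lookup(Map<String, String> table) {
        return value -> table.getOrDefault(value, value);
    }
}
```

A template for the example above could then be built as new TranslationTemplate().processColumn("place", TemplateSteps.lookup(placeThesaurus)).processColumn("title", TemplateSteps.lookup(titleTranslations)), where both maps are supplied by the caller.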