Metadata database schema

From EChase
Jump to: navigation, search

Semantic relations[edit]

Deagostini's schema features some support for thesauri, which is valuable, but it unless I'm mistaken, it only seems to support items from thesauri as keywords.

For example, it is possible to say:
Photograph -> 20th Century
or
Photograph -> Van Gogh

But it does not seem to be possible to ascertain whether or not Van Gogh took the photo, or if he is in the photo, or if the photo is of one of his paintings. However, we can determine that Van Gogh is a person and a painter.

Photograph -> Van Gogh -> (is a) painter -> (is a) person

These "is a" relationships can be dervied from the database as it is, thanks to it's hierarchical dictionaries.

However, I think we want to be able to say:

Photograph -> (of) George Bush -> (is a) person
Photograph -> (Taken by) John Smith -> (is a) person

Which requires extensions to the structure.

I'm working on a few ideas as to how this might be laid out, but if anyone has any bright ideas jot them down.

An RDF styled structure certainly has benefits in this field - it's perfect for object-property-value type relations, but I think it would be rather slow to query.

What about Topic Maps? A lot of material, articles, tools and paper can be found here [1]. It seems to be quite close to our needs.
Don't know about performances!

Language considerations[edit]

Deagostini currently stores their keywords in a fashion that allows them to store each in several languages.

The structure looks a little like this:
KEYWORD (Keyword_id, Language_id, Description)
LANGUAGE (Language_id, Description)

So we may see
Keywords:
(0, 0, Socks)
(0, 1, Socken)
(0, 2, Chaussettes)

Languages:
(0, English)
(1, German)
(2, French)

With this structure, an item can be associated with the keyword ID 0 and thus made obtainable using a variety of language searches.

External Links[edit]

OntoMedia - An ontology for describing media which could be used in the metadata engine. It contains a lot of vocab for describing stories and events, which may be a bit unnecessary for eChase's needs. However, the ability to relate media objects is desirable.

MySQL Database Schema Script[edit]

SET FOREIGN_KEY_CHECKS=0;

create database if not exists `echase_metadata_schema`;

USE `echase_metadata_schema`;

/*Table structure for table `echase_metadata_schema`.`actor` */

drop table if exists `echase_metadata_schema`.`actor`;

CREATE TABLE `actor` (
  `id` int(11) NOT NULL default '0',
  `name` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`id`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Actor table represents a person or institution';

/*Table structure for table `echase_metadata_schema`.`attribution` */

drop table if exists `echase_metadata_schema`.`attribution`;

CREATE TABLE `attribution` (
  `id` int(11) NOT NULL default '0',
  `information_carrier_id` int(11) NOT NULL default '0',
  `actor_id` int(11) default NULL,
  `place_id` int(11) default NULL,
  `date_id` int(11) default NULL,
  PRIMARY KEY  (`id`),
  KEY `information_carrier_id` (`information_carrier_id`),
  KEY `place_id` (`place_id`),
  KEY `date_id` (`date_id`),
  KEY `actor_id` (`actor_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Attribution table representing attribution events';

/*Table structure for table `echase_metadata_schema`.`information_carrier` */

drop table if exists `echase_metadata_schema`.`information_carrier`;

CREATE TABLE `information_carrier` (
  `id` int(11) NOT NULL default '0',
  `parent_id` int(11) default '0',
  `collection` int(11) NOT NULL default '0',
  `title` varchar(255) default NULL,
  `description` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `parent_id` (`parent_id`),
  KEY `collection` (`collection`),
  KEY `title` (`title`),
  KEY `description` (`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Information carrier table representing the main object';

/*Table structure for table `echase_metadata_schema`.`thesaurus` */

drop table if exists `echase_metadata_schema`.`thesaurus`;

CREATE TABLE `thesaurus` (
  `id` int(11) NOT NULL default '0',
  `lang` varchar(5) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  `hierarchical` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Table describing the available thesauri';

/*Table structure for table `echase_metadata_schema`.`thesaurus_hierarchy` */

drop table if exists `echase_metadata_schema`.`thesaurus_hierarchy`;

CREATE TABLE `thesaurus_hierarchy` (
  `id` int(11) NOT NULL default '0',
  `thesaurus_id` int(11) NOT NULL default '0',
  `hierarchy` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`id`,`thesaurus_id`),
  FULLTEXT KEY `hierarchy` (`hierarchy`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Thesaurus_hierarchy table representing thesauri hierarchies';

/*Table structure for table `echase_metadata_schema`.`thesaurus_item` */

drop table if exists `echase_metadata_schema`.`thesaurus_item`;

CREATE TABLE `thesaurus_item` (
  `id` int(11) NOT NULL default '0',
  `thesaurus_id` int(11) NOT NULL default '0',
  `lang` varchar(5) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  `preffered_name` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`id`,`thesaurus_id`,`lang`,`name`),
  FULLTEXT KEY `lang` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Thesaurus_item table holds language translations of thesauri';

/*Table structure for table `echase_metadata_schema`.`time_span` */

drop table if exists `echase_metadata_schema`.`time_span`;

CREATE TABLE `time_span` (
  `id` int(11) NOT NULL default '0',
  `from_date` bigint(20) default NULL,
  `from_time` time default NULL,
  `from_ca` tinyint(1) NOT NULL default '0',
  `to_date` bigint(20) default NULL,
  `to_time` time default NULL,
  `to_ca` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `from` (`from_date`),
  KEY `to` (`to_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='Time_span table represents a date range';

SET FOREIGN_KEY_CHECKS=1;
 

Database schema description[edit]

information_carrier The root table in the schema is information_carrier. This table contains:

   id - the object identifier
   parent_id - the id of another object (to allow hierarchies of objects)
   collection - the identifier of the collection the object belongs to
   title - the objects title
   description - the objects description

attribution This table models a creation event to model where, when, how, and who created the object. For example, a chair was made on 5th January 2005 by Joe Bloggs in Southampton. This table contains:

   id - the attribution's identifier
   information_carrier_id - the object id referencing the information carrier the object belongs to
   actor_id - the id of a person or institute involved in the attribution event, this refers to the actor table
   place_id - the id of a place/location involved in the attribution event. This references a thesauri id in the thesaurus_item and thesaurus_hierarchy tables
   date_id - the id of a period that the event occured at, this refers to the time_span table

actor This table represents a person or institute. At the present time this table is simple, and will probably be extended. The table contains:

   id - the actor's identifier
   name - the name of the person/institute 

Language code format: ISO-639 two-letter with ISO-3166 two-letter country code. e.g. en-GB, en-US, fr-CA, or as a fall back ISO-639 two-letter code e.g. en, fr, it to be more general.

thesaurus This table provides a list of available thesauri, and information about the thesauri. This table contains:

   id - thesaurus identifier
   lang - the language of the thesaurus. 
   name - the thesaurus name in the specified language
   hierarchical - boolean, true if the thesaurus is hierarchical, otherwise false

thesaurus_hierarchy This table contains the list of thesauri item ids and hierarcy representations for searching. The hierarcy column contains values of the form <thesaurus_hierarcy_id>.<thesaurus_hierarcy_id>.<thesaurus_hierarcy_id> to represent the hierarchy. For example three items exist in the place thesaurus, England, Hampshire, and Southampton with ids 1, 2, and 3 respectively. Their hierarchical representation is 1.2.3 which denotes England -> Hampshire -> Southampton.

   id - thesaurus item identifier
   thesaurus_id - the identifier of the thesaurus the item belongs to
   hierarchy - hierarchical representation of the item, example provided above.

thesaurus_item This table provides a list of available thesauri, and information about the thesauri. This table contains:

   id - thesaurus item identifier
   thesaurus_id - thesaurus identifier (matches the id column in table thesaurus)
   lang - the language of the thesaurus item. 
   name - the item name in the specified language
   preffered_name - boolean, true if the item is the preferred value to be used in the specified language, otherwise false

time_span This table represents a time period. This table contains:

   id - time_span identifier
   from_date - from/start date
   from_time - from/start time
   from_ca - boolean, true if the dates are only approximate, circa, guesses.
   to_date - to/end date
   to_time - to/end time
   to_ca - boolean, true if the dates are only approximate, circa, guesses.

The date format (from_date, to_date) is a signed bigint (MySql Numeric Types) of the format YYYYYYYYYYYYYYYMMDD , where the year component can be 15 digits long; the month and day component (MMDD) has 4 digits. The time format (from_time, to_time) is a time type (The TIME Type) of the format HHMMSS. BC dates are represented as negative dates. Some example dates:

YYYYYYYYYYYYYYYMMDD
           20050101 = 1st January 2005
        -1000000000 = 100,000 BC
    -45500000000000 = 4.55 billion (<math>4.55 * 10^9</math> Billion) years BC (the age of the Earth)
   -137000000000000 = 13.7 billion years BC (the age of the Universe)

The time example:

             HHMMSS
             173530 = 17:35:30

Search date range in mysql: This will search for items created between 1940 and 1950.

SELECT id, year(from_date) 
FROM time_span 
WHERE year(from_date) BETWEEN year(19400000) AND year(19500000) 
ORDER BY year(from_date)