Media Engine Documentation

From EChase

A Haiku

Media engine
Provide me with some pictures
Fast as a panther

Building The Media Engine

In the modules directory, run

maven build-mediaengine

This will create a work directory and build the engine. Now Tomcat needs to be started so that the engine can be deployed.

(You may need to chmod 700 the files in the work/jakarta-tomcat-5.5.9/bin directory)

work/jakarta-tomcat-5.5.9/bin/startup-lib.sh

Now deploy the engine:

maven deploy-mediaengine

Now, check localhost:8080/axis. If all has gone according to plan, the Media services should appear in the list of services. Congratulations, you now have a working media engine. Take care of it.

Doing Things

Now that you have a media engine, you probably want to do something with it.

To add a collection, call scanCollection on its directory. This loads the images into the media engine database for later retrieval. Note that feature vectors are not generated at this point. The directory should be local, and may also be available through a web server.

You can then search your collections using the various search functions. If the media engine needs a feature vector it doesn't have, it will generate it. This means that the first query you run on a collection will take a long, long time. Why not get yourself a cup of coffee? If the query times out, don't panic: the engine will still finish making all the feature vectors, so it will never take that long again.

A standard approach to a query would be:
Upload image through uploadItem, get stored item as result
Use stored item to query web service
Receive query ID
Retrieve results as desired from web service using getResults.
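
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: StoredItem, Result, the MediaEngine interface and FakeEngine are hypothetical stand-ins written for this example, not the engine's real classes, and the algorithm is passed as a plain name rather than an Algorithm object to keep things short. The fake engine exists only so the sketch runs without Tomcat or Axis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: none of these types are the engine's real classes.
class QueryWorkflowSketch {
    static final class StoredItem {
        final int id;
        final boolean temp;
        StoredItem(int id, boolean temp) { this.id = id; this.temp = temp; }
    }

    static final class Result {
        final StoredItem item;
        final double distance;
        Result(StoredItem item, double distance) { this.item = item; this.distance = distance; }
    }

    // Assumed slice of the web service interface, modelled on the method names in this page.
    interface MediaEngine {
        StoredItem uploadItem(byte[] data);
        int queryCollection(StoredItem subject, String collection, String algorithm);
        int getResultsCount(int queryId);
        List<Result> getResults(int queryId, int start, int numberOf);
    }

    // In-memory fake that returns five made-up results for any query.
    static final class FakeEngine implements MediaEngine {
        private final List<List<Result>> resultSets = new ArrayList<>();
        private int nextUploadId = 1;

        public StoredItem uploadItem(byte[] data) {
            return new StoredItem(nextUploadId++, true);
        }

        public int queryCollection(StoredItem subject, String collection, String algorithm) {
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                results.add(new Result(new StoredItem(100 + i, false), i * 0.25));
            }
            resultSets.add(results);
            return resultSets.size() - 1; // the query ID
        }

        public int getResultsCount(int queryId) {
            return resultSets.get(queryId).size();
        }

        public List<Result> getResults(int queryId, int start, int numberOf) {
            List<Result> all = resultSets.get(queryId);
            int from = Math.min(start, all.size());
            int to = Math.min(from + numberOf, all.size());
            return new ArrayList<>(all.subList(from, to));
        }
    }

    // Upload, query, keep the query ID, then page through the results.
    // pageSize must be greater than zero.
    static List<Result> search(MediaEngine engine, byte[] image, String collection,
                               String algorithm, int pageSize) {
        StoredItem subject = engine.uploadItem(image);
        int queryId = engine.queryCollection(subject, collection, algorithm);
        int total = engine.getResultsCount(queryId);
        List<Result> all = new ArrayList<>();
        for (int start = 0; start < total; start += pageSize) {
            all.addAll(engine.getResults(queryId, start, pageSize));
        }
        return all;
    }

    public static void main(String[] args) {
        List<Result> results = search(new FakeEngine(), new byte[] {1, 2, 3},
                "alinari", "hypothetical-algorithm", 2);
        for (Result r : results) {
            System.out.println(r.item.id + " at distance " + r.distance);
        }
    }
}
```

Fetching results in pages keeps each web service response small, which matters when a result set covers a whole collection.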


General mediaengine design

Here I'll go through a quick class-by-class guide to how the mediaengine works. It's all pretty simple and straightforward, and has been built with flexibility and extensions in mind. Some bits are a tad broken, but blame time, not me.

The Interface

Everything in the mediaengine is accessed through its web service interface. The interface methods are found in the class:

MediaEngineSOAPBindingImpl.java

They can all be run locally if you create an instance of this class and call its init() method.

The interface provides the following methods. Here I'll give a quick overview of what they do:

uploadItem (byte[] arr)

Upload an arbitrary media item into the mediaengine. Returns a StoredItem describing the item's existence in the mediaengine (URL, file location, id, etc.).

uploadItemUrl (String URL)

Same as uploadItem, but the item comes from a URL.

scanCollection (String collection)

Give the name of a collection to traverse and store in the mediaengine. The collection is held in the directory defined by the variable mediaengine.collection.dir, and the collection name maps directly to a directory name. Scanning a collection adds its media to the database, ready for analysis. The media added is defined by the MIME type regex in the variable mediaengine.mimetypes.supported, currently image/* and video/*.
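
To make the two configuration variables concrete, here is a minimal sketch of how they might be used. It assumes the configuration is available as a java.util.Properties object and that the MIME setting is written as a true Java regex such as image/.*|video/.*. Both are assumptions, as are the class and method names and the example directory.

```java
import java.io.File;
import java.util.Properties;
import java.util.regex.Pattern;

class CollectionConfigSketch {
    // The collection name maps directly onto a directory under mediaengine.collection.dir.
    static File collectionDir(Properties config, String collection) {
        return new File(config.getProperty("mediaengine.collection.dir"), collection);
    }

    // True if the MIME type matches the regex in mediaengine.mimetypes.supported.
    static boolean isSupported(Properties config, String mimeType) {
        String regex = config.getProperty("mediaengine.mimetypes.supported");
        return mimeType != null && regex != null && Pattern.matches(regex, mimeType);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("mediaengine.collection.dir", "/data/collections"); // hypothetical path
        config.setProperty("mediaengine.mimetypes.supported", "image/.*|video/.*"); // assumed regex form
        System.out.println(collectionDir(config, "alinari"));
        System.out.println(isSupported(config, "image/jpeg"));
    }
}
```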

getAlgorithm (String name)

Get an Algorithm object by name. Used for all analysis in the mediaengine.

getAllAlgorithms ()

Grab a list of all the algorithms in the system. Not yet implemented: a method to get algorithms by MIME type. It would be useful, and the Algorithm class itself already supports it.

analyseItemSet (StoredItem[], Algorithm)

Explicitly analyse a set of stored items with an algorithm. The algorithm must be one stored in the mediaengine and known to the AlgorithmFactory, and the StoredItems must already be in the mediaengine. In general this method is run once, at the beginning. It's quite heavyweight with larger StoredItem[] sets, especially across the web service.

compareItems (StoredItem, StoredItem, Algorithm)

Compare two stored items using an algorithm. Feature vectors are generated if they don't already exist and the algorithm thinks they CAN exist (i.e. if the MIME types match up).

getStoredItem(int id, boolean temp)

Grab a StoredItem based on internal ID and whether that item is a temporary (uploaded) one or not

getStoredItemByLegacy(String legacy_id)

Grab a StoredItem that came from a collection, based on that item's legacy_id. Good for searching against images already in the mediaengine from a collection. Shibby!

query(StoredItem, StoredItem[], Algorithm)

The various query methods. This is how queries happen in the system. All query methods return an int which is used to get the results from a query (there can obviously be many). Queries are generally given an item to use as a subject, some notion of the items to search against, and an algorithm to use. When a query is run, feature vectors are generated if they don't already exist (and if they can be generated). This is in case the subject image, which might be an uploaded image, doesn't have its feature vector generated yet. This functionality can be used (but should NOT be used) to analyse the set of images being searched against. The ID returned is understood by getResults and getResultsCount. This particular incarnation takes a StoredItem and uses it to search against an array of StoredItems using an Algorithm.

queryCollection (StoredItem, String, Algorithm)

Used to search a specific collection (String) using a StoredItem with an Algorithm.

queryStringCollection (StoredItem, String[], Algorithm)

Instead of a StoredItem[] (which can be quite large), a String array of legacy_ids is passed and used. Handy because the metadata and SRW can pass this.

queryInternalStringCollection (String, String[], Algorithm)

Same as above, but the subject image is a String which can be either an uploaded item id (from the uploaded item's StoredItem object) or a legacy ID. The mediaengine tries to figure out which it is.

getResults (int, int start, int numberOf)

For a result set, get the next numberOf results starting from the point start. Returns a set of ResultStoredItems, which contain distance information as well as the general StoredItem object info.

getResultsCount (int)

Returns how many results are in a result set (good for traversal).


StoredItemFactory

Gets StoredItems from the database and registers new ones. It doesn't actually touch the database itself but talks to DBAccess. It also has the main collection traversal method, farmDirectory(), an iterative directory traverser that looks for media matching the MIME types understood by the system (config file). Other than this, StoredItemFactory is fairly dumb, simply returning items to the MediaEngineInterface.
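
An iterative traversal of this kind can be sketched with an explicit stack instead of recursion. This is not the real farmDirectory(): the signature is invented for illustration, and guessing the MIME type from the file name with the JDK's URLConnection.guessContentTypeFromName is an assumption, since the engine may well inspect the file content instead.

```java
import java.io.File;
import java.net.URLConnection;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

class FarmDirectorySketch {
    // Walks root without recursion and returns the files whose guessed MIME type is supported.
    static List<File> farmDirectory(File root, Pattern supported) {
        List<File> media = new ArrayList<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File dir = pending.pop();
            File[] children = dir.listFiles();
            if (children == null) {
                continue; // not a directory, or not readable
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child); // visit later instead of recursing
                } else {
                    // Assumption: the MIME type is guessed from the file name alone.
                    String mime = URLConnection.guessContentTypeFromName(child.getName());
                    if (mime != null && supported.matcher(mime).matches()) {
                        media.add(child);
                    }
                }
            }
        }
        return media;
    }

    public static void main(String[] args) {
        Pattern supported = Pattern.compile("image/.*|video/.*"); // assumed regex form of the config value
        for (File f : farmDirectory(new File(args.length > 0 ? args[0] : "."), supported)) {
            System.out.println(f.getPath());
        }
    }
}
```

The explicit stack avoids deep call chains on heavily nested collections. The sketch does not guard against symbolic link loops.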

StoredItemAnalysis

The center of all analysis in the mediaengine. Here feature vectors from stored items are compared using their appropriate algorithms. If feature vectors are not in the database, they are created and put in the database. Currently, stored features are NEVER overwritten for a given StoredItem and algorithm pair. This might be made an option in future incarnations, to allow for changes in the underlying algorithms. Comparison methods order the sets of StoredItems passed to them and place them into results tables. All feature vector and algorithm interaction occurs in this class.
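
The "create if missing, never overwrite" rule can be illustrated with a small sketch. Everything here is hypothetical: a HashMap stands in for the database, and the class name, method name and key format are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class FeatureStoreSketch {
    // Stands in for the feature vector table in the database.
    private final Map<String, char[]> stored = new HashMap<>();

    // Returns the stored vector for (itemId, algorithm), generating it only if none exists yet.
    // An existing vector is never replaced, even if the generator would now produce something else.
    char[] getOrCreate(int itemId, String algorithm, Supplier<char[]> generator) {
        String key = itemId + "|" + algorithm;
        return stored.computeIfAbsent(key, k -> generator.get());
    }

    public static void main(String[] args) {
        FeatureStoreSketch store = new FeatureStoreSketch();
        char[] first = store.getOrCreate(7, "hypothetical-algorithm", () -> new char[] {'a'});
        char[] second = store.getOrCreate(7, "hypothetical-algorithm", () -> new char[] {'b'});
        System.out.println(first == second); // the second generator was never run
    }
}
```

A consequence of this rule is that changing an algorithm's implementation does not refresh vectors that were already stored for it.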

Algorithms and AlgorithmFactory

Algorithms define things which can generate and compare feature vectors. Algorithms also know what kind of media (defined by MIME type) they can accept. For a new algorithm to be added to the system, an appropriate Algorithm instance must be created. This instance can do what it wants (contact web services, call through JNI, or just contain the algorithm itself), but it must exist for the algorithm to exist in the system. For an algorithm to exist in the eyes of the system at runtime, it must register itself with the AlgorithmFactory. In doing so, the AlgorithmFactory will add the algorithm to the database, register it and so on. How an algorithm does this might involve some changes to the system, perhaps some sort of init method in the main mediaengine interface?
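
As a rough picture of the registration idea, here is a minimal sketch. The Algorithm interface shown is a guess at the shape described above (a name, an accepted MIME type check, generate and compare), not the real class, and the registry keeps algorithms in memory only, whereas the real AlgorithmFactory also writes them to the database. LengthAlgorithm is a toy written purely for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AlgorithmRegistrySketch {
    // Hypothetical shape of an algorithm; the real Algorithm class may differ.
    interface Algorithm {
        String name();
        boolean accepts(String mimeType);
        char[] generate(byte[] media);
        double compare(char[] a, char[] b);
    }

    // Toy algorithm: the "feature" is just the media length, capped to fit in one char.
    static final class LengthAlgorithm implements Algorithm {
        public String name() { return "length"; }
        public boolean accepts(String mimeType) { return mimeType.startsWith("image/"); }
        public char[] generate(byte[] media) {
            return new char[] {(char) Math.min(media.length, Character.MAX_VALUE)};
        }
        public double compare(char[] a, char[] b) { return Math.abs(a[0] - b[0]); }
    }

    private final Map<String, Algorithm> byName = new LinkedHashMap<>();

    // An algorithm only exists for the system once it has been registered.
    void register(Algorithm algorithm) {
        if (byName.putIfAbsent(algorithm.name(), algorithm) != null) {
            throw new IllegalStateException("already registered: " + algorithm.name());
        }
    }

    // Returns null when no algorithm of that name has been registered.
    Algorithm getAlgorithm(String name) { return byName.get(name); }

    List<Algorithm> getAllAlgorithms() { return new ArrayList<>(byName.values()); }

    public static void main(String[] args) {
        AlgorithmRegistrySketch factory = new AlgorithmRegistrySketch();
        factory.register(new LengthAlgorithm());
        Algorithm alg = factory.getAlgorithm("length");
        System.out.println(alg.compare(alg.generate(new byte[10]), alg.generate(new byte[4])));
    }
}
```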

FeatureVector

A feature vector is a char[] created by running an algorithm on a StoredItem. Note that ANY data can be stored in the feature vector, you just have to get that data in and out of a char[]. Everything else is taken care of. This allows ANY feature vector imaginable, in my mind: if your feature vector can exist in memory, it can exist in this system :). I'd recommend making your own FV implementation if you're going to do something crazy, but a concrete one is provided for ease. Feature vectors are generally registered with the system in StoredItemAnalysis, when it finds that a StoredItem needs to have its feature vector generated (this can happen at various points).
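
As one example of getting data in and out of a char[], here is a minimal sketch that packs a double[] into chars and back. The class and method names are made up for illustration and this is not the concrete FeatureVector provided with the engine. Each double is split into four 16-bit chars, most significant first.

```java
import java.util.Arrays;

class FeatureVectorCodecSketch {
    // Packs each double into four chars (16 bits each), most significant bits first.
    static char[] encode(double[] values) {
        char[] out = new char[values.length * 4];
        for (int i = 0; i < values.length; i++) {
            long bits = Double.doubleToLongBits(values[i]);
            for (int j = 0; j < 4; j++) {
                out[i * 4 + j] = (char) (bits >>> (48 - 16 * j));
            }
        }
        return out;
    }

    // Reverses encode(); the input length is expected to be a multiple of four.
    static double[] decode(char[] chars) {
        double[] values = new double[chars.length / 4];
        for (int i = 0; i < values.length; i++) {
            long bits = 0;
            for (int j = 0; j < 4; j++) {
                bits = (bits << 16) | chars[i * 4 + j];
            }
            values[i] = Double.longBitsToDouble(bits);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] histogram = {0.5, -3.25, 1e10};
        System.out.println(Arrays.equals(histogram, decode(encode(histogram))));
    }
}
```

Chars produced this way are arbitrary 16-bit values, not necessarily valid text, so a round trip through a String and a character encoding may corrupt them. Keeping the data as a raw char[] avoids that.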


The end

Right, that's the mediaengine. It has been tested with the alinari collection of 5000-odd images, and there's no reason it shouldn't work with everything else. It's fairly generic and robust, but if anything breaks don't hesitate to contact me.