SRW Mediaengine integration changes

From EChase

Here is a list of the classes that have been changed, and how they were changed, so that the SRW uses the mediaengine for a content-based query.

ISearchRetrieveQueryContext, SearchRetrieveQueryContext and ContentQuery

The context and the context interface have been changed to accommodate get/set methods for ContentQuery. A ContentQuery simply contains the information required to do a content-based query: the algorithm, and the ID of the mediaitem being compared to. Both must be provided by whatever calls the SRW, which means the webapp needs to contact the mediaengine independently to obtain the ID of the StoredItem to compare against. This ID can be that of a temporarily uploaded media item or that of a collection item.
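As a rough illustration of the shape described above, here is a minimal sketch of a content query holder and a context that carries it. All class, field and method names here are hypothetical stand-ins chosen for the example; they are not copied from the SRW source, whose actual signatures may differ.

```java
// Hypothetical sketch: names are assumptions, not the real SRW classes.

/** Holds the two pieces of information a content-based query needs. */
final class ContentQuerySketch {
    private final String algorithm;   // name of the comparison algorithm
    private final String mediaItemId; // ID of the item to compare against

    ContentQuerySketch(String algorithm, String mediaItemId) {
        this.algorithm = algorithm;
        this.mediaItemId = mediaItemId;
    }

    String getAlgorithm() { return algorithm; }

    String getMediaItemId() { return mediaItemId; }
}

/** Stand-in for the context interface with the added get/set pair. */
interface QueryContextSketch {
    ContentQuerySketch getContentQuery();

    void setContentQuery(ContentQuerySketch query);
}

/** Stand-in for the concrete context. */
final class SimpleQueryContextSketch implements QueryContextSketch {
    // null means the request is a plain metadata query
    private ContentQuerySketch contentQuery;

    @Override
    public ContentQuerySketch getContentQuery() { return contentQuery; }

    @Override
    public void setContentQuery(ContentQuerySketch query) { this.contentQuery = query; }

    boolean hasContentQuery() { return contentQuery != null; }
}
```

In this sketch a caller would set the content query once during CQL processing, and later stages would check hasContentQuery() to decide whether the mediaengine needs to be contacted at all.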

CQLTraverser

This class handles the initial traversal of a CQL argument. Here the SRW decides whether something should be in the final generated SQL statement. The content part of the query should not appear in the SQL statement that extracts metadata, so we had to remove the content part from the CQL query. Currently we hack this by ignoring a CQL "AND" or "OR" statement if either of its children is "similarTo" (the content-based argument). Clearly this isn't a very clean way to do it, and the entire process should be refactored in future incarnations of the SRW.
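One possible cleaner shape for this step is to prune content terms out of the parsed tree before any SQL is generated. The sketch below is not the current CQLTraverser code: the node classes are hypothetical, and it assumes "similarTo" shows up as the relation of a term node, which may not match how the real parser represents it.

```java
// Hypothetical sketch: a tiny CQL-like tree, not the real parser classes.

abstract class CqlNodeSketch { }

/** A leaf such as: dc.title = "sunset", or: image similarTo "item-42". */
final class TermNodeSketch extends CqlNodeSketch {
    final String index;
    final String relation;
    final String value;

    TermNodeSketch(String index, String relation, String value) {
        this.index = index;
        this.relation = relation;
        this.value = value;
    }
}

/** An AND / OR node with two children. */
final class BoolNodeSketch extends CqlNodeSketch {
    final String op;
    final CqlNodeSketch left;
    final CqlNodeSketch right;

    BoolNodeSketch(String op, CqlNodeSketch left, CqlNodeSketch right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
}

final class ContentPruner {
    /** Assumption: content terms are recognised by the relation name. */
    static boolean isContentTerm(CqlNodeSketch node) {
        return node instanceof TermNodeSketch
                && "similarTo".equals(((TermNodeSketch) node).relation);
    }

    /**
     * Returns the metadata-only part of the tree,
     * or null when nothing but content terms remains.
     */
    static CqlNodeSketch prune(CqlNodeSketch node) {
        if (isContentTerm(node)) {
            return null;
        }
        if (node instanceof BoolNodeSketch) {
            BoolNodeSketch bool = (BoolNodeSketch) node;
            CqlNodeSketch left = prune(bool.left);
            CqlNodeSketch right = prune(bool.right);
            if (left == null) return right;   // drop the operator, keep the other side
            if (right == null) return left;
            return new BoolNodeSketch(bool.op, left, right);
        }
        return node; // ordinary metadata term
    }
}
```

Two caveats apply to this sketch. Dropping the content side of an "OR" changes what the metadata query matches, so an "OR" may need different handling from an "AND". A query consisting only of a similarTo term prunes to null, which is the content-only case still listed as unresolved at the end of this page.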

GenericSchemaCQLProcessor

The processID method has been changed to store content-based query components on traversal. It fills a ContentQuery object and stores that object in the SearchRetrieveQueryContext. I'm not sure whether this should be done in a better way? Have a look. Either way, this is how CBR query info is extracted from the CQL.

GenericQueryBuildingController

The processContentQuery() method has been changed the most here. It does several things, which I will go through in turn. Firstly, it creates a different kind of MySQLQuery: a MixedContentQuery. This is a type of MySQL query which contacts the mediaengine during its getResultSet() (more on that later). The next difference from a normal non-content query is the structure of the tempresultset table. It now contains a legacy_id field, which MixedContentQuery uses to get the legacy_id of each item to be searched and thus perform the mediaengine search. The legacy_id field was added using the ValueColoumn class, and getColumnNameInQuery in GenericQueryBuildingController was modified to understand what a ValueColoumn is.

MixedContentQuery

An extension of MySQLQuery. Firstly it performs the normal non-content query. From this it retrieves a tempresultset table which contains all the ids and legacy_ids required. These are then formatted and passed to the mediaengine. The result from the mediaengine is then parsed, and the distance is used to update the results table. A new select statement is then made which orders the results by distance; this is then used by the rest of the SRW as normal.
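The flow above can be sketched in memory, leaving out the database and the web service call. Everything named here is hypothetical: MediaEngineClientSketch stands in for whatever actually talks to the mediaengine, and the assumption that it returns one distance per legacy_id is made only for illustration. The real class updates a table and issues a new select, where this sketch sorts a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** One row of the temporary result set: id plus legacy_id (hypothetical shape). */
final class ResultRowSketch {
    final int id;
    final String legacyId;
    double distance = Double.NaN; // filled in after the mediaengine call

    ResultRowSketch(int id, String legacyId) {
        this.id = id;
        this.legacyId = legacyId;
    }
}

/** Stand-in for the mediaengine call; assumed to map legacy_id to distance. */
interface MediaEngineClientSketch {
    Map<String, Double> distances(String algorithm, String queryItemId, List<String> legacyIds);
}

final class MixedRankingSketch {
    /**
     * Takes the rows found by the metadata query, asks the engine for
     * distances, and returns the rows ordered by ascending distance.
     */
    static List<ResultRowSketch> rank(List<ResultRowSketch> metadataHits,
                                      String algorithm,
                                      String queryItemId,
                                      MediaEngineClientSketch engine) {
        // 1. Collect the legacy_ids produced by the non-content query.
        List<String> legacyIds = new ArrayList<>();
        for (ResultRowSketch row : metadataHits) {
            legacyIds.add(row.legacyId);
        }

        // 2. Hand them to the mediaengine together with the content query.
        Map<String, Double> distances = engine.distances(algorithm, queryItemId, legacyIds);

        // 3. Record each distance. Rows the engine did not score are dropped
        //    here, which is an assumption of this sketch.
        List<ResultRowSketch> ranked = new ArrayList<>();
        for (ResultRowSketch row : metadataHits) {
            Double distance = distances.get(row.legacyId);
            if (distance != null) {
                row.distance = distance;
                ranked.add(row);
            }
        }

        // 4. Equivalent of the final select ordered by distance.
        ranked.sort(Comparator.comparingDouble(row -> row.distance));
        return ranked;
    }
}
```

Keeping the engine behind a small interface like this would also make it easy to substitute a fake in tests, and to stop hardcoding where the web service lives.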

TO BE DONE

  • The URI for the mediaengine webservice is hardcoded
  • CQL extraction needs a fix for a CBR query on its own
  • The temporary tables are not being deleted