Watermarked Media Issue
Problem
The Alinari team have recently given us a load of images, all of which have a large Alinari watermark right in the center of the image. This watermark, like other watermarks, is clearly a problem for content-based image retrieval. Many region-based algorithms may fail, reporting false positives and incorrect similarities simply because of the watermark rather than because of the image features that matter. Here I hope to discuss possible solutions to this watermark issue.
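To illustrate how a shared watermark can manufacture similarity, here is a toy sketch. It uses a global grey-level histogram intersection rather than a region-based method, and the solid square drawn by the hypothetical `stamp` helper is only a stand-in for a real watermark. Both are simplifying assumptions that keep the example small. Two images with nothing in common score 0.0. Once the same mark is stamped on both, they share a quarter of their histogram mass.

```python
from collections import Counter

Image = list[list[int]]  # greyscale image: rows of 0-255 pixel values

def stamp(img: Image, value: int, top: int, left: int, size: int) -> Image:
    """Return a copy of img with a solid size x size square: a crude stand-in watermark."""
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = value
    return out

def histogram_intersection(a: Image, b: Image) -> float:
    """Normalised grey-level histogram intersection (1.0 means identical histograms).

    Assumes a and b have the same number of pixels.
    """
    ha = Counter(p for row in a for p in row)
    hb = Counter(p for row in b for p in row)
    return sum(min(n, hb[v]) for v, n in ha.items()) / sum(ha.values())

dark = [[0] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(histogram_intersection(dark, bright))  # 0.0: no shared grey levels
print(histogram_intersection(stamp(dark, 255, 1, 1, 2),
                             stamp(bright, 255, 1, 1, 2)))  # 0.25: all from the mark
```

A region-based method would see the same effect more strongly, because the central region of every watermarked image would match almost exactly.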
Solutions
Designing our media engine so that it can handle watermarked content is clearly a bonus and definitely something we should consider. The first step towards a solution is to find out why the content partners want the watermark there in the first place. Once we have an idea of why it's there, we can discuss possible ways of solving the problem.
We can assume the watermark is there so that any images shown publicly through our system can be directly attributed to the institution that produced them. Pattrick and I discussed several possible solutions:
2 image sets: One solution could be for the institutions to provide us with two image sets, one containing all images without the watermark and the other containing the same images with the watermark. Our system could then distinguish between images that are displayed to users and images that are used directly for content-based analysis. This solution works, but it means we must store double the data and the institutions must supply double the data, which could cause problems.
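Here is a minimal sketch of how the media engine might keep the two sets apart. It assumes each institution supplies both copies keyed by a shared item id. `MediaItem`, `pair_image_sets` and the URI layout are hypothetical names used for illustration, not existing eCHASE code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MediaItem:
    item_id: str
    display_uri: str   # watermarked copy: the only version ever shown to users
    analysis_uri: str  # clean copy: used only for feature extraction

def pair_image_sets(watermarked: dict[str, str], clean: dict[str, str]) -> list[MediaItem]:
    """Join the two supplied sets on item id, rejecting items found in only one set."""
    unmatched = watermarked.keys() ^ clean.keys()
    if unmatched:
        raise ValueError(f"items present in only one set: {sorted(unmatched)}")
    return [MediaItem(i, watermarked[i], clean[i]) for i in sorted(watermarked)]

items = pair_image_sets({"img001": "wm/img001.jpg"}, {"img001": "clean/img001.jpg"})
print(items[0].analysis_uri)  # clean/img001.jpg
```

Rejecting unmatched items up front catches the most likely failure of this scheme early: the two sets drifting out of sync as partners add or remove images.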
Media descriptor generation off-site: We could support generating the media descriptors (i.e. feature vectors) at the content partner's site. The content partner would provide watermarked media and run a media engine service at their site, which would generate the media descriptors from the original media. The feature vectors could then be uploaded to the main eCHASE server. In this way we use only one, watermarked, image set for display, and content-based retrieval runs on feature vectors generated from the unwatermarked media kept at the content provider's site. However, there may be many reasons to regenerate media descriptors (new algorithms, updates to existing algorithms), and each regeneration will require someone at the content provider to run it.
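Below is a rough sketch of the split between the partner site and the main server. It assumes each descriptor is tagged with the algorithm name and version that produced it, so that the server can tell which items need the partner to regenerate them. The toy `grey_histogram` descriptor, the `CURRENT_VERSIONS` registry and the function names are all hypothetical. The real media engine service interface is not specified here.

```python
from dataclasses import dataclass

Image = list[list[int]]  # greyscale image: rows of 0-255 pixel values

# Hypothetical registry of descriptor algorithms and their current versions.
CURRENT_VERSIONS = {"grey_histogram": 2}

@dataclass
class Descriptor:
    item_id: str
    algorithm: str
    version: int
    vector: list[float]

def grey_histogram(img: Image, bins: int = 4) -> list[float]:
    """Toy descriptor: normalised grey-level histogram with equal-width bins."""
    counts = [0] * bins
    pixels = [p for row in img for p in row]
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def generate_at_partner(item_id: str, clean_img: Image) -> Descriptor:
    """Runs at the content partner's site: only the vector is uploaded, never the image."""
    version = CURRENT_VERSIONS["grey_histogram"]
    return Descriptor(item_id, "grey_histogram", version, grey_histogram(clean_img))

def stale(descriptors: list[Descriptor]) -> list[str]:
    """Server side: ids whose descriptors are out of date and need partner regeneration."""
    return [d.item_id for d in descriptors
            if d.version < CURRENT_VERSIONS.get(d.algorithm, 0)]

fresh = generate_at_partner("img001", [[0, 255], [0, 255]])
old = Descriptor("img002", "grey_histogram", 1, [1.0, 0.0, 0.0, 0.0])
print(fresh.vector)         # [0.5, 0.0, 0.0, 0.5]
print(stale([fresh, old]))  # ['img002']
```

The `stale` list is the concrete cost noted above. Every time an algorithm's version is bumped, that list grows, and someone at each partner has to act on it.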
Automatic watermarking: If the institution simply wants displayed images to be watermarked, it could provide us with the watermark it wants added and a clean set of images, and we could add the watermark ourselves at display time. This would give us clean images to analyse, while the system output would still be attributed to the institution. We could either implement such a system ourselves or investigate watermarking systems such as XLImage.
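If we implemented this ourselves, the simplest approach is plain alpha blending of the supplied watermark over the clean image at display time. The sketch below assumes greyscale images held as nested lists, a fixed opacity `alpha` and a placement chosen by the caller. These are all illustrative assumptions; it does not reflect how XLImage works.

```python
Image = list[list[int]]  # greyscale image: rows of 0-255 pixel values

def apply_watermark(img: Image, mark: Image, alpha: float,
                    top: int = 0, left: int = 0) -> Image:
    """Alpha-blend `mark` onto a copy of `img` with its top-left corner at (top, left).

    Each covered pixel becomes round((1 - alpha) * pixel + alpha * mark_pixel).
    The clean `img` is not modified, so feature extraction can keep using it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    out = [row[:] for row in img]
    for r, mark_row in enumerate(mark):
        for c, m in enumerate(mark_row):
            p = out[top + r][left + c]
            out[top + r][left + c] = round((1 - alpha) * p + alpha * m)
    return out

clean = [[100, 100], [100, 100]]
print(apply_watermark(clean, [[200]], 0.5, top=1, left=1))  # [[100, 100], [100, 150]]
```

Because the mark is applied only on the way out, the stored images stay clean, and the watermark can be changed without reprocessing any descriptors.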
Automatic watermark removal: When adding sets of images to the system, the institution could specify that certain images have a specific watermark added to them. We could then employ an algorithm to remove the watermark automatically before feature extraction. This could be a very dirty and error-prone process and may alter the content of the images.
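One simple removal technique works only under strong assumptions. If the mark was alpha-blended with a known watermark image, a known opacity and a known position, the blend can be inverted pixel by pixel. We do not know how the Alinari watermark was actually applied. The function below is a hypothetical sketch of this one technique. It also shows why the process is error-prone: rounding and clipping in the stored image make the reconstruction approximate, and a fully opaque mark cannot be inverted at all.

```python
Image = list[list[int]]  # greyscale image: rows of 0-255 pixel values

def remove_watermark(marked: Image, mark: Image, alpha: float,
                     top: int = 0, left: int = 0) -> Image:
    """Estimate the original pixels under a known alpha-blended watermark.

    Inverts marked = (1 - alpha) * original + alpha * m, i.e.
    original = (marked - alpha * m) / (1 - alpha), then clamps to 0-255.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1); an opaque mark cannot be inverted")
    out = [row[:] for row in marked]
    for r, mark_row in enumerate(mark):
        for c, m in enumerate(mark_row):
            p = out[top + r][left + c]
            estimate = (p - alpha * m) / (1 - alpha)
            out[top + r][left + c] = max(0, min(255, round(estimate)))
    return out

marked = [[100, 100], [100, 150]]
print(remove_watermark(marked, [[200]], 0.5, top=1, left=1))  # [[100, 100], [100, 100]]
```

With JPEG compression, an unknown opacity or a slightly wrong position, the estimate degrades quickly. Any features extracted from such images would need to be treated with caution.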
Whichever way we choose, making our system tolerant of watermarks and other copyright needs is a good thing, so let's make it happen.