Items from Rich Media track
Number of items: 6.
and Montagnuolo, Maurizio A Generalised Cross-Modal Clustering Method Applied to Multimedia News Semantic Indexing and Retrieval.
Current Web technology has enabled the distribution of informative content through dynamic media platforms. In addition, the availability of the same content in the form of digital multimedia data has dramatically increased. Content-based, cross-media retrieval applications are needed to efficiently access desired information from this variety of data sources. This paper presents a novel approach for cross-media information aggregation, and describes a prototype system implementing this approach. The prototype adopts online newspaper articles and TV newscasts as information sources, to deliver a service made up of items including both contributions. Extensive experiments prove the effectiveness of the proposed approach in a real-world business context.
and Yang, Linjun
and Yu, Nenghai
and Hua, Xian-Sheng Learning to Tag.
Social tagging provides valuable and crucial information for large-scale web image retrieval. It is ontology-free and easy to obtain; however, irrelevant tags frequently appear, and users typically will not tag all semantic objects in the image, a problem also called semantic loss. To avoid noise and compensate for the semantic loss, tag recommendation has been proposed in the literature. However, current recommendation simply ranks the related tags based on the single modality of tag co-occurrence on the whole dataset, which ignores other modalities, such as visual correlation. This paper proposes a multi-modality recommendation based on both tag and visual correlation, and formulates tag recommendation as a learning problem. Each modality is used to generate a ranking feature, and the RankBoost algorithm is applied to learn an optimal combination of these ranking features from different modalities. Experiments on Flickr data demonstrate the effectiveness of this learning-based multi-modality recommendation strategy.
and Naaman, Mor Less Talk, More Rock: Automated Organization of Community-Contributed Collections of Concert Videos.
We describe a system for synchronization and organization of user-contributed content from live music events. We start with a set of short video clips taken at a single event by multiple contributors, who were using a varied set of capture devices. Using audio fingerprints, we synchronize these clips such that overlapping clips can be displayed simultaneously. Furthermore, we use the timing and link structure generated by the synchronization algorithm to improve the findability and representation of the event content, including identifying key moments of interest and descriptive text for important captured segments of the show. We also identify the preferred audio track when multiple clips overlap. We thus create a much improved representation of the event that builds on the automatic content match. Our work demonstrates important principles in the use of content analysis techniques for social media content on the Web, and applies those principles in the domain of live music capture.
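As a rough illustration of fingerprint-based synchronization, the sketch below estimates the time offset between two clips by letting matching fingerprint hashes vote on a time difference. This is a generic offset-voting scheme, not necessarily the method used in the paper; the function name `estimate_offset`, the `(time, hash)` input format, the 0.1-second rounding, and the `min_votes` threshold are all assumptions made for the example.

```python
from collections import Counter


def estimate_offset(fp_a, fp_b, min_votes=3):
    """Estimate the offset (seconds) that aligns clip B with clip A.

    fp_a, fp_b: lists of (time_seconds, fingerprint_hash) pairs (hypothetical format).
    Returns an offset such that t_b + offset is approximately t_a,
    or None if fewer than min_votes matches agree on one offset.
    """
    # Index clip A's fingerprints by hash for fast lookup.
    times_by_hash = {}
    for t_a, h in fp_a:
        times_by_hash.setdefault(h, []).append(t_a)

    # Every matching hash votes for the time difference it implies.
    votes = Counter()
    for t_b, h in fp_b:
        for t_a in times_by_hash.get(h, ()):
            votes[round(t_a - t_b, 1)] += 1

    if not votes:
        return None
    offset, count = votes.most_common(1)[0]
    return offset if count >= min_votes else None


# Toy example: three hashes agree on the same offset, one match is spurious.
clip_a = [(10.0, "a"), (11.0, "b"), (12.0, "c"), (13.0, "d")]
clip_b = [(2.0, "b"), (3.0, "c"), (4.0, "d"), (5.0, "a")]
offset = estimate_offset(clip_a, clip_b)
```

Clips for which no offset gathers enough votes are treated as non-overlapping, which is one simple way to limit accidental matches.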
and Hua, Xian-Sheng
and Yang, Linjun
and Wang, Meng
and Zhang, Hong-Jiang Tag Ranking.
Social media sharing web sites like Flickr allow users to annotate images with free tags, which significantly facilitate Web image search and organization. However, the tags associated with an image are generally in a random order without any importance or relevance information, which limits the effectiveness of these tags in search and other applications. In this paper, we propose a tag ranking scheme, aiming to automatically rank the tags associated with a given image according to their relevance to the image content. We first estimate initial relevance scores for the tags based on probability density estimation, and then perform a random walk over a tag similarity graph to refine the relevance scores. Experimental results on a collection of 50,000 Flickr photos show that the proposed tag ranking method is both effective and efficient. We also apply tag ranking to three applications: (1) tag-based image search, (2) tag recommendation, and (3) group recommendation, which demonstrates that the proposed tag ranking approach boosts the performance of social-tagging-related applications.
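The random-walk refinement step can be sketched as follows. The abstract does not give the update rule, so this example assumes a common random-walk-with-restart form, r_j = alpha * sum_i p_ij * r_i + (1 - alpha) * v_j, where v holds the initial scores and p is the row-normalised tag similarity. The names `refine_tag_scores` and `rank_tags`, the value of `alpha`, and the toy scores are illustrative assumptions, not values from the paper.

```python
def refine_tag_scores(initial, similarity, alpha=0.5, iterations=50):
    """Refine tag relevance scores by a random walk over a tag similarity graph.

    initial:    dict tag -> initial relevance score (assumed non-negative)
    similarity: dict (tag_a, tag_b) -> symmetric similarity weight
    alpha:      weight of the propagated score versus the initial score
    """
    tags = list(initial)

    def sim(a, b):
        # Similarity is symmetric, so accept the pair in either order.
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    # Row-normalise similarities into transition probabilities p[a][b].
    trans = {}
    for a in tags:
        row = {b: sim(a, b) for b in tags if b != a}
        total = sum(row.values())
        trans[a] = {b: (w / total if total > 0 else 0.0) for b, w in row.items()}

    scores = dict(initial)
    for _ in range(iterations):
        scores = {
            j: alpha * sum(scores[i] * trans[i].get(j, 0.0) for i in tags)
            + (1 - alpha) * initial[j]
            for j in tags
        }
    return scores


def rank_tags(initial, similarity, **kwargs):
    """Return tags sorted by refined relevance, most relevant first."""
    scores = refine_tag_scores(initial, similarity, **kwargs)
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: "puppy" starts below "2008" but is strongly similar to "dog".
initial = {"dog": 0.5, "puppy": 0.3, "2008": 0.4}
similarity = {("dog", "puppy"): 0.9, ("dog", "2008"): 0.05, ("puppy", "2008"): 0.05}
ranking = rank_tags(initial, similarity)
```

In this toy input the walk lifts "puppy" above "2008" because it is strongly connected to the high-scoring tag "dog", which is the kind of effect the refinement is meant to produce.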
van Leuken, Reinier H.
and Garcia, Lluis
and Olivares, Ximena
and van Zwol, Roelof Visual Diversification of Image Search Results.
Due to the reliance on the textual information associated with an image, image search engines on the Web lack the discriminative power to deliver visually diverse search results. The textual descriptions are key to retrieving relevant results for a given user query, but at the same time provide little information about the rich image content. In this paper we investigate three methods for visual diversification of image search results. The methods deploy lightweight clustering techniques in combination with a dynamic weighting function of the visual features, to best capture the discriminative aspects of the resulting set of images that is retrieved. A representative image is selected from each cluster, and together these images form a diverse result set. Based on a performance evaluation we find that the outcome of the methods closely resembles human perception of diversity, which was established in an extensive clustering experiment carried out by human assessors.
models deployed on the Web and by these photo sharing sites rely heavily on search paradigms developed within the field of Information Retrieval. This way, image retrieval can benefit from years of research experience, and the better this textual metadata captures the content of the image, the better the retrieval performance will be. It is also commonly acknowledged that a picture has to be seen to fully understand its meaning, significance, beauty, or context, simply because it conveys information that words cannot capture, or at least not in any practical setting. This explains the large number of papers on content-based image retrieval (CBIR) that have been published since 1990, the breathtaking publication rates since 1997, and the continuing interest in the field. Moving on from simple low-level features to more discriminative descriptions, the field has come a long way in narrowing down the semantic gap by using high-level semantics.
Unfortunately, CBIR methods using higher-level semantics usually require extensive training, intricate object ontologies or expensive construction of a visual dictionary, and their performance remains unfit for use in large-scale online applications such as the aforementioned search engines or websites. Consequently, retrieval models operating in the textual metadata domain are deployed here. In these applications, image search results are usually displayed in a ranked list. This ranking reflects the similarity of the image’s metadata to the textual query, according to the textual retrieval model of choice. There may be two problems with this ranking. First, it may lack visual diversity. For instance, when a specific type or brand of car is issued as a query, the top of the ranking may well display the same picture many times, namely the one released by the marketing division of the company. Similarly, pictures of a popular holiday destination tend to show the same tourist hot spot, often taken from the same angle and distance. This absence of visual diversity is due to the nature of the image annotation, which does not allow or motivate people to adequately describe the visual content of an image. Second, the query may have several aspects to it that are not sufficiently covered by the ranking. Perhaps the user is interested in a particular aspect of the query, but doesn’t know how to express this explicitly and issues a broader, more general query. It could also be that a query yields so many different results that it’s hard to get an overview of the collection of relevant images in the database.
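The idea of clustering a ranked result list and showing one representative per cluster can be illustrated with a very simple scheme. The sketch below uses a single-pass "leader" clustering with a fixed distance threshold; it is a stand-in chosen for brevity and is not one of the three methods investigated in the paper, and it omits the dynamic feature weighting. The function name `diversify`, the feature vectors, and the threshold are assumptions.

```python
import math


def diversify(ranked_features, threshold):
    """Group ranked images into clusters and pick one representative each.

    ranked_features: list of feature vectors, in ranking order (best first).
    threshold:       maximum Euclidean distance to a cluster representative.
    Returns a list of (representative_index, member_indices) tuples.
    """
    clusters = []
    for idx, feat in enumerate(ranked_features):
        for rep, members in clusters:
            # Join the first cluster whose representative is close enough.
            if math.dist(ranked_features[rep], feat) <= threshold:
                members.append(idx)
                break
        else:
            # No nearby cluster: this image starts a new one and represents it.
            clusters.append((idx, [idx]))
    return clusters


# Toy example with 2-D "visual features": two visually distinct groups.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
clusters = diversify(features, threshold=1.0)
representatives = [rep for rep, _ in clusters]
```

Because images are processed in ranking order, each representative is the highest-ranked member of its cluster, so the diverse result set keeps the textual ranking's preference among near-duplicates.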
De Choudhury, Munmun
and Sundaram, Hari
and John, Ajita
and Duncan Seligmann, Dorée What Makes Conversations Interesting? Themes, Participants and Consequences of Conversations in Online Social Media.
Rich media social networks promote not only creation and consumption of media, but also communication about the posted media item. What makes a conversation interesting enough to prompt a user to participate in the discussion on a posted video? We conjecture that people participate in conversations when they find the conversation theme interesting, see comments by people with whom they are familiar, or observe an engaging dialogue between two or more people (an absorbing back-and-forth exchange of comments). Importantly, a conversation that is interesting must be consequential – i.e. it must impact the social network itself. Our framework has three parts. First, we detect conversational themes using a mixture model approach. Second, we determine interestingness of participants and interestingness of conversations based on a random walk model. Third, we measure the consequence of a conversation by measuring how interestingness affects the following three variables – participation in related themes, participant cohesiveness and theme diffusion. We have conducted extensive experiments using a dataset from the popular video sharing site, YouTube. Our results show that our method of interestingness maximizes the mutual information, and is significantly better (twice as large) than three other baseline methods (number of comments, number of new participants and PageRank-based assessment).
create (e.g. upload photo on Flickr), and consume media (e.g. watch a video on YouTube). These websites also allow for significant communication between the users – such as comments by one user on a media item uploaded by another. These comments reveal a rich dialogue structure (user A comments on the upload, user B comments on the upload, A comments in response to B’s comment, B responds to A’s comment etc.) between users, where the discussion is often about themes unrelated to the original video. An example of a conversation from YouTube is shown in Figure 1.
In this paper, the sequence of comments on a media object is referred to as a conversation. Note that the theme of the conversation is latent and depends on the content of the conversation. The fundamental idea explored in this paper is that analysis of communication activity is crucial to understanding repeated visits to a rich media social networking site. People return to a video post that they have already seen and post further comments (say in YouTube) in response to the communication activity, rather than to watch the video again. Thus it is the content of the communication activity itself that people want to read (or see, if the response to a video post is another video, as is possible in the case of YouTube). Furthermore, these rich media sites have notification mechanisms that alert users to new comments on a video post or image upload, promoting this communication activity.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years; however, we will turn it into flat files for long-term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a really good (or cool) use for it... [this has now happened, this site is now static]