Items where author is affiliated with Yahoo! Research

Number of items: 14.

Ghosh, Arpita and Rubinstein, Benjamin I. P. and Vassilvitskii, Sergei and Zinkevich, Martin Adaptive Bidding for Display Advertising.

[Show abstact]

Chierichetti, Flavio and Kumar, Ravi and Raghavan, Prabhakar Compressed Web Indexes.

[Show abstact]

Codina, Joan and Kaltenbrunner, Andreas and Grivolla, Jens and Banchs, Rafael E. and Baeza-Yates, Ricardo Content Analysis in Web 2.0.

[Show abstact]

Web mining deals with understanding, and discovering information in, the World Wide Web. Web mining focuses on analyzing three different sources of information: web structure, user activity and the contents. When referring to the Web 2.0, web structure and user activity related data can be dealt with in a very similar way that in the case of the traditional Web, however, in the case of contents, conventional analysis and mining procedures are not suitable anymore. This is mainly because, in the Web 2.0, contents are generated by users, who make a very free use of language and are constantly incorporating new communication elements which are generally context dependent. This kind of language can also be found on chats, SMS, e-mails and other channels of informal textual communication. This workshop focuses on the problem of making Web 2.0 both searchable and analyzable in terms of its contents. This is an extremely important endeavor for current web mining technologies because of two reasons: first, user generated content (UGC) is growing faster than ever in the cyberspace and, two, automatic analysis of UGC will allow improving the user experience of common citizens about Internet resources and opportunities, while, simultaneously, detecting and tracking criminal and terrorist activity. In this first edition of the workshop we attempt to focus the attention of interested research groups and companies into the new challenges and opportunities related to Web 2.0 content analysis. More specifically, we will focus on specific tasks on the scope of text content mining, with the intention of extending the coverage to multimedia data in future editions of the workshop. According to this, for the first edition of the workshop, we will collect and provide a corpus which should be used as experimental collection to conduct research in three specific shared tasks: text normalization, opinion mining and misbehavior detection. In the text normalization shared task we want to address the problem related to chat-speak style of communication. Recently, some research has been carried out in this area for SMS communications and from the perspective of machine translation approaches. In this shared task we attempt to generalize the problem to Web 2.0 contents and to explore additional alternatives the participants can come out with. In the opinion mining shared task we want to address problems such as determining text subjectivity and polarity, and sentiment analysis. Although these problems have been already approached from different perspectives, most of the research has been carried out on specific domain data and applications where users are requested to rate services or products. Our intention is to focus the attention into the more general domain in which Web 2.0 users express their sentiments and opinions in their daily interaction within a virtual community. Finally, in the misbehavior detection shared task, we want to address the problems of detecting inappropriate activity in which some users in a virtual community can be molesting or offensive to some other members of the community. We consider that this shared task can provide a good starting point for a future shared task with the more ambitious goal of classifying users and detecting identity supplantation for on-line criminal activity.

Gan, Qingqing and Suel, Torsten Improved Techniques for Result Caching in Web Search Engines.

[Show abstact]

Yan, Hao and Ding, Shuai and Suel, Torsten Inverted Index Compression and Query Processing with Optimized Document Ordering.

[Show abstact]

Kennedy, Lyndon and Naaman, Mor Less Talk, More Rock: Automated Organization of Community-Contributed Collections of Concert Videos.

[Show abstact]

Baeza-Yates, Ricardo Mining the Web 2.0 for Better Search.

[Show abstact]

Pandey, Sandeep and Broder, Andrei and Chierichetti, Flavio and Josifovski, Vanja and Kumar, Ravi and Vassilvitskii, Sergei Nearest-Neighbor Caching for Content-Match Applications.

[Show abstact]

Broder, Andrei and Ciccolo, Peter and Gabrilovich, Evgeniy and Josifovski, Vanja and Metzler, Donald and Riedel, Lance and Yuan, Jeffrey Online Expansion of Rare Queries for Sponsored Search.

[Show abstact]

Chakrabarti, Deepayan and Kumar, Ravi and Punera, Kunal Quicklink Selection for Navigational Query Results.

[Show abstact]

Wang, Xuerui and Broder, Andrei and Fontoura, Marcus and Josifovski, Vanja A Search-based Method for Forecasting Ad Impression in Contextual Advertising.

[Show abstact]

Goel, Sharad and Muhamad, Roby and Watts, Duncan Social Search in "Small-World" Experiments.

[Show abstact]

Ding, Shuai and He, Jinru and Yan, Hao and Suel, Torsten Using Graphics Processors for High Performance IR Query Processing.

[Show abstact]

van Leuken, Reinier H. and Garcia, Lluis and Olivares, Ximena and van Zwol, Roelof Visual Diversification of Image Search Results.

[Show abstact]

Due to the reliance on the textual information associated with an image, image search engines on the Web lack the discriminative power to deliver visually diverse search results. The textual descriptions are key to retrieve relevant results for a given user query, but at the same time provide little information about the rich image content. In this paper we investigate three methods for visual diversiﬁcation of image search results. The methods deploy lightweight clustering techniques in combination with a dynamic weighting function of the visual features, to best capture the discriminative aspects of the resulting set of images that is retrieved. A representative image is selected from each cluster, which together form a diverse result set. Based on a performance evaluation we ﬁnd that the outcome of the methods closely resembles human perception of diversity, which was established in an extensive clustering experiment carried out by human assessors. models deployed on the Web and by these photo sharing sites rely heavily on search paradigms developed within the ﬁeld Information Retrieval. This way, image retrieval can beneﬁt from years of research experience, and the better this textual metadata captures the content of the image, the better the retrieval performance will be. It is also commonly acknowledged that a picture has to be seen to fully understand its meaning, signiﬁcance, beauty, or context, simply because it conveys information that words can not capture, or at least not in any practical setting. This explains the large number of papers on content-based image retrieval (CBIR) that has been published since 1990, the breathtaking publication rates since 1997 [12], and the continuing interest in the ﬁeld [4]. Moving on from simple low-level features to more discriminative descriptions, the ﬁeld has come a long way in narrowing down the semantic gap by using high-level semantics [8]. Unfortunately, CBIR-methods using higher level semantics usually require extensive training, intricate object ontologies or expensive construction of a visual dictionary, and their performance remains unﬁt for use in large scale online applications such as the aforementioned search engines or websites. Consequently, retrieval models operating in the textual metadata domain are therefore deployed here. In these applications, image search results are usually displayed in a ranked list. This ranking reﬂects the similarity of the image’s metadata to the textual query, according to the textual retrieval model of choice. There may exist two problems with this ranking. First, it may be lacking visual diversity. For instance, when a speciﬁc type or brand of car is issued as query, it may very well be that the top of this ranking displays many times the same picture that was released by the marketing division of the company. Similarly, pictures of a popular holiday destination tend to show the same touristic hot spot, often taken from the same angle and distance. This absence of visual diversity is due to the nature of the image annotation, which does not allow or motivate people to adequately describe the visual content of an image. Second, the query may have several aspects to it that are not sufficiently covered by the ranking. Perhaps the user is interested in a particular aspect of the query, but doesn’t know how to express this explicitly and issues a broader, more general query. It could also be that a query yields so many different results, that it’s hard to get an overview of the collection of relevant images in the database.

This list was generated on Fri Feb 15 08:58:36 2019 GMT.

About this site

This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.

Preservation

We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years, however we will turn it into flat files for long term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a rally good (or cool) use for it... [this has now happened, this site is now static]