Items from Data Mining track
Number of items: 22.
Guan, Hu and Zhou, Jingyu and Guo, Minyi. A Class-Feature-Centroid Classifier for Text Categorization.
Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different approaches have been proposed for the automated text categorization problem. Among them, centroid-based approaches have the advantage of short training and testing times due to their computational efficiency. As a result, centroid-based classifiers have been widely used in many web applications. However, the accuracy of centroid-based classifiers is inferior to SVM, mainly because the centroids found during construction are far from their ideal locations. We design a fast Class-Feature-Centroid (CFC) classifier for multi-class, single-label text categorization. In CFC, a centroid is built from two important class distributions: an inter-class term index and an inner-class term index. CFC combines these indices in a novel way and employs a denormalized cosine measure to calculate the similarity score between a text vector and a centroid. Experiments on the Reuters-21578 corpus and the 20-newsgroup email collection show that CFC consistently outperforms state-of-the-art SVM classifiers on both micro-F1 and macro-F1 scores. In particular, CFC is more effective and robust than SVM when data is sparse.
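As a rough illustration of the centroid idea, a minimal Python sketch might look like the following. The inter-/inner-class weighting here is a placeholder, not the paper's exact formulas; only the overall shape (per-class centroids scored against a normalized document vector with a denormalized cosine) follows the abstract.

    # Minimal sketch of a centroid-based text classifier in the spirit of CFC.
    # The weighting below is illustrative, not the paper's exact formula;
    # X is assumed to be a (n_docs, n_terms) bag-of-words count matrix.
    import numpy as np

    def build_centroids(X, y):
        """X: numpy count matrix, y: numpy array of class labels."""
        classes = np.unique(y)
        df_per_class = np.array([(X[y == c] > 0).sum(axis=0) for c in classes], dtype=float)
        docs_per_class = np.array([(y == c).sum() for c in classes], dtype=float)
        # inner-class weight: how widely a term is used inside the class
        inner = df_per_class / docs_per_class[:, None]
        # inter-class weight: how exclusive a term is to the class (placeholder form)
        class_presence = (df_per_class > 0).sum(axis=0)
        inter = np.log(len(classes) / np.maximum(class_presence, 1.0)) + 1.0
        return classes, inner * inter          # one centroid row per class

    def classify(x, classes, centroids):
        # Denormalized cosine: normalize the document vector only, keeping the
        # magnitude of the centroid weights as part of the score.
        x = x / (np.linalg.norm(x) + 1e-12)
        scores = centroids @ x
        return classes[int(np.argmax(scores))]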
Guo, Fan and Liu, Chao and Kannan, Anitha and Minka, Tom and Taylor, Michael and Wang, Yi-Min and Faloutsos, Christos. Click Chain Model in Web Search.
Given a terabyte click log, can we build an efficient and effective click model? It is commonly believed that web search click logs are a gold mine for the search business, because they reflect users’ preferences over the web documents presented by the search engine. Click models provide a principled approach to inferring user-perceived relevance of web documents, which can be leveraged in numerous applications in search businesses. Due to the huge volume of click data, scalability is a must. We present the click chain model (CCM), which is based on a solid Bayesian framework. It is both scalable and incremental, meeting the computational challenges imposed by voluminous click logs that constantly grow. We conduct an extensive experimental study on a data set containing 8.8 million query sessions obtained in July 2008 from a commercial search engine. CCM consistently outperforms two state-of-the-art competitors in a number of metrics, with over 9.7% better log-likelihood, over 6.2% better click perplexity, and much more robust (up to 30%) prediction of the first and last clicked positions.
Abdel Hamid, Ossama and Behzadi, Behshad and Christoph, Stefan and Henzinger, Monika. Detecting the Origin of Text Segments Efficiently.
In the origin detection problem an algorithm is given a set S of documents, ordered by creation time, and a query document D. It needs to output, for every consecutive sequence of k alphanumeric terms in D, the earliest document in S in which the sequence appeared (if such a document exists). Algorithms for the origin detection problem can, for example, be used to detect the “origin” of text segments in D and thus to detect novel content in D. They can also find the document from which the author of D has copied the most (or show that D is mostly original). We propose novel algorithms for this problem and evaluate them together with a large number of previously published algorithms. Our results show that (1) detecting the origin of text segments efficiently can be done with very high accuracy even when the space used is less than 1% of the size of the documents in S, (2) the precision degrades smoothly with the amount of available space, and (3) various estimation techniques can be used to increase the performance of the algorithms.
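The problem statement itself is easy to make concrete. The brute-force sketch below indexes every k-term shingle by the earliest document containing it; the paper's contribution is doing this with a tiny fraction of the space (fingerprints and sampling), which this sketch does not attempt.

    # Illustrative brute-force version of origin detection: for every k-term
    # shingle of query document D, report the earliest document in S that
    # contains it. Space-efficient variants are the subject of the paper.
    import re

    def shingles(text, k):
        terms = re.findall(r"[A-Za-z0-9]+", text.lower())
        return [tuple(terms[i:i + k]) for i in range(len(terms) - k + 1)]

    def build_origin_index(docs_sorted_by_time, k):
        """docs_sorted_by_time: list of (doc_id, text), oldest first."""
        origin = {}
        for doc_id, text in docs_sorted_by_time:
            for sh in shingles(text, k):
                origin.setdefault(sh, doc_id)   # keep only the earliest document
        return origin

    def detect_origins(query_text, origin, k):
        return [(sh, origin.get(sh)) for sh in shingles(query_text, k)]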
Chapelle, Olivier and Zhang, Ya. A Dynamic Bayesian Network Click Model for Web Search Ranking.
As with any application of machine learning, web search ranking requires labeled data. The labels usually come in the form of relevance assessments made by editors. Click logs can also provide an important source of implicit feedback and can be used as a cheap proxy for editorial labels. The main difficulty, however, comes from the so-called position bias: URLs appearing in lower positions are less likely to be clicked even if they are relevant. In this paper, we propose a Dynamic Bayesian Network which aims at providing an unbiased estimate of relevance from the click logs. Experiments show that the proposed click model outperforms other existing click models in predicting both click-through rate and relevance.
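To make the position-bias correction concrete, here is a sketch of the simpler position-based (examination) click model, fitted with EM, which assumes P(click) = relevance(url) x examination(position). It is a common baseline, not the paper's Dynamic Bayesian Network, but it shows how click logs can yield position-debiased relevance estimates.

    # Simplified position-based model: EM over (url, position, clicked) impressions.
    from collections import defaultdict

    def fit_pbm(impressions, n_positions, n_iter=20):
        """impressions: iterable of (url, position, clicked), position in [0, n_positions)."""
        rel = defaultdict(lambda: 0.5)           # relevance per url
        exam = [0.5] * n_positions               # examination prob per position
        data = list(impressions)
        for _ in range(n_iter):
            rel_num, rel_den = defaultdict(float), defaultdict(float)
            ex_num, ex_den = [0.0] * n_positions, [0.0] * n_positions
            for url, pos, clicked in data:
                a, g = rel[url], exam[pos]
                if clicked:
                    p_rel, p_exam = 1.0, 1.0     # a click implies examined and relevant
                else:
                    denom = 1.0 - a * g
                    p_rel = a * (1.0 - g) / denom    # relevant but not examined
                    p_exam = g * (1.0 - a) / denom   # examined but not relevant
                rel_num[url] += p_rel
                rel_den[url] += 1.0
                ex_num[pos] += p_exam
                ex_den[pos] += 1.0
            rel = defaultdict(lambda: 0.5, {u: rel_num[u] / rel_den[u] for u in rel_den})
            exam = [ex_num[p] / ex_den[p] if ex_den[p] else 0.5 for p in range(n_positions)]
        return rel, exam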
Wook Kim, Jong and Selçuk Candan, K. and Tatemura, Junichi. Efficient Overlap and Content Reuse Detection in Blogs and Online News Articles.
The use of blogs to track and comment on real-world (political, news, entertainment) events is growing. Similarly, as more individuals start relying on the Web as their primary information source and as more traditional media outlets try reaching consumers through alternative venues, the number of news sites on the Web is also continuously increasing. Content reuse, whether in the form of extensive quotations or content borrowing across media outlets, is very common in blog and news entries tracking the same real-world event. Knowledge about which web entries reuse content from which others can be an effective asset when organizing these entries for presentation. On the other hand, this knowledge is not cheap to acquire: considering the size of the space of related web entries, it is essential that the techniques developed for identifying reuse are fast and scalable. Furthermore, the dynamic nature of blog and news entries necessitates incremental processing for reuse detection. In this paper, we develop a novel qSign algorithm that efficiently and effectively analyzes the blogosphere for quotation and reuse identification. Experimental results show that with qSign, processing time gains of 10X to 100X are possible while maintaining reuse detection rates of up to 90%. Furthermore, processing time gains can be pushed by further orders of magnitude (from 100X to 1000X) at 70% recall.
Li, Liangda and Zhou, Ke and Xue, Gui-Rong and Zha, Hongyuan and Yu, Yong. Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning.
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom simultaneously consider summary diversity, coverage, and balance issues, which to a large extent determine the quality of summaries. In this paper, we consider extract-based summarization emphasizing the following three requirements: 1) diversity in summarization, which seeks to reduce redundancy among sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document’s main information when generating the summary; and 3) balance, which demands that different aspects of the document have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from the set of sentences of a given document to a subset of the sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints in a structure learning framework, and we explore the graph structure of the output variables and employ structural SVM for solving the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
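For intuition only, the coverage/diversity trade-off can be illustrated with a greedy, unsupervised sentence selector (essentially MMR-style). This is not the paper's structural-SVM method; the vectors and trade-off weight are assumptions of the sketch.

    # Greedy extractive selection trading coverage against redundancy.
    import numpy as np

    def greedy_summary(sent_vecs, doc_vec, k, diversity=0.7):
        """sent_vecs: (n_sentences, dim) unit-normalised sentence vectors,
        doc_vec: (dim,) unit-normalised document vector, k: summary length."""
        k = min(k, len(sent_vecs))
        chosen = []
        for _ in range(k):
            best, best_score = None, -np.inf
            for i in range(len(sent_vecs)):
                if i in chosen:
                    continue
                coverage = float(sent_vecs[i] @ doc_vec)
                redundancy = max((float(sent_vecs[i] @ sent_vecs[j]) for j in chosen), default=0.0)
                score = (1 - diversity) * coverage - diversity * redundancy
                if score > best_score:
                    best, best_score = i, score
            chosen.append(best)
        return chosen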
Bar-Yossef, Ziv and Gurevich, Maxim. Estimating the ImpressionRank of Web Pages.
The ImpressionRank of a web page (or, more generally, of a web site) is the number of times users viewed the page while browsing search results. ImpressionRank captures the visibility of pages and sites in search engines and is thus an important measure, which is of interest to web site owners, competitors, market analysts, and end users. All previous approaches to estimating the ImpressionRank of a page rely on privileged access to private data sources, like the search engine’s query log. In this paper we present the first external algorithm for estimating the ImpressionRank of a web page. This algorithm relies on access to three public data sources: the search engine, the query suggestion service of the search engine, and the web. In addition, the algorithm is local and uses modest resources. It can therefore be used by almost any party to estimate the ImpressionRank of any page on any search engine. En route to estimating the ImpressionRank of a page, our algorithm solves a novel variant of the keyword extraction problem: it finds the most popular search keywords that drive impressions of a page. Empirical analysis of the algorithm on the Google and Yahoo! search engines indicates that it is accurate and provides interesting insights about sites and search queries.
Chaudhuri, Surajit and Ganti, Venkatesh and Xin, Dong. Exploiting Web Search to Generate Synonyms for Entities.
Tasks that recognize named entities such as products, people, or locations in documents have recently received significant attention in the literature. Many solutions to these tasks assume the existence of reference entity tables. An important challenge that needs to be addressed in the entity extraction task is that of ascertaining whether or not a candidate string approximately matches a named entity in a given reference table. Prior approaches have relied on string-based similarity functions that only compare a candidate string with the entity it is matched against. In this paper, we exploit web search engines in order to define new similarity functions. We then develop efficient techniques to facilitate approximate matching in the context of our proposed similarity functions. In an extensive experimental evaluation, we demonstrate the accuracy and efficiency of our techniques.
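A minimal sketch of the general idea follows: measure similarity between a candidate string and a reference entity through the overlap of their web search results rather than through string edit distance. The function names and the Jaccard-over-top-N definition are assumptions of this sketch (the paper defines its own similarity functions), and the result lists are assumed to have been retrieved by the caller, so no particular search API is invoked.

    # Search-based similarity via overlap of top result URLs for two query strings.
    def search_similarity(candidate_results, entity_results, top_n=20):
        """Both arguments are ranked lists of result URLs for the two query strings."""
        a = set(candidate_results[:top_n])
        b = set(entity_results[:top_n])
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    def best_entity_match(candidate_results, reference_table, threshold=0.3):
        """reference_table: dict mapping entity name -> its ranked result URLs."""
        scored = [(search_similarity(candidate_results, res), name)
                  for name, res in reference_table.items()]
        score, name = max(scored)
        return name if score >= threshold else None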
Sarkar, Purnamrita and Moore, Andrew W. Fast Dynamic Reranking in Large Graphs.
In this paper we consider the problem of re-ranking search results by incorporating user feedback. We present a graph-theoretic measure for discriminating irrelevant results from relevant results using a few labeled examples provided by the user. The key intuition is that nodes relatively closer (in graph topology) to the relevant nodes than to the irrelevant nodes are more likely to be relevant. We present a simple sampling algorithm to evaluate this measure at specific nodes of interest, and an efficient branch and bound algorithm to compute the top k nodes from the entire graph under this measure. On quantifiable prediction tasks the introduced measure outperforms other diffusion-based proximity measures which take only the positive relevance feedback into account. On the Entity-Relation graph built from the authors and papers of the entire DBLP citation corpus (1.4 million nodes and 2.2 million edges), our branch and bound algorithm takes about 1.5 seconds to retrieve the top 10 nodes w.r.t. this measure with 10 labeled nodes.
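The sampling idea can be sketched as follows: score a candidate node by how often a short random walk from it reaches a relevant labeled node before an irrelevant one. This mirrors the spirit of the measure described above, not its exact definition or the branch and bound search; walk length and sample counts are illustrative.

    # Sampling-based proximity to positive vs. negative feedback nodes.
    import random

    def feedback_score(graph, node, relevant, irrelevant, walk_len=10, n_walks=200):
        """graph: dict node -> list of neighbours; relevant/irrelevant: sets of labelled nodes."""
        hits = 0
        for _ in range(n_walks):
            cur = node
            for _ in range(walk_len):
                if cur in relevant:
                    hits += 1
                    break
                if cur in irrelevant:
                    break
                nbrs = graph.get(cur)
                if not nbrs:
                    break
                cur = random.choice(nbrs)
        return hits / n_walks

    def rerank(graph, candidates, relevant, irrelevant, k=10):
        scored = sorted(candidates,
                        key=lambda n: feedback_score(graph, n, relevant, irrelevant),
                        reverse=True)
        return scored[:k]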
Danescu-Niculescu-Mizil, Cristian and Kossinets, Gueorgi and Kleinberg, Jon and Lee, Lillian. How Opinions are Received by Online Communities: A Case Study on Amazon.com Helpfulness Votes.
There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like “26 of 32 people found the following review helpful.” Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, “What did Y think of X?”, we are asking, “What did Z think of Y’s opinion of X?” Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also, in subtle ways, on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review “plagiarism” to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from different countries.
Yang, Jiang-Ming and Cai, Rui and Wang, Yida and Zhu, Jun and Zhang, Lei and Ma, Wei-Ying. Incorporating Site-Level Knowledge to Extract Structured Data from Web Forums.
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to both complex page layout designs and unrestricted user-created posts. In this paper, we study the problem of structured data extraction from various web forum sites. Our goal is to find a solution as general as possible for extracting structured data, such as post title, post author, post time, and post content, from any forum site. In contrast to most existing information extraction methods, which only leverage the knowledge inside an individual page, we incorporate both page-level and site-level knowledge and employ Markov logic networks (MLNs) to effectively integrate all useful evidence by learning their importance automatically. Site-level knowledge includes (1) the linkages among different object pages, such as list pages and post pages, and (2) the interrelationships of pages belonging to the same object. The experimental results on 20 forums show very encouraging information extraction performance and demonstrate the generality of the proposed approach across various forums. We also show that the performance is limited if only page-level knowledge is used, whereas incorporating site-level knowledge significantly improves both precision and recall.
Tang, Lei and Rajan, Suju and Narayanan, Vijay K. Large Scale Multi-Label Classification via MetaLabeler.
The explosion of online content has made the management of such content non-trivial. Web-related tasks such as web page categorization, news filtering, query categorization, and tag recommendation often involve the construction of multi-label categorization systems on a large scale. Existing multi-label classification methods either do not scale or have unsatisfactory performance. In this work, we propose MetaLabeler to automatically determine the relevant set of labels for each instance without intensive human involvement or expensive cross-validation. Extensive experiments conducted on benchmark data show that MetaLabeler tends to outperform existing methods. Moreover, MetaLabeler scales to millions of multi-labeled instances and can be deployed easily. This enables us to apply MetaLabeler to a large-scale query categorization problem in Yahoo!, yielding a significant improvement in performance.
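The core MetaLabeler idea (predict how many labels each instance should get, then keep that many top-scoring labels from any base classifier) is easy to sketch. The least-squares meta-model below is a deliberate simplification used only for illustration.

    # Sketch: a "meta" regressor predicts the label count; top-k labels are kept.
    import numpy as np

    def train_meta(X, label_counts):
        """Least-squares fit: predict the number of labels from the feature vector."""
        Xb = np.hstack([X, np.ones((len(X), 1))])            # add bias column
        w, *_ = np.linalg.lstsq(Xb, label_counts, rcond=None)
        return w

    def predict_labels(x, label_scores, meta_w, max_labels):
        """label_scores: per-label scores from any base classifier for instance x."""
        k = int(round(float(np.append(x, 1.0) @ meta_w)))
        k = min(max(k, 1), max_labels)
        return list(np.argsort(label_scores)[::-1][:k])      # indices of the top-k labels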
Xie, Sihong and Fan, Wei and Peng, Jing and Verscheure, Olivier and Ren, Jiangtao. Latent Space Domain Transfer between High Dimensional Overlapping Distributions.
Transferring knowledge from one domain to another is challenging for a number of reasons. Since both the conditional and the marginal distributions of the training and test data differ, a model trained in one domain usually has low accuracy when directly applied to a different domain. For many applications with large feature sets, such as text documents, sequence data, medical data, or image data of different resolutions, the two domains usually do not contain exactly the same features, which introduces large numbers of “missing values” when the data are considered over the union of features from both domains. In other words, their marginal distributions are at most overlapping. At the same time, these problems are usually high dimensional, often with several thousand features. Thus, the combination of high dimensionality and missing values makes the relationship between the conditional probabilities of the two domains hard to measure and model. To address these challenges, we propose a framework that first brings the marginal distributions of the two domains closer by “filling up” the missing values of disjoint features. Afterwards, it looks for comparable sub-structures in the “latent space” mapped from the expanded feature vectors, where both the marginal and the conditional distributions are similar. With these sub-structures in latent space, the proposed approach then finds common concepts that are transferable across domains with high probability. During prediction, unlabeled instances are treated as “queries”, the most closely related labeled instances from the out-domain are retrieved, and the classification is made by weighted voting over the retrieved out-domain examples. We formally show that importing feature values across domains and latent semantic indexing jointly make the distributions of two related domains easier to measure than in the original feature space, and that the nearest-neighbor method employed to retrieve related out-domain examples has bounded error when predicting in-domain examples. Software and datasets are available for download.
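The overall recipe can be sketched as follows, under the assumption that both domains have already been placed in the union of their feature spaces with zeros for missing features: project everything into a shared latent space with a truncated SVD, then classify in-domain instances by similarity-weighted voting over their nearest labeled out-domain neighbours. Dimensions and k are illustrative choices, and the filling step here is trivially "zeros", unlike the paper's.

    # Latent-space projection plus weighted kNN transfer, as a rough sketch.
    import numpy as np

    def to_latent(X_out, X_in, dim=50):
        X = np.vstack([X_out, X_in])                 # rows share the union feature space
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        Z = U[:, :dim] * s[:dim]                     # latent coordinates
        return Z[:len(X_out)], Z[len(X_out):]

    def knn_transfer_predict(Z_out, y_out, Z_in, k=5):
        preds = []
        for z in Z_in:
            d = np.linalg.norm(Z_out - z, axis=1)
            idx = np.argsort(d)[:k]
            w = 1.0 / (d[idx] + 1e-9)                # closer out-domain examples vote more
            votes = {}
            for i, wi in zip(idx, w):
                votes[y_out[i]] = votes.get(y_out[i], 0.0) + wi
            preds.append(max(votes, key=votes.get))
        return preds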
Bennett, Paul N. and Maxwell Chickering, David and Mityagin, Anton. Learning Consensus Opinion: Mining Data from a Labeling Game.
We consider the problem of identifying the consensus ranking for the results of a query, given preferences among those results from a set of individual users. Once consensus rankings are identified for a set of queries, these rankings can serve for both evaluation and training of retrieval and learning systems. We present a novel approach to collecting the individual user preferences over image-search results: we use a collaborative game in which players are rewarded for agreeing on which image result is best for a query. Our approach is distinct from other labeling games because we are able to elicit directly the preferences of interest with respect to image queries extracted from query logs. As a source of relevance judgments, this data provides a useful complement to click data. Furthermore, the data is free of positional biases and is collected by the game without the risk of frustrating users with non-relevant results; this risk is prevalent in standard mechanisms for debiasing clicks. We describe data collected over 34 days from a deployed version of this game that amounts to about 18 million expressed preferences between pairs. Finally, we present several approaches to modeling this data in order to extract the consensus rankings from the preferences and better sort the search results for targeted queries.
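A very simple baseline for the aggregation step, shown only to make the input/output concrete (the paper studies several richer models): score each result by its win rate over the pairwise comparisons it appears in and sort.

    # Win-rate aggregation of pairwise preferences into a consensus ranking.
    from collections import defaultdict

    def consensus_ranking(preferences):
        """preferences: iterable of (winner, loser) pairs for one query."""
        wins, games = defaultdict(float), defaultdict(float)
        for winner, loser in preferences:
            wins[winner] += 1.0
            games[winner] += 1.0
            games[loser] += 1.0
        return sorted(games, key=lambda r: wins[r] / games[r], reverse=True)

    # Example: three players compared images a, b, c for one query.
    print(consensus_ranking([("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]))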
Bian, Jiang and Liu, Yandong and Zhou, Ding and Agichtein, Eugene and Zha, Hongyuan. Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement.
Community Question Answering (CQA) has emerged as a popular forum for users to pose questions for other users to answer. Over the last few years, CQA portals such as Naver and Yahoo! Answers have exploded in popularity, and now provide a viable alternative to general-purpose Web search. At the same time, the answers to past questions submitted to CQA sites comprise a valuable knowledge repository which could be a gold mine for information retrieval and automatic question answering. Unfortunately, the quality of the submitted questions and answers varies widely, so much so that a large fraction of the content is not usable for answering queries. Approaches for retrieving relevant and high-quality content have been proposed previously, but they require large amounts of manually labeled data, which limits the applicability of these supervised approaches to new sites and domains. In this paper we address this problem by developing a semi-supervised coupled mutual reinforcement framework for simultaneously calculating content quality and user reputation that requires relatively few labeled examples to initialize the training process. Results of a large-scale evaluation demonstrate that our methods are more effective than previous approaches for finding high-quality answers, questions, and users. More importantly, our quality estimation significantly improves the accuracy of search over CQA archives compared with state-of-the-art methods.
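The mutual reinforcement intuition (good answers raise their author's reputation; reputable authors lend credibility to their answers, with a few labeled answers anchoring the scores) can be sketched with a HITS-like iteration. The update rules below are an illustration, not the paper's exact semi-supervised framework.

    # Coupled iteration between answer quality and user reputation.
    from collections import defaultdict

    def mutual_reinforcement(answers, labels, n_iter=20, mix=0.5):
        """answers: list of (answer_id, author_id); labels: dict answer_id -> quality in [0, 1]."""
        quality = {a: labels.get(a, 0.5) for a, _ in answers}
        by_author = defaultdict(list)
        for a, u in answers:
            by_author[u].append(a)
        reputation = {u: 0.5 for u in by_author}
        for _ in range(n_iter):
            reputation = {u: sum(quality[a] for a in items) / len(items)
                          for u, items in by_author.items()}
            for a, u in answers:
                est = mix * reputation[u] + (1 - mix) * quality[a]
                quality[a] = labels.get(a, est)      # labelled answers stay fixed
        return quality, reputation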
Stern, David and Herbrich, Ralf and Graepel, Thore. Matchbox: Large Scale Online Bayesian Recommendations.
We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional ‘trait space’ in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like / don’t like), and observation of a set of ordinal ratings on a user-specific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item’s popularity, a user’s taste or a user’s personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance, with the option of improving performance further, if computational resources are available, by performing multiple EP passes over the training data.
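The bilinear "trait space" structure is the easiest part to show in code: user and item feature vectors are mapped into a low-dimensional space and their inner product gives the predicted affinity. The plain SGD point-estimate training below is only meant to show the shape of the model; Matchbox itself is trained with Bayesian message passing (EP/ADF), which this sketch does not attempt.

    # Bilinear trait-space model with a toy SGD update.
    import numpy as np

    rng = np.random.default_rng(0)

    def init(n_user_feats, n_item_feats, traits=5, scale=0.1):
        U = rng.normal(0, scale, (traits, n_user_feats))   # user feature -> trait map
        V = rng.normal(0, scale, (traits, n_item_feats))   # item feature -> trait map
        return U, V

    def predict(U, V, x_user, x_item):
        return float((U @ x_user) @ (V @ x_item))

    def sgd_step(U, V, x_user, x_item, rating, lr=0.05):
        err = rating - predict(U, V, x_user, x_item)
        s, t = U @ x_user, V @ x_item                      # current trait vectors
        U += lr * err * np.outer(t, x_user)
        V += lr * err * np.outer(s, x_item)
        return err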
Lu, Yue and Zhai, ChengXiang and Sundaresan, Neel. Rated Aspect Summarization of Short Comments.
Web 2.0 technologies have enabled more and more people to freely comment on different kinds of entities (e.g. sellers, products, services). The large scale of this information creates both the need for and the challenge of automatic summarization. In many cases, each of the user-generated short comments comes with an overall rating. In this paper, we study the problem of generating a “rated aspect summary” of short comments, which is a decomposed view of the overall ratings for the major aspects, so that a user can gain different perspectives on the target entity. We formally define the problem and decompose the solution into three steps. We demonstrate the effectiveness of our methods using eBay sellers’ feedback comments. We also quantitatively evaluate each step of our methods and study how well humans agree on such a summarization task. The proposed methods are quite general and can be used to generate a rated aspect summary automatically given any collection of short comments, each associated with an overall rating.
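For a flavour of the output, here is a toy decomposition in which each short comment carries an overall rating and a hand-made keyword lexicon assigns it to aspects; averaging ratings per aspect gives the decomposed view. The lexicon and the one-step aggregation are stand-ins for the paper's three-step method.

    # Toy rated aspect summary from (comment text, overall rating) pairs.
    from collections import defaultdict

    ASPECT_KEYWORDS = {          # hypothetical aspect lexicon for seller feedback
        "shipping": {"shipping", "delivery", "arrived", "fast"},
        "communication": {"email", "response", "communication", "contact"},
        "item quality": {"quality", "described", "condition", "item"},
    }

    def rated_aspect_summary(comments):
        """comments: iterable of (text, overall_rating)."""
        totals, counts, examples = defaultdict(float), defaultdict(int), defaultdict(list)
        for text, rating in comments:
            words = set(text.lower().split())
            for aspect, keys in ASPECT_KEYWORDS.items():
                if words & keys:
                    totals[aspect] += rating
                    counts[aspect] += 1
                    examples[aspect].append(text)
        return {a: (totals[a] / counts[a], examples[a][:2]) for a in counts}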
Korolova, Aleksandra and Kenthapadi, Krishnaram and Mishra, Nina and Ntoulas, Alexandros. Releasing Search Queries and Clicks Privately.
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks, and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: through a collection of experiments on a real search log, we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.
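The flavour of "perturbed counts" can be illustrated with the usual noise-plus-threshold recipe: add Laplace noise to each query count and release only counts that exceed a threshold. The specific scale and threshold choices below are placeholders; the paper's algorithm, parameters, and privacy analysis are more involved.

    # Release noisy query counts above a threshold.
    import random

    def laplace_noise(scale):
        # The difference of two independent exponentials is Laplace-distributed.
        lam = 1.0 / scale
        return random.expovariate(lam) - random.expovariate(lam)

    def release_counts(query_counts, epsilon=1.0, threshold=50):
        """query_counts: dict query -> raw count. Scale 1/epsilon assumes each
        user contributes at most one unit to each count (an assumption of this sketch)."""
        scale = 1.0 / epsilon
        released = {}
        for q, c in query_counts.items():
            noisy = c + laplace_noise(scale)
            if noisy >= threshold:
                released[q] = round(noisy)
        return released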
Ali Bayir, Murat and Hakki Toroslu, Ismail and Cosar, Ahmet and Fidan, Guven. Smart Miner: A New Framework for Mining Large Scale Web Usage Data.
In this paper, we propose a novel framework called Smart Miner for the web usage mining problem, which uses link information for producing accurate user sessions and frequent navigation patterns. Unlike the simple session concepts in time- and navigation-based approaches, where sessions are sequences of web pages requested from the server or viewed in the browser, Smart Miner sessions are sets of paths traversed in the web graph that correspond to users’ navigation among web pages. We have modeled session construction as a new graph problem and utilized a new algorithm, Smart-SRA, to solve this problem efficiently. For the pattern discovery phase, we have developed an efficient version of the Apriori-All technique which uses the structure of the web graph to increase performance. From the experiments that we have performed on both real and simulated data, we have observed that Smart Miner produces at least 30% more accurate web usage patterns than other approaches, including previous session construction methods. We have also studied the effect of having referrer information in the web server logs to show that different versions of Smart-SRA produce similar results. Another contribution is a distributed version of the Smart Miner framework implemented using the Map/Reduce paradigm. We conclude that we can efficiently process terabytes of web server logs belonging to multiple web sites with our scalable framework.
Agarwal, Deepak and Chen, Bee-Chung and Elango, Pradheep. Spatio-Temporal Models for Estimating Click-through Rate.
We propose novel spatio-temporal models to estimate click-through rates in the context of content recommendation. We track article CTR at a fixed location over time through a dynamic Gamma-Poisson model and combine information from correlated locations through dynamic linear regressions, significantly improving on per-location models. Our models adjust for user fatigue through an exponential tilt to the first-view CTR (probability of click on the first article exposure) that is based only on user-specific repeat-exposure features. We illustrate our approach on data obtained from a module (Today Module) published regularly on the Yahoo! Front Page and demonstrate significant improvement over commonly used baseline methods. Large-scale simulation experiments to study the performance of our models under different scenarios provide encouraging results. Throughout, all modeling assumptions are validated via rigorous exploratory data analysis.
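A minimal version of the per-location dynamic Gamma-Poisson tracker is shown below: the CTR has a Gamma posterior (pseudo-clicks over pseudo-views) that is discounted each interval so recent traffic dominates. The paper combines many such per-location models with dynamic linear regressions and a fatigue adjustment, which this single-cell sketch omits; the prior and discount values are illustrative.

    # Dynamic Gamma-Poisson CTR tracker for one article at one location.
    class DynamicGammaPoisson:
        def __init__(self, prior_clicks=1.0, prior_views=100.0, discount=0.95):
            self.alpha = prior_clicks    # pseudo-clicks
            self.beta = prior_views      # pseudo-views
            self.discount = discount     # forgetting factor per time interval

        def update(self, clicks, views):
            # decay old evidence, then absorb the new interval's counts
            self.alpha = self.discount * self.alpha + clicks
            self.beta = self.discount * self.beta + views
            return self.estimate()

        def estimate(self):
            return self.alpha / self.beta   # posterior mean CTR

    # Usage: feed interval counts as they arrive.
    tracker = DynamicGammaPoisson()
    for clicks, views in [(5, 1000), (12, 1500), (3, 800)]:
        print(round(tracker.update(clicks, views), 4))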
Zhu, Jun and Nie, Zaiqing and Liu, Xiaojiang and Zhang, Bo and Wen, Ji-Rong. StatSnowball: a Statistical Approach to Extracting Entity Relationships.
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus leads to low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimation sense. MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation, which enjoys well-founded theory and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve significantly higher recall without sacrificing high precision during iterations with a small number of seeds, and that the joint inference of MLNs can improve performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
Cao, Huanhuan and Jiang, Daxin and Pei, Jian and Chen, Enhong and Li, Hang. Towards Context-Aware Search by Learning a Very Large Variable Length Hidden Markov Model from Search Logs.
Capturing the context of a user’s query from the previous queries and clicks in the same session may help understand the user’s information need. A context-aware approach to document re-ranking, query suggestion, and URL recommendation may improve users’ search experience substantially. In this paper, we propose a general approach to context-aware search. To capture contexts of queries, we learn a variable-length Hidden Markov Model (vlHMM) from search sessions extracted from log data. Although the mathematical model is intuitive, how to learn a large vlHMM with millions of states from hundreds of millions of search sessions poses a grand challenge. We develop a strategy for parameter initialization in vlHMM learning which can greatly reduce the number of parameters to be estimated in practice. We also devise a method for distributed vlHMM learning under the map-reduce model. We test our approach on a real data set consisting of 1.8 billion queries, 2.6 billion clicks, and 840 million search sessions, and evaluate the effectiveness of the vlHMM learned from the real data on three search applications: document re-ranking, query suggestion, and URL recommendation. The experimental results show that our approach is both effective and efficient.
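As a very small stand-in for the context-aware idea, one can map variable-length suffixes of the query sequence in a session to the next query and suggest by backing off from the longest matching context. The real vlHMM has hidden states and is learned with map-reduce at a vastly larger scale; this counting sketch only illustrates how longer contexts sharpen suggestions.

    # Back-off context model for next-query suggestion from sessions.
    from collections import defaultdict, Counter

    def train_context_model(sessions, max_context=3):
        model = defaultdict(Counter)
        for queries in sessions:                       # each session is a list of query strings
            for i in range(1, len(queries)):
                for n in range(1, max_context + 1):
                    if i - n < 0:
                        break
                    context = tuple(queries[i - n:i])
                    model[context][queries[i]] += 1
        return model

    def suggest(model, recent_queries, max_context=3):
        for n in range(min(max_context, len(recent_queries)), 0, -1):
            context = tuple(recent_queries[-n:])       # longest context first
            if context in model:
                return model[context].most_common(3)
        return []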