creators_name: Xie, Sihong
creators_name: Fan, Wei
creators_name: Peng, Jing
creators_name: Verscheure, Olivier
creators_name: Ren, Jiangtao
type: conference_item
datestamp: 2009-04-06 19:08:44
lastmod: 2009-04-23 23:59:15
metadata_visibility: show
title: Latent Space Domain Transfer between High Dimensional Overlapping Distributions
ispublished: pub
full_text_status: public
pres_type: paper
abstract: Transferring knowledge from one domain to another is challenging for a number of reasons. Since both the conditional and marginal distributions of the training data and test data are non-identical, a model trained in one domain is usually low in accuracy when directly applied to a different domain. For many applications with large feature sets, such as text documents, sequence data, medical data, and image data of different resolutions, two domains usually do not contain exactly the same features, thus introducing large numbers of “missing values” when considered over the union of features from both domains. In other words, their marginal distributions are at most overlapping. At the same time, these problems are usually high dimensional, often with several thousand features. Thus, the combination of high dimensionality and missing values makes the relationship in conditional probabilities between two domains hard to measure and model. To address these challenges, we propose a framework that first brings the marginal distributions of two domains closer by “filling up” those missing values of disjoint features. Afterwards, it looks for comparable sub-structures in the “latent space” as mapped from the expanded feature vector, where both marginal and conditional distributions are similar. With these sub-structures in latent space, the proposed approach then finds common concepts that are transferable across domains with high probability. During prediction, unlabeled instances are treated as “queries”, the most related labeled instances from the out-domain are retrieved, and the classification is made by weighted voting using the retrieved out-domain examples. 
We formally show that importing feature values across domains and latent semantic indexing can jointly make the distributions of two related domains easier to measure than in the original feature space, and that the nearest neighbor method employed to retrieve related out-domain examples is bounded in error when predicting in-domain examples. Software and datasets are available for download.
date: 2009-04
pagerange: 91-91
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: Xie, Sihong <http://www2009.eprints.org/view/author/Xie=3ASihong=3A=3A.html> and Fan, Wei <http://www2009.eprints.org/view/author/Fan=3AWei=3A=3A.html> and Peng, Jing <http://www2009.eprints.org/view/author/Peng=3AJing=3A=3A.html> and Verscheure, Olivier <http://www2009.eprints.org/view/author/Verscheure=3AOlivier=3A=3A.html> and Ren, Jiangtao <http://www2009.eprints.org/view/author/Ren=3AJiangtao=3A=3A.html> (2009) Latent Space Domain Transfer between High Dimensional Overlapping Distributions. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
document_url: http://www2009.eprints.org/10/1/p91.pdf
document_url: http://www2009.eprints.org/10/2/www09LatentMapFinal5.ppt