Items by He, Xiaofei
Number of items: 4. Wu, Hao and Qiu, Guang and He, Xiaofei and Shi, Yuan and Qu, Mingcheng and Shen, Jing and Bu, Jiajun and Chen, Chun Advertising Keyword Generation Using Active Learning. This paper proposes an efficient relevance feedback based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a supervised learning problem and suggest new terms for the seed by leveraging user relevance feedback information. Active learning is employed to select the most informative samples from a set of candidate terms for user labeling. Experiments show our approach improves the relevance of generated terms significantly with little user effort required.
Wang, Junfeng and He, Xiaofei and Wang, Can and Pei, Jian and Bu, Jiajun and Chen, Chun and Guan, Ziyu and Gang, Lu News Article Extraction with Template-Independent Wrapper. We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious limitations. 1) It cannot correctly extract pages belonging to an unseen template until the wrapper for that template has been generated. 2) It is costly to maintain up-to-date wrappers for hundreds of websites, because any change of a template may lead to the invalidation of the corresponding wrapper. In this paper we formalize news extraction as a machine learning problem and learn a template-independent wrapper using a very small number of labeled news pages from a single site. Novel features dedicated to news titles and bodies are developed respectively. Correlations between the news title and the news body are exploited. Our template-independent wrapper can extract news pages from different sites regardless of templates. In experiments, a wrapper is learned from 40 pages from a single news site. It achieved 98.1% accuracy over 3,973 news pages from 12 news sites.
Qu, Mingcheng and Qiu, Guang and He, Xiaofei and Zhang, Cheng and Wu, Hao and Bu, Jiajun and Chen, Chun Probabilistic Question Recommendation for Question Answering Communities. User-Interactive Question Answering (QA) communities such as Yahoo! Answers are growing in popularity. However, as these QA sites always have thousands of new questions posted daily, it is difficult for users to find the questions that are of interest to them. Consequently, this may delay the answering of the new questions. This gives rise to question recommendation techniques that help users locate interesting questions. In this paper, we adopt the Probabilistic Latent Semantic Analysis (PLSA) model for question recommendation and propose a novel metric to evaluate the performance of our approach. The experimental results show our recommendation approach is effective.
Zhu, Junyan and Wang, Can and He, Xiaofei and Bu, Jiajun and Chen, Chun and Shang, Shujie and Qu, Mingcheng and Lu, Gang Tag-Oriented Document Summarization. Social annotations on a Web document are highly generalized description of topics contained in that page. Their tagged frequency indicates the user attentions with various degrees. This makes annotations a good resource for summarizing multiple topics in a Web page. In this paper, we present a tag-oriented Web document summarization approach by using both document content and the tags annotated on that document. To improve summarization performance, a new tag ranking algorithm named EigenTag is proposed in this paper to reduce noise in tags. Meanwhile, association mining technique is employed to expand tag set to tackle the sparsity problem. Experimental results show our tag-oriented summarization has a significant improvement over those not using tags.
This list was generated on Fri Feb 15 08:51:33 2019 GMT.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
Preservation
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years, however we will turn it into flat files for long term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a rally good (or cool) use for it... [this has now happened, this site is now static]
|