This item is a Paper in the Search track.
- Hu, Jian - Microsoft Research Asia
- Wang, Gang - Microsoft Research Asia
- Lochovsky, Fred - The Hong Kong University of Science and Technology
- Sun, Jian-tao - Microsoft Research Asia
- Chen, Zheng - Microsoft Research Asia
Published Version
| PDF (1529Kb) |
Abstract
Understanding the intent behind a user’s query can help search engine to automatically route the query to some corresponding vertical search engines to obtain particularly relevant contents, thus, greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) Intent representation; (2) Domain coverage and (3) Semantic interpretation. Current approaches to predict the user’s intent mainly utilize machine learning techniques. However, it is difficult and often requires many human efforts to meet all these challenges by the statistical machine learning approaches. In this paper, we propose a general methodology to the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge base. The Wikipedia concepts are used as the intent representation space, thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified through mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method can achieve much better coverage to classify queries in an intent domain even through the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform the quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms other approaches in each intent domain.
Export Record As...
- HTML Citation
- ASCII Citation
- Resource Map
- OpenURL ContextObject
- EndNote
- BibTeX
- OpenURL ContextObject in Span
- MODS
- DIDL
- EP3 XML
- JSON
- Dublin Core
- Reference Manager
- Eprints Application Profile
- Simple Metadata
- Refer
- METS