Coherent Keyphrase Extraction via Web Mining

Turney, Peter (2003) Coherent Keyphrase Extraction via Web Mining. [Conference Paper]

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).
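
The coherence idea can be illustrated with a small sketch. It scores each candidate keyphrase by its average pointwise mutual information (PMI) with the other top-ranked candidates. PMI is estimated from web hit counts, in the spirit of the PMI-IR measure (Turney, 2001). Several pieces are illustrative assumptions rather than the paper's actual features or data: the hit-count function (a hypothetical stand-in for a search engine), the query format, the additive smoothing, the `top_k` parameter, and the toy counts.

```python
import math
from typing import Callable, Dict, List

# Hypothetical stand-in for a web search engine: maps a query string to the
# number of matching pages. A real system would issue live search queries.
HitCount = Callable[[str], int]


def pmi(a: str, b: str, hits: HitCount, total_pages: int,
        smoothing: float = 1.0) -> float:
    """Smoothed pointwise mutual information (in bits) between two phrases.

    Probabilities are estimated as hit counts divided by total_pages, so
    PMI = log2(p(a, b) / (p(a) * p(b))) = log2(joint_hits * N / (hits_a * hits_b)).
    Additive smoothing keeps the result finite when a count is zero.
    """
    x, y = sorted((a, b))  # canonical order makes the joint query symmetric
    h_a = hits(f'"{a}"') + smoothing
    h_b = hits(f'"{b}"') + smoothing
    h_ab = hits(f'"{x}" AND "{y}"') + smoothing
    return math.log2(h_ab * total_pages / (h_a * h_b))


def coherence_scores(candidates: List[str], hits: HitCount, total_pages: int,
                     top_k: int = 3) -> Dict[str, float]:
    """Mean PMI between each candidate and the top_k best-ranked other candidates.

    `candidates` is assumed to be sorted best-first by a baseline extractor.
    """
    scores: Dict[str, float] = {}
    for phrase in candidates:
        others = [c for c in candidates if c != phrase][:top_k]
        if not others:
            scores[phrase] = 0.0
            continue
        total = sum(pmi(phrase, o, hits, total_pages) for o in others)
        scores[phrase] = total / len(others)
    return scores


# Toy, invented hit counts (not real web data). Pair keys use sorted order.
TOTAL_PAGES = 1_000_000
TOY_COUNTS = {
    '"annual report"': 50_000,
    '"backpropagation"': 400,
    '"gradient descent"': 800,
    '"neural network"': 1_000,
    '"backpropagation" AND "neural network"': 300,
    '"gradient descent" AND "neural network"': 200,
    '"backpropagation" AND "gradient descent"': 150,
    '"annual report" AND "gradient descent"': 5,
    '"annual report" AND "neural network"': 10,
}


def toy_hits(query: str) -> int:
    """Look up a query in the toy table; unseen queries count as zero hits."""
    return TOY_COUNTS.get(query, 0)


CANDIDATES = ["neural network", "backpropagation", "gradient descent", "annual report"]
scores = coherence_scores(CANDIDATES, toy_hits, TOTAL_PAGES)

for phrase, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phrase:20s} {score:6.2f}")
```

With these toy counts, the off-topic phrase "annual report" gets the lowest coherence score, and that score is negative. This is the kind of outlier the enhancements aim to demote. In a full extractor, a score like this would more plausibly serve as one additional feature alongside Kea's existing ones than as a ranking on its own.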

Item Type: Conference Paper
Subjects: Computer Science > Statistical Models; Computer Science > Language; Computer Science > Machine Learning
ID Code: 3122
Deposited By: Turney, Peter
Deposited On: 27 Aug 2003
Last Modified: 11 Mar 2011 08:55
