Cogprints

Measuring praise and criticism: Inference of semantic orientation from association

Turney, Peter and Littman, Michael (2003) Measuring praise and criticism: Inference of semantic orientation from association. [Journal (Paginated)]

Full text available as: PDF (200Kb)

Abstract

The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.
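The abstract describes inferring a word's orientation from its association with positive and negative paradigm words. One natural way to turn that into a score is SO(word) = Σ_{p∈P} A(word, p) − Σ_{n∈N} A(word, n), where A is an association measure. The sketch below illustrates this with PMI as A under loudly stated assumptions: PMI is estimated from document-level co-occurrence in a tiny hypothetical corpus (not the paper's data), the paradigm sets are a small illustrative subset, the 0.01 smoothing constant is a hypothetical choice, and "abstaining on mild words" is modeled by labeling only the words with the largest |SO|. The LSA variant is not shown.

```python
import math
from collections import Counter
from itertools import combinations

# Illustrative paradigm words: an assumption for this sketch, not the paper's full sets.
POSITIVE = ["good", "excellent"]
NEGATIVE = ["bad", "poor"]


def cooccurrence_counts(docs):
    """Count how many documents contain each word and each unordered word pair."""
    word_df, pair_df = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs come out in sorted order
    return word_df, pair_df


def pmi(w1, w2, word_df, pair_df, n_docs, smoothing=0.01):
    """Document-level PMI: log2 of p(w1, w2) / (p(w1) * p(w2)).

    The hypothetical smoothing constant keeps the log finite for unseen pairs.
    """
    p_joint = (pair_df[tuple(sorted((w1, w2)))] + smoothing) / n_docs
    p1 = (word_df[w1] + smoothing) / n_docs
    p2 = (word_df[w2] + smoothing) / n_docs
    return math.log2(p_joint / (p1 * p2))


def semantic_orientation(word, word_df, pair_df, n_docs):
    """Association with positive paradigms minus association with negative ones."""
    pos = sum(pmi(word, p, word_df, pair_df, n_docs) for p in POSITIVE)
    neg = sum(pmi(word, n, word_df, pair_df, n_docs) for n in NEGATIVE)
    return pos - neg


def classify_with_abstention(scores, keep_fraction):
    """Label only the words with the largest |SO|; abstain (None) on the rest.

    A score of exactly zero would be labeled negative here (an arbitrary choice).
    """
    ranked = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)
    kept = set(ranked[: math.ceil(keep_fraction * len(ranked))])
    return {
        w: ("positive" if scores[w] > 0 else "negative") if w in kept else None
        for w in scores
    }


# Tiny hypothetical corpus: each document is a list of tokens.
docs = [
    ["an", "honest", "and", "good", "answer"],
    ["honest", "work", "is", "excellent"],
    ["a", "disturbing", "and", "bad", "result"],
    ["disturbing", "and", "poor", "service"],
    ["a", "good", "day"],
    ["a", "bad", "day"],
    ["the", "table", "is", "good"],
    ["the", "table", "is", "bad"],
]

word_df, pair_df = cooccurrence_counts(docs)
scores = {
    w: semantic_orientation(w, word_df, pair_df, len(docs))
    for w in ["honest", "disturbing", "table"]
}
labels = classify_with_abstention(scores, keep_fraction=0.5)
print(labels)  # {'honest': 'positive', 'disturbing': 'negative', 'table': None}
```

Here "table" co-occurs equally with both paradigm sets, so its SO is zero and it falls among the words the classifier declines to label. Raising `keep_fraction` trades coverage for the risk of labeling weakly oriented words, which mirrors the accuracy/abstention trade-off reported in the abstract.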

Item Type: Journal (Paginated)
Keywords: semantic orientation, semantic association, web mining, text mining, text classification, unsupervised learning, mutual information, latent semantic analysis
Subjects: Computer Science > Statistical Models; Computer Science > Language; Linguistics > Computational Linguistics; Linguistics > Semantics; Computer Science > Machine Learning
ID Code: 3164
Deposited By: Turney, Peter
Deposited On: 19 Sep 2003
Last Modified: 11 Mar 2011 08:55
