Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews

Turney, Peter D. (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. [Conference Paper]

Full text available as:

[img] PDF


This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

Item Type:Conference Paper
Subjects:Computer Science > Artificial Intelligence
Computer Science > Language
Computer Science > Machine Learning
Computer Science > Statistical Models
ID Code:2321
Deposited By: Turney, Peter
Deposited On:15 Jul 2002
Last Modified:11 Mar 2011 08:54

References in Article

Select the SEEK icon to attempt to find the referenced article. If it does not appear to be in cogprints you will be forwarded to the paracite service. Poorly formated references will probably not work.

Agresti, A. 1996. An introduction to categorical data analysis. New York: Wiley.

Brill, E. 1994. Some advances in transformation-based part of speech tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (pp. 722-727). Menlo Park, CA: AAAI Press.

Church, K.W., & Hanks, P. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL (pp. 76-83). New Brunswick, NJ: ACL.

Frank, E., & Hall, M. 2001. A simple approach to ordinal classification. Proceedings of the Twelfth European Conference on Machine Learning (pp. 145-156). Berlin: Springer-Verlag.

Hatzivassiloglou, V., & McKeown, K.R. 1997. Predicting the semantic orientation of adjectives. Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL (pp. 174-181). New Brunswick, NJ: ACL.

Hatzivassiloglou, V., & Wiebe, J.M. 2000. Effects of adjective orientation and gradability on sentence subjectivity. Proceedings of 18th International Conference on Computational Linguistics. New Brunswick, NJ: ACL.

Hearst, M.A. 1992. Direction-based text interpretation as an information access refinement. In P. Jacobs (Ed.), Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Mahwah, NJ: Lawrence Erlbaum Associates.

Landauer, T.K., & Dumais, S.T. 1997. A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Santorini, B. 1995. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania.

Spertus, E. 1997. Smokey: Automatic recognition of hostile messages. Proceedings of the Conference on Innovative Applications of Artificial Intelligence (pp. 1058-1065). Menlo Park, CA: AAAI Press.

Tong, R.M. 2001. An operational system for detecting and tracking opinions in on-line discussions. Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification (pp. 1-6). New York, NY: ACM.

Turney, P.D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (pp. 491-502). Berlin: Springer-Verlag.

Wiebe, J.M. 2000. Learning subjective adjectives from corpora. Proceedings of the 17th National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

Wiebe, J.M., Bruce, R., Bell, M., Martin, M., & Wilson, T. 2001. A corpus study of evaluative and speculative language. Proceedings of the Second ACL SIG on Dialogue Workshop on Discourse and Dialogue. Aalborg, Denmark.


Repository Staff Only: item control page