creators_name: Turney, Peter D.
creators_name: Littman, Michael L.
type: techreport
datestamp: 2002-07-15
lastmod: 2011-03-11 08:54:57
metadata_visibility: show
title: Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus
ispublished: unpub
subjects: comp-sci-art-intel
subjects: comp-sci-lang
subjects: comp-sci-mach-learn
subjects: comp-sci-stat-model
full_text_status: public
abstract: The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words: the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.
date: 2002
date_type: published
institution: National Research Council Canada
department: Institute for Information Technology
refereed: FALSE
citation: Turney, Peter D. and Littman, Michael L. (2002) Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus. [Departmental Technical Report] (Unpublished)
document_url: http://cogprints.org/2322/1/ERB-1094.ps
document_url: http://cogprints.org/2322/5/ERB-1094.pdf