Cogprints

Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision

Nadeau, David (2007) Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision. [Thesis]


Abstract

Named Entity Recognition (NER) aims to extract and classify rigid designators in text, such as proper names, biological species, and temporal expressions. Interest in this field of research has grown since the early 1990s. In this thesis, we document a trend away from handcrafted rules and towards machine learning approaches. Still, recent machine learning approaches suffer from the limited availability of annotated data, which is a serious shortcoming in building and maintaining large-scale NER systems. We present an NER system built with very little supervision: human supervision is limited to listing a few examples of each named entity (NE) type. First, we introduce a proof-of-concept semi-supervised system that can recognize four NE types. Then, we expand its capabilities by improving key technologies, and we apply the system to an entire hierarchy of 100 NE types. Our work makes the following contributions: the creation of a proof-of-concept semi-supervised NER system; the demonstration of an innovative noise filtering technique for generating NE lists; the validation of a strategy for learning disambiguation rules using automatically identified, unambiguous NEs; and finally, the development of an acronym detection algorithm, thus solving a rare but very difficult problem in alias resolution. We believe semi-supervised learning techniques are about to break new ground in the machine learning community. In this thesis, we show that complete NER systems can be built with limited supervision. On standard evaluation corpora, we report performance comparable to that of baseline supervised systems in the task of annotating NEs in texts.

Item Type: Thesis
Keywords: named entity recognition, semi-supervised learning, bootstrapping, noise filtering, disambiguation rules, acronym detection
Subjects: Computer Science > Language; Computer Science > Machine Learning; Computer Science > Artificial Intelligence
ID Code: 5859
Deposited By: Nadeau, David
Deposited On: 10 Dec 2007 21:45
Last Modified: 11 Mar 2011 08:57
