GATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction
(2003) GATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction. In Proceedings of the Workshop on Information Extraction for Slavonic and other Central and Eastern European Languages, Borovets, Bulgaria.
NLP infrastructures with comprehensive multilingual support can substantially decrease the overhead of developing Information Extraction (IE) systems in new languages by offering support for different character encodings, language-independent components, and a clean separation between linguistic data and the algorithms that use it. This paper presents GATE, a Unicode-aware infrastructure that offers extensive support for multilingual Information Extraction, with a special emphasis on low-overhead portability between languages. GATE has been used in many research and commercial projects at Sheffield and elsewhere, including Information Extraction in Bulgarian, Romanian, Russian, and many other languages.
| Subjects: | AKT Challenges > Knowledge retrieval; AKT Challenges > Knowledge acquisition; AKT Challenges > Knowledge publishing |
|---|---|
| ID Code: | 268 |
| Deposited By: | Bontcheva, Dr Kalina |
| Deposited On: | 18 September 2003 |