AKT EPrint Archive

GATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction

Bontcheva, Dr Kalina and Maynard, Dr Diana and Tablan, Mr Valentin and Cunningham, Dr Hamish (2003) GATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction. In Proceedings of the Workshop on Information Extraction for Slavonic and other Central and Eastern European Languages, Borovets, Bulgaria.

NLP infrastructures with comprehensive multilingual support can substantially decrease the overhead of developing Information Extraction (IE) systems in new languages by offering support for different character encodings, language-independent components, and a clean separation between linguistic data and the algorithms that use it. This paper presents GATE, a Unicode-aware infrastructure that offers extensive support for multilingual Information Extraction, with special emphasis on low-overhead portability between languages. GATE has been used in many research and commercial projects at Sheffield and elsewhere, including Information Extraction in Bulgarian, Romanian, Russian, and many other languages.

Subjects: AKT Challenges > Knowledge retrieval
AKT Challenges > Knowledge acquisition
AKT Challenges > Knowledge publishing
ID Code: 268
Deposited By: Bontcheva, Dr Kalina
Deposited On: 18 September 2003
