AKT EPrint Archive

Adaptive Information Extraction from Text by Rule Induction and Generalisation

Ciravegna, Dr. Fabio (2001) Adaptive Information Extraction from Text by Rule Induction and Generalisation. In Proceedings 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle.

(LP)² is a covering algorithm for adaptive Information Extraction (IE) from text. It induces symbolic rules that insert SGML tags into texts by learning from examples found in a user-defined tagged corpus. Training is performed in two steps: initially a set of tagging rules is learned; then additional rules are induced to correct mistakes and imprecision in tagging. Induction is performed by bottom-up generalization of examples in the training corpus. Shallow knowledge about Natural Language Processing (NLP) is used in the generalization process. The algorithm has been successful on two fronts. From a scientific point of view, experiments report excellent results with respect to the current state of the art on two publicly available corpora. From an application point of view, a successful industrial IE tool has been based on (LP)². Real-world applications have been developed, and licenses have been released to external companies for building other applications. This paper presents (LP)², experimental results and applications, and discusses the role of shallow NLP in rule induction.
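To make the idea of a symbolic tagging rule and its bottom-up generalization concrete, here is a minimal Python sketch. It is not the paper's implementation: the `TaggingRule` class, the `word` helper, the `<stime>` tag and the example sentence are all hypothetical, and the sketch assumes a rule is a list of conditions on the tokens to the left and right of the point where a tag is inserted. It shows only rule application and one hand-made generalization step; rule induction, rule selection and the correction step are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A condition tests one token; None means "any token" (a fully relaxed slot).
Condition = Optional[Callable[[str], bool]]


def word(w: str) -> Callable[[str], bool]:
    """Hypothetical helper: condition that matches one exact word, ignoring case."""
    return lambda tok: tok.lower() == w


@dataclass
class TaggingRule:
    left: List[Condition]   # conditions on the tokens before the insertion point
    right: List[Condition]  # conditions on the tokens after the insertion point
    tag: str                # tag to insert (illustrative SGML-style tag)

    def matches(self, tokens: List[str], i: int) -> bool:
        """True if the rule fires at the gap just before tokens[i]."""
        if i - len(self.left) < 0 or i + len(self.right) > len(tokens):
            return False
        window = tokens[i - len(self.left): i] + tokens[i: i + len(self.right)]
        conditions = self.left + self.right
        return all(c is None or c(t) for c, t in zip(conditions, window))


def apply_rules(tokens: List[str], rules: List[TaggingRule]) -> List[str]:
    """Return a copy of tokens with the tag of every firing rule inserted."""
    out: List[str] = []
    for i in range(len(tokens) + 1):
        out.extend(rule.tag for rule in rules if rule.matches(tokens, i))
        if i < len(tokens):
            out.append(tokens[i])
    return out


# A rule copied literally from one assumed training example ("... at 4 pm").
specific = TaggingRule(left=[word("at")], right=[word("4"), word("pm")], tag="<stime>")

# A generalization of it: "4" is relaxed to "any number" and "pm" is dropped.
general = TaggingRule(left=[word("at")], right=[str.isdigit], tag="<stime>")

tokens = "the seminar starts at 3 pm".split()
tagged_specific = apply_rules(tokens, [specific])
tagged_general = apply_rules(tokens, [general])
```

On this unseen sentence the literal rule does not fire, while the generalized rule inserts the tag before the number, which is the kind of coverage gain that generalizing from training examples aims for.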

Keywords: Natural language processing, adaptive information extraction, rule induction
Subjects: AKT Challenges > Knowledge acquisition
ID Code: 118
Deposited By: Brewster, Christopher
Deposited On: 27 February 2003
Alternative Locations: http://www.dcs.shef.ac.uk/~fabio/cira-papers.html
