This item is a Paper in the Data Mining track.
- Zhu, Jun - Tsinghua University
- Nie, Zaiqing - Microsoft Research Asia
- Liu, Xiaojiang - University of Science and Technology of China
- Zhang, Bo - Tsinghua University
- Wen, Ji-Rong - Microsoft Research Asia
Published Version: PDF (833Kb)
Abstract
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. MLN is a general model, and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing the high precision during iterations with a small number of seeds, and the joint inference of MLN can improve the performance. Finally, StatSnowball is efficient and we have developed a working entity relation search engine called Renlifang based on it.
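To illustrate the idea of pattern selection through l1-norm penalized maximum likelihood, the sketch below fits a plain logistic regression with an l1 penalty by proximal gradient descent. It is not the StatSnowball implementation and does not model an MLN: the logistic model is a simplified stand-in, and the toy data, the function names (soft_threshold, fit_l1_logistic) and the parameters (lam, step, iters) are all assumptions made for this example. Each column of X is a hypothetical candidate pattern, and y marks whether the entity pair is a true relation instance.

```python
import math

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.1, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient.

    Hypothetical stand-in for l1-penalized maximum likelihood; no bias term.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the smooth loss, then shrink by step * lam.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w

# Toy data (assumed): pattern 0 fires exactly on true relation instances,
# pattern 1 fires on half of the rows regardless of the label.
X = [[1, 1], [1, 0], [1, 1], [1, 0],
     [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

On this toy data the penalty keeps the weight of the uninformative pattern at exactly zero, so only pattern 0 is selected. A larger lam would prune more aggressively, while lam = 0 would recover ordinary maximum likelihood with no selection.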