creators_name: Zhu, Jun
creators_name: Nie, Zaiqing
creators_name: Liu, Xiaojiang
creators_name: Zhang, Bo
creators_name: Wen, Ji-Rong
type: conference_item
datestamp: 2009-04-06 19:08:46
lastmod: 2009-04-07 14:02:11
metadata_visibility: show
title: StatSnowball: a Statistical Approach to Extracting Entity Relationships
ispublished: pub
full_text_status: public
pres_type: paper
abstract: Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus yields low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. An MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing high precision during iterations with a small number of seeds, and that the joint inference of MLNs can improve the performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
date: 2009-04
pagerange: 101-101
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: Zhu, Jun and Nie, Zaiqing and Liu, Xiaojiang and Zhang, Bo and Wen, Ji-Rong (2009) StatSnowball: a Statistical Approach to Extracting Entity Relationships. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
document_url: http://www2009.eprints.org/11/1/p101.pdf