Corrections:
1. If you are using Java 7, you might get an "IllegalArgumentException: Comparison method violates its general contract!". In this case, add the following switch to your java call:
   java -Djava.util.Arrays.useLegacyMergeSort=true ...
2. The README.txt contains an incorrect command to generate precision/recall by relation. The correct command is:
   java -cp "." edu.uw.cs.multir.main.Main senRel -labelsFile ../annotations/sentential-byrelation.txt -resultsFile ./results
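The exception in correction 1 is raised by the sort implementation introduced in Java 7 when it detects an inconsistent comparator; the useLegacyMergeSort switch restores the older merge sort, which does not perform this check. A common cause in general is a comparator that never returns 0 for equal elements. The sketch below illustrates that pattern and a consistent alternative. It is only an illustration: the class and field names are hypothetical, and nothing here is taken from the MultiR code or confirms where the problem arises in it.

```java
import java.util.Comparator;

// Hypothetical example; the names below are not from the MultiR code base.
class ScoreComparators {
    /** A labeled item with a score, used only for this illustration. */
    static final class Scored {
        final String label;
        final double score;

        Scored(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    // Breaks the contract: it never returns 0, so for two items with equal
    // scores both compare(a, b) and compare(b, a) return 1.
    static final Comparator<Scored> BROKEN = new Comparator<Scored>() {
        @Override
        public int compare(Scored a, Scored b) {
            return a.score > b.score ? -1 : 1;
        }
    };

    // Consistent descending order: Double.compare returns 0 for equal scores,
    // and swapping the arguments flips the sign of the result.
    static final Comparator<Scored> BY_SCORE_DESC = new Comparator<Scored>() {
        @Override
        public int compare(Scored a, Scored b) {
            return Double.compare(b.score, a.score);
        }
    };
}
```

Sorting with a comparator like BY_SCORE_DESC does not need the legacy switch, whereas one like BROKEN may trigger the exception depending on the input.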
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld.
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2011.
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web's natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple).
This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
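To make the idea of overlapping relations concrete, the following minimal sketch aggregates sentence-level decisions into corpus-level facts by taking, for each entity pair, the union of the relations predicted for its sentences. This is an assumed, simplified reading of the "simple, corpus-level component" mentioned above, not the released implementation; the class name, the "NONE" label, and the "Jobs|Apple" pair encoding are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of the MultiR code base.
class OverlapAggregator {
    // Assumed label for "this sentence expresses no relation".
    static final String NONE = "NONE";

    /** One sentence-level decision for a sentence that mentions an entity pair. */
    static final class Mention {
        final String pair;      // assumed encoding, e.g. "Jobs|Apple"
        final String relation;  // predicted relation name, or NONE

        Mention(String pair, String relation) {
            this.pair = pair;
            this.relation = relation;
        }
    }

    /**
     * Collects, for every entity pair, the set of relations predicted by at
     * least one of its sentences. A pair can therefore hold several relations
     * at once; pairs whose sentences are all NONE do not appear in the result.
     */
    static Map<String, Set<String>> aggregate(List<Mention> mentions) {
        Map<String, Set<String>> facts = new LinkedHashMap<String, Set<String>>();
        for (Mention m : mentions) {
            if (NONE.equals(m.relation)) {
                continue; // a sentence without a relation contributes no fact
            }
            Set<String> relations = facts.get(m.pair);
            if (relations == null) {
                relations = new TreeSet<String>();
                facts.put(m.pair, relations);
            }
            relations.add(m.relation);
        }
        return facts;
    }
}
```

Given one sentence labeled Founded and another labeled CEO-of for the pair (Jobs, Apple), aggregate returns both relations for that pair, which is the case that a model assuming disjoint relations cannot represent.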