Information extraction (IE) is the process of analyzing unstructured or semi-structured text and extracting structured information from it. Typically, IE systems are configured to focus on a specific domain or on particular types of events, entities, or relationships. For example, an IE system may be constructed to analyze news stories on particular subjects, such as mergers or criminal activity; financial reports; legal opinions; press releases; or relationships between entities in a series of email messages. IE analysis results may take multiple forms, such as entity identification (e.g., persons, corporations, organizations), relationship identification (e.g., person-employer, merging companies), and co-reference resolution, which resolves different names for a common entity (e.g., United States, United States of America, U.S., America).
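As an illustration of co-reference resolution, the following minimal Python sketch maps several surface forms onto one canonical entity and groups mentions accordingly. The alias table and the names `ALIASES`, `canonical`, and `group_mentions` are hypothetical and chosen only for this example; a practical IE system would typically derive such mappings from far richer evidence than a fixed lookup table.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical alias table (an assumption for illustration): each lowercase
# surface form maps to a single canonical entity name.
ALIASES = {
    "united states": "United States",
    "united states of america": "United States",
    "u.s.": "United States",
    "america": "United States",
}


def canonical(mention: str) -> str:
    """Return the canonical name for a mention, or the mention itself if unknown."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def group_mentions(mentions: List[str]) -> Dict[str, List[str]]:
    """Group mentions that refer to the same canonical entity."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical(mention)].append(mention)
    return dict(groups)


if __name__ == "__main__":
    print(group_mentions(["U.S.", "America", "Canada"]))
```

In this sketch, mentions that are absent from the table are left as their own entities, so an unknown name is never merged with another one.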
Many different techniques have been used to analyze text and to particularize IE processes for various domains. One characteristic that these techniques have in common, however, is a manual component: an individual must discover the patterns and relationships that the IE process needs in order to retrieve information from a corpus of text. Each manual component influences the outcome of an IE analysis, for example, by setting the patterns that will be examined during the analysis. Manual components also typically introduce inefficiencies and inaccuracies into the overall IE process.
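The effect of a manually set pattern can be sketched with a hand-written rule for the merging-companies relationship. The regular expression and the name `find_mergers` below are assumptions made for illustration only and do not represent any particular IE system. The pattern recognizes only the phrasing its author anticipated, so a differently worded sentence about a merger yields no result.

```python
import re
from typing import List, Tuple

# Hypothetical hand-written pattern (an assumption for illustration):
# matches "<Company> merges/merged/will merge with <Company>".
MERGER_PATTERN = re.compile(
    r"(?P<first>[A-Z]\w+) (?:merges|merged|will merge) with (?P<second>[A-Z]\w+)"
)


def find_mergers(text: str) -> List[Tuple[str, str]]:
    """Return (company, company) pairs matched by the manual pattern."""
    return [
        (match.group("first"), match.group("second"))
        for match in MERGER_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    # Phrasing the pattern author anticipated: one pair is extracted.
    print(find_mergers("Acme merged with Globex last year."))
    # Different phrasing of a similar event: nothing is extracted.
    print(find_mergers("Globex and Initech announced a merger."))
```

Covering the second sentence would require someone to write and maintain an additional pattern, which is one way a manual component can add effort while still missing information.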