The present invention relates to information extraction. In particular, the present invention relates to systems and methods for performing information extraction.
Many databases, web pages, and documents contain a large amount of information. Given this volume, many methods have been used to gather information relevant to a particular subject. Information extraction refers to a technique for extracting useful information from these information sources. Generally, an information extraction system extracts information based on extraction patterns (or extraction rules).
Manually writing and developing reliable extraction patterns is difficult and time-consuming. As a result, many efforts have been made to learn extraction patterns automatically from annotated examples. In some automatic learning systems, natural language patterns are learned by syntactically parsing sentences and acquiring sentential or phrasal patterns from the parses. Another approach discovers patterns using syntactic and semantic constraints. However, these approaches are generally costly to develop. Yet another approach uses consecutive surface string patterns to extract particular pairs of information items. Because such consecutive patterns match only a small portion of the information to be extracted, they do not generalize sufficiently over a large amount of information for reliable extraction.
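By way of illustration only, the limited coverage of a consecutive surface string pattern can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular prior system: the pattern text, the slot names `person` and `place`, the function `extract_pairs`, and the sample sentences are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical consecutive surface string pattern (an assumption for this
# sketch): the literal words " was born in " must appear contiguously
# between the two slots. Each slot matches a run of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(rf"(?P<person>{NAME}) was born in (?P<place>{NAME})")


def extract_pairs(sentences):
    """Return (person, place) pairs for sentences the pattern matches."""
    pairs = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            pairs.append((match.group("person"), match.group("place")))
    return pairs


# Illustrative sentences. The second states the same fact as the first,
# but intervening words break the consecutive string, so it is missed.
SENTENCES = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie was, according to most records, born in Warsaw.",
    "Alan Turing was born in London.",
]

if __name__ == "__main__":
    for pair in extract_pairs(SENTENCES):
        print(pair)
```

In this sketch, only the sentences in which the pattern words occur contiguously yield a pair. The second sentence is skipped even though it conveys the same information, which reflects the generalization problem noted above.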
Many different methods have been devised to address the problems presented above. A system and method for accurately and efficiently learning patterns for use in information extraction would further address these and/or other problems to provide a more reliable, cost-effective information extraction system.