This specification relates to extracting class-instance pairs from document text.
A class-instance pair is made up of a class name corresponding to a name of an entity class and an instance name corresponding to an instance of the entity class. The instance of the entity class has an “is-a” relationship with the entity class; in other words, the instance of the entity class is an example of the entity class. An example class-instance pair is the pair (food, pizza), because pizza is a food.
Class-instance pairs are used in a variety of applications including, for example, knowledge base generation and query expansion. A knowledge base is a specialized database that includes information about entities and relationships between entities. Class-instance pairs are used in knowledge bases, for example, to determine relationships between entities. Query expansion occurs when additional words and phrases are added to a user search query before the search query is submitted to a search engine. The additional words and phrases are related to words and phrases in the user search query. Class-instance pairs are used in query expansion, for example, by adding instance names for class names occurring in a user query, or adding class names for instance names occurring in a user query.
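As an illustration of the query expansion use described above, the following is a minimal sketch and not an implementation taken from this specification. The pair data, the function names `build_indexes` and `expand_query`, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical class-instance pairs in (class name, instance name) order.
# Illustrative data only.
PAIRS = [("food", "pizza"), ("food", "sushi"), ("city", "paris")]


def build_indexes(pairs):
    """Index pairs in both directions: class -> instances, instance -> classes."""
    instances_of = defaultdict(list)
    classes_of = defaultdict(list)
    for class_name, instance_name in pairs:
        instances_of[class_name].append(instance_name)
        classes_of[instance_name].append(class_name)
    return instances_of, classes_of


def expand_query(query, pairs):
    """Append instance names for class names in the query, and class names
    for instance names in the query. Assumes single-word, lowercase names."""
    instances_of, classes_of = build_indexes(pairs)
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # .get() avoids creating empty entries in the defaultdicts.
        for extra in instances_of.get(term, []) + classes_of.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded
```

Under these assumptions, `expand_query("cheap food", PAIRS)` adds the instance names "pizza" and "sushi", while `expand_query("pizza delivery", PAIRS)` adds the class name "food".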
Class-instance pair extraction systems apply extraction patterns to text and extract the class names and instance names that match the patterns in the text. However, some patterns are overly general and can result in false identifications of class-instance pairs. Some systems use additional data to filter out false class-instance pairs. However, this filtering is imprecise and can remove true class-instance pairs along with false ones.
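The behavior of an overly general pattern can be sketched as follows. This is an illustrative example only: the "X is a Y" regular expression, the name `extract_pairs`, and the restriction to single-word names are assumptions, not patterns disclosed in this specification.

```python
import re

# Hypothetical extraction pattern: "<instance> is a/an <class>".
# Deliberately general, and limited to single-word names for brevity.
IS_A_PATTERN = re.compile(r"\b(\w+) is an? (\w+)")


def extract_pairs(text):
    """Return (class name, instance name) pairs for every pattern match."""
    return [
        (match.group(2).lower(), match.group(1).lower())
        for match in IS_A_PATTERN.finditer(text)
    ]


pairs = extract_pairs("Pizza is a food. This is a problem.")
# The first sentence yields the true pair ("food", "pizza"); the second
# yields ("problem", "this"), a false pair, because the pattern matches
# any words that happen to surround "is a".
```

A filter applied after this step would need to reject ("problem", "this") without also rejecting true pairs, which is the difficulty noted above.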