The exemplary embodiment relates to text processing. It finds particular application in connection with a system and method for identifying lexico-syntactic patterns in text which can be expressed as rules for identifying two or more named entities in a specific semantic relationship.
A named entity is a group of one or more words (a text element) that identifies an entity by name. For example, named entities may include persons (such as a person's given name or role), organizations (such as the name of a corporation, institution, association, government, or private organization), places or locations (such as a country, state, town, geographic region, named building, or the like), artifacts (such as names of consumer products, e.g., cars), temporal expressions (such as specific dates), events (which may be past, present, or future events, such as World War II or the 2012 Olympic Games), and monetary expressions. Named entities are typically capitalized to distinguish them from ordinary nouns.
Named entities are of great interest for the task of information extraction in general and for many other text processing applications. Identifying a group of words as a named entity can provide additional information about the sentence in which it is used. Techniques for recognizing named entities in text typically rely on a lexicon that indexes named entities as such, and may further apply grammar rules (such as requiring capitalization) or statistical analysis to confirm that the group of words should be tagged as a named entity.
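The lexicon-plus-grammar-rule approach described above can be illustrated with a minimal sketch. It assumes a hypothetical toy lexicon keyed by lowercased token tuples, greedy longest-match lookup, and a single capitalization rule as the confirming grammar rule; it is not a description of any particular recognizer, and statistical confirmation is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption for illustration), keyed by
# lowercased token tuples so that multiword entities can be matched.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("world", "war", "ii"): "EVENT",
    ("paris",): "PLACE",
    ("apple",): "ORGANIZATION",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tag_named_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (text, type) pairs found by greedy longest-match lexicon lookup.

    A lexicon hit is accepted only if every word of the span starts with an
    uppercase letter, which is the simple grammar rule used here to confirm
    that the words should be tagged as a named entity.
    """
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        step = 1  # advance one token if no entity starts at position i
        # Try the longest candidate span first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            ne_type = LEXICON.get(tuple(w.lower() for w in span))
            if ne_type and all(w[:1].isupper() for w in span):
                found.append((" ".join(span), ne_type))
                step = n
                break
        i += step
    return found


print(tag_named_entities("Apple opened a store in Paris".split()))
# → [('Apple', 'ORGANIZATION'), ('Paris', 'PLACE')]
print(tag_named_entities("She ate an apple after World War II".split()))
# → [('World War II', 'EVENT')]
```

In the second sentence, the lowercase "apple" is present in the lexicon but is rejected by the capitalization rule, which is the kind of confirmation step the paragraph refers to.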
It is often desirable to identify two or more named entities that stand in a particular semantic relationship. Extraction of semantic relations has many applications in information extraction, in particular fact extraction, question answering, information retrieval, semantic network construction, ontology building, and the like. For example, a user may wish to know when (i.e., a DATE named entity) or where (a PLACE named entity) a particular event (an EVENT named entity) is to take place. Alternatively, the user may wish to know who (a PERSON-NAME named entity) is the president of a particular company (an ORGANIZATION named entity). While a search could be performed for sentences that include the known named entity, a large number of sentences may be retrieved, most of which do not express the semantic relationship underlying the user's question. It has been proposed that certain regular patterns in word usage can reflect underlying semantic relations. These patterns are referred to herein as lexico-syntactic patterns. However, since a semantic relation can be expressed in a large number of ways in natural language text, it would be difficult and time-consuming to establish manually lexico-syntactic patterns that a search engine could use reliably to identify such semantic relations.
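To make the notion of a lexico-syntactic pattern concrete, the following sketch hand-writes two patterns for a hypothetical president_of(PERSON, ORGANIZATION) relation. It assumes sentences whose named entities have already been tagged with a hypothetical inline markup (`<PERSON>…</PERSON>`, `<ORGANIZATION>…</ORGANIZATION>`) and uses ordinary regular expressions; it is purely illustrative and is not the automatic pattern-generation method of the exemplary embodiment.

```python
import re
from typing import List, Tuple

# Hypothetical inline markup for sentences whose named entities are already tagged.
NE = r"<{t}>(?P<{g}>[^<]+)</{t}>"
PERSON = NE.format(t="PERSON", g="person")
ORG = NE.format(t="ORGANIZATION", g="org")

# Two hand-written lexico-syntactic patterns for president_of(PERSON, ORGANIZATION).
PATTERNS = [
    # "<PERSON> is/was (the) president of <ORGANIZATION>"
    re.compile(PERSON + r",? (?:is|was) (?:the )?president of " + ORG),
    # "<ORGANIZATION>'s president, <PERSON>"
    re.compile(ORG + r"'s president,? " + PERSON),
]


def extract_president_of(sentence: str) -> List[Tuple[str, str]]:
    """Return (person, organization) pairs matched by any hand-written pattern."""
    pairs: List[Tuple[str, str]] = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            pairs.append((m.group("person"), m.group("org")))
    return pairs


s1 = "<PERSON>Jane Doe</PERSON> is the president of <ORGANIZATION>Acme Corp</ORGANIZATION>."
s2 = "<ORGANIZATION>Acme Corp</ORGANIZATION>'s president, <PERSON>Jane Doe</PERSON>, spoke."
s3 = "<PERSON>Jane Doe</PERSON> heads <ORGANIZATION>Acme Corp</ORGANIZATION>."
print(extract_president_of(s1))  # → [('Jane Doe', 'Acme Corp')]
print(extract_president_of(s2))  # → [('Jane Doe', 'Acme Corp')]
print(extract_president_of(s3))  # → []
```

The third sentence expresses the same relation but matches neither pattern; every additional phrasing would need yet another hand-written pattern, which illustrates why establishing such patterns manually is laborious.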
The exemplary embodiment provides a computer-implemented system and method for automatically generating lexico-syntactic patterns that can be used to extract semantic relations between named entities.