The exemplary embodiment relates generally to information retrieval and finds particular application in connection with a system and method that aid users in conducting searches that include named entities and proper nouns.
A named entity or proper noun is a group of one or more words that identifies an entity by name. For example, named entities may include persons (such as a person's given name or title), organizations (such as the name of a corporation, institution, association, government or private organization), locations (such as a country, state, town, geographic region, or the like), artifacts (such as names of consumer products, e.g., vehicle names), specific dates, and monetary expressions. Named entities and proper nouns are typically capitalized in use to distinguish them from ordinary nouns. The distinction between named entities and proper nouns generally depends on the extraction method and lexical resources used to identify them. For purposes of the present application, both named entities and proper nouns are referred to as “entities”.
Various methods exist for identifying a group of words as a named entity or proper noun. These allow the respective entity to be indexed as such when it occurs in a document or corpus of documents. For example, the lexicon WordNet™ is an online resource which can be used to identify a group of words as forming a named entity. However, particularly in some domain-dependent applications, the entities may not be available in such a resource. Techniques have also been developed for automated recognition of proper nouns in text. These methods generally rely on identifying capitalized words which serve as nouns in sentences, but which do not appear in a list of common nouns.
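By way of illustration only, the capitalization-based approach described above may be sketched as follows. This is a simplified sketch under stated assumptions and is not the method of any particular system: the word list `COMMON_WORDS` and the function `candidate_entities` are hypothetical names introduced for this example, the word list is deliberately tiny, and the part-of-speech check (whether the word serves as a noun) is omitted for brevity.

```python
import re

# Hypothetical, deliberately tiny list of common words (assumption);
# a practical system would consult a full lexicon instead.
COMMON_WORDS = {
    "the", "a", "an", "in", "of", "was", "founded",
    "company", "yesterday", "visited",
}


def candidate_entities(text):
    """Return runs of consecutive capitalized words absent from COMMON_WORDS.

    Part-of-speech filtering is intentionally left out of this sketch.
    """
    candidates = []
    current = []
    # Split into alphabetic words and single non-letter symbols, so that
    # punctuation ends a run of capitalized words.
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text):
        if token[0].isupper() and token.lower() not in COMMON_WORDS:
            current.append(token)
        elif current:
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


sample = "The company Acme Widgets was founded in Paris. Yesterday John Smith visited."
print(candidate_entities(sample))
```

In this sketch, a capitalized word that begins a sentence (such as "The" or "Yesterday") is rejected only because its lowercase form appears in the common-word list, which shows why the coverage of that list largely determines the quality of the result.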
When searching for information in a collection of documents, users may wish to specify queries as natural language expressions which include named entities or proper nouns. Users may also be interested in searching for information about specific types of named entities, for example about a person or a company. Moreover, users may want to search for entities that can be named in several ways or by attributes that characterize them. When the searches are performed in large collections of documents, the search results can be voluminous and not always relevant to the query. In some contexts, searches are performed in a collaborative fashion, i.e., conducted by a team of users sharing a common task, such as the demonstration of a hypothetical fact. It would be useful for the members of the team to leverage knowledge of the searches performed by the people with whom they collaborate.