1. Field
The present invention relates generally to derivative graphs and, more specifically, to pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents or other features (e.g., other features of unstructured text in the documents).
2. Description of the Related Art
Often people wish to draw inferences based on information contained in, and distributed among, relatively large collections of documents, e.g., substantially more documents than they have time to read or the cognitive capacity to analyze. Certain types of inferences implicate relationships between those documents. For example, it may be useful to organize documents by the subject matter described in the documents, sentiments expressed in the documents, or topics addressed in the documents. In many cases, useful insights can be derived from such organization, for example, discovering taxonomies, ontologies, relationships, or trends that emerge from the analysis. Examples might include organizing restaurants based on restaurant reviews, organizing companies based on content in company websites, organizing current events or public figures based on news stories, and organizing movies based on dialogue.
One family of techniques for making such inferences is computational linguistic analysis of text, such as unstructured text, within the documents of a corpus, e.g., with natural language processing techniques, like those based on distributional semantics. Computers are often used to perform semantic similarity analyses within corpora to gauge pair-wise similarity of the documents according to various metrics, or pair-wise measures of relationships between entities, topics, terms, or sentiments discussed in the documents, which may be crafted to yield results like those described above. Through the sophisticated use of computers, inferences that would otherwise be impractical are potentially attainable, even on relatively large collections of documents.
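By way of illustration only, one simple pair-wise similarity measure of the kind referred to above is the cosine of the angle between term-count vectors of two documents. The following sketch is not drawn from any particular system; the function names (term_counts, cosine_similarity, pairwise_similarity) and the whitespace tokenization are assumptions made for the example, and practical analyses may use more sophisticated distributional representations.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text and count whitespace-separated tokens (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    """Map each unordered pair of document indices (i, j), i < j, to a similarity score."""
    vectors = [term_counts(d) for d in docs]
    n = len(vectors)
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
```

For example, pairwise_similarity(["the pasta was great", "the pasta was bland", "stock prices fell sharply"]) would be expected to score the two restaurant reviews as more similar to one another than either is to the financial sentence, since the reviews share three of their four terms and the financial sentence shares none.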
In some cases, a graph may represent relationships between documents in a collection (e.g., one or more corpora), entities mentioned in the documents, or other features of the documents. The nodes of the graph may represent such documents, entities, or other features, where an edge between two nodes of the graph may denote semantic similarity between respective documents, entities, or other features represented by those two nodes. Typically, such a graph may be used to obtain discrete similarity measurements with respect to the represented documents, entities, or other features. In some cases, however, typical methods fail to reveal other similarities between attributes of the represented documents, entities, or other features (or other information that may not be explicitly indicated by the graph).
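By way of illustration only, a graph of the kind described above may be sketched as an adjacency map in which each node is a document, entity, or other feature and each weighted edge denotes a similarity score between the two nodes it connects. The function name build_similarity_graph, the node labels, and the threshold value of 0.5 are hypothetical choices made for the example and are not taken from any particular implementation.

```python
def build_similarity_graph(nodes, scores, threshold=0.5):
    """Build an undirected weighted graph as a dict of dicts.

    nodes:     iterable of hashable node identifiers (documents, entities, etc.)
    scores:    dict mapping (node_u, node_v) pairs to a similarity score
    threshold: assumed cutoff; pairs scoring below it receive no edge
    """
    graph = {node: {} for node in nodes}
    for (u, v), weight in scores.items():
        if weight >= threshold:
            # Store the edge in both directions because similarity is symmetric.
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


# Usage with hypothetical scores for three documents.
example_graph = build_similarity_graph(
    ["doc_a", "doc_b", "doc_c"],
    {("doc_a", "doc_b"): 0.75, ("doc_a", "doc_c"): 0.1, ("doc_b", "doc_c"): 0.0},
)
```

In this sketch, looking up example_graph["doc_a"] yields a discrete similarity measurement for each neighbor of that node, which corresponds to the typical use noted above. Information that is not encoded as an edge, such as relationships among attributes of the nodes, is not directly recoverable from such a structure.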