1. Field
The present invention relates generally to computational analyses and, more specifically, to facilitating targeted analysis via graph generation based on an influencing parameter.
2. Description of the Related Art
Often people wish to draw inferences based on information contained in, and distributed among, relatively large collections of documents, e.g., substantially more documents than they have time to read or the cognitive capacity to analyze. Certain types of inferences implicate relationships between those documents. For example, it may be useful to organize documents by the subject matter described in the documents, sentiments expressed in the documents, or topics addressed in the documents. In many cases, useful insights can be derived from such organization, for example, discovering taxonomies, ontologies, relationships, or trends that emerge from the analysis. Examples might include organizing restaurants based on restaurant reviews, organizing companies based on content in company websites, organizing current events or public figures based on news stories, and organizing movies based on dialogue.
One family of techniques for making such inferences is computational linguistic analysis of text, such as unstructured text, within the documents of a corpus, e.g., with natural language processing techniques, like those based on distributional semantics. Computers are often used to perform semantic similarity analyses within corpora to gauge pair-wise similarity of the documents according to various metrics, or pair-wise measures of relationships between entities, topics, terms, or sentiments discussed in the documents, which may be crafted to yield results like those described above. Through the sophisticated use of computers, inferences that would otherwise be impractical are potentially attainable, even on relatively large collections of documents.
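By way of illustration only, one common pair-wise measure is the cosine similarity between term-frequency vectors of two documents. The following is a minimal sketch under stated assumptions, not a description of any particular system: the whitespace tokenization and the function names `term_frequencies`, `cosine_similarity`, and `pairwise_similarity` are hypothetical and chosen for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def term_frequencies(text):
    # Assumption: naive lowercase, whitespace tokenization stands in for
    # whatever preprocessing a real analysis would apply.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over the terms the two documents share.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    # docs maps a document name to its text; the result maps each
    # unordered pair of names (sorted alphabetically) to a score in [0, 1].
    vectors = {name: term_frequencies(text) for name, text in docs.items()}
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for x, y in combinations(sorted(vectors), 2)
    }
```

Under these assumptions, two documents with identical term counts score approximately 1, and two documents with no terms in common score 0. Other metrics may be substituted without changing the overall structure.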
In some cases, a graph may represent relationships between objects indicated in (e.g., named entities mentioned in) a collection of documents (e.g., one or more corpora). Objects may be text or referents of the text, e.g., named entities. The nodes of the graph may represent the objects, and the edges may represent the relationships between objects. The relationships may be determined based on the frequency of terms in text describing the respective objects, such that the number of edges linking the graph nodes, the edge weights, and the distribution of the edges depend on the frequency of the terms in the plain text. In some cases, variation in text lengths, the use of specific jargon, or other factors can strongly influence the topology of the graph. Such influence may undermine the explanatory power of the graph by relegating certain objects to a more marginal position than is appropriate, for example, because the underlying text describes the object poorly or uses uncommon words. As a result, misleading text may negatively affect the representation of the objects in the collection of documents.
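The effect described above can be sketched, by way of example only, with a graph whose edge weights are raw counts of shared terms. The weighting scheme, the `min_weight` threshold, and the function names `build_graph` and `degree` are assumptions made for illustration; they are not drawn from any particular implementation.

```python
from collections import Counter
from itertools import combinations


def build_graph(descriptions, min_weight=2):
    # descriptions maps an object name to the text describing it.
    # Assumption: edge weight is the count of overlapping term occurrences,
    # and pairs below the hypothetical min_weight threshold get no edge.
    counts = {
        name: Counter(text.lower().split())
        for name, text in descriptions.items()
    }
    edges = {}
    for a, b in combinations(sorted(counts), 2):
        # Counter intersection keeps the minimum count of each shared term.
        weight = sum((counts[a] & counts[b]).values())
        if weight >= min_weight:
            edges[(a, b)] = weight
    return edges


def degree(edges, node):
    # Number of edges incident to the given node.
    return sum(1 for pair in edges if node in pair)


descriptions = {
    "A": "fresh pasta pizza wine pasta pizza",
    "B": "pasta pizza wine dessert pizza",
    "C": "pizza",  # a terse description of an otherwise similar object
}
edges = build_graph(descriptions)
```

In this sketch, objects A and B are joined by an edge of weight 4, while object C has no edges at all. C shares subject matter with A and B, but its one-word description cannot accumulate enough term overlap to clear the threshold, so the text length alone pushes it to the margin of the graph.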