1. Field
The present disclosure relates generally to computational linguistics and, more specifically, to techniques for forming topic-influenced document relationship graphs.
2. Description of the Related Art
Often people wish to draw inferences based on information contained in, and distributed among, relatively large collections of documents, e.g., substantially more documents than they have time to read or the cognitive capacity to analyze. Certain types of inferences implicate relationships between those documents. For example, it may be useful to organize documents by the subject matter described in the documents, sentiments expressed in the documents, or topics addressed in the documents. In many cases, useful insights can be derived from such organization, for example, by discovering taxonomies, ontologies, relationships, or trends that emerge from the analysis. Examples might include organizing restaurants based on restaurant reviews, organizing companies based on content in company websites, organizing current events or public figures based on news stories, and organizing movies based on dialogue.
One family of techniques for making such inferences is computational linguistic analysis of text, such as unstructured text, within the documents of a corpus, e.g., with natural language processing techniques, like those based on distributional semantics. Computers are often used to perform semantic similarity analyses within corpora to gauge the pair-wise similarity of documents according to various metrics, or to obtain pair-wise measures of relationships between entities, topics, terms, or sentiments discussed in the documents; these measures may be crafted to yield results like those described above. Through the sophisticated use of computers, inferences that would otherwise be impractical are potentially attainable, even on relatively large collections of documents.
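As a simplified illustration of the kind of pair-wise document similarity measure referred to above, the following sketch represents each document as a bag-of-words term-frequency vector and compares every pair by cosine similarity. This is not the technique of the present disclosure, and it is a deliberately simple stand-in for distributional-semantics approaches. Whitespace tokenization and the function names `term_vector`, `cosine`, and `pairwise_similarity` are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text):
    # Hypothetical bag-of-words representation: lowercase, split on whitespace,
    # and count occurrences of each term.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    # docs maps a document identifier to its text; the result maps each
    # unordered pair of identifiers (sorted) to their cosine similarity.
    vectors = {name: term_vector(text) for name, text in docs.items()}
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
```

For example, with three short restaurant reviews, `pairwise_similarity({"r1": "great pasta and friendly staff", "r2": "friendly staff and great wine", "r3": "slow delivery cold pizza"})` scores the pair `("r1", "r2")` at 0.8, since they share four of five terms, while each pair involving `"r3"` scores 0.0. Practical systems would typically replace raw term counts with weighted or embedding-based representations.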