In the context of this application, a resource is defined as any entity represented through some textual description within a computational environment. Since any entity can be so described, the universe of possible resources comprises the universe of all possible entities, including such computational resources as databases and other data sources; queries against databases; publications, webpages, and other textual entities; and people.
Entity ranking refers to the assignment of relevance values to related objects and entities drawn from different sources. For expert search in particular, several techniques have been used for this purpose, including probabilistic models and graph-based approaches. Probabilistic models measure associations between experts by estimating their probability distributions with respect to resources such as documents. Graph-based models use predefined interconnections between entities to uncover associations.
Topic modeling is a probabilistic generative process designed to uncover the semantics of a collection of documents using a hierarchical Bayesian analysis. The objective of topic modeling is to estimate a probabilistic model of a corpus of documents that assigns high probability to the members of the corpus and also to other “similar” documents. The initial development of topic models conceptualized topics as probability distributions over the words in independent documents. Proposed enhancements and modifications to the basic topic model algorithm include the incorporation of authorship information and the use of multi-level topic arrangements, where topics at one level are considered to be distributions over topics at a lower level. None of the currently proposed techniques, however, combines the ability to model distributions over topics, which we call communities, with the use of authorship information in order to represent authors as distributions over communities. Moreover, the models that use authorship information apply the concept of “authorship” literally, requiring an author of a piece of text, and do not allow for the use of other structural relationships between resources, such as the relationship between a data source and its textual description.
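The generative view of topics described above can be illustrated with a minimal Python sketch. It is a plain mixture of word distributions, not a hierarchical Bayesian model and not the method of this application. The topic names, vocabulary, probabilities, and the helper functions `sample`, `generate_document`, and `document_likelihood` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPICS = {
    "databases": {"query": 0.5, "index": 0.3, "schema": 0.2},
    "genetics": {"gene": 0.6, "protein": 0.3, "sequence": 0.1},
}


def sample(distribution, rng):
    """Draw one key from a {key: probability} mapping."""
    keys = list(distribution)
    weights = [distribution[key] for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def generate_document(topic_mixture, length, rng):
    """Generate words by picking a topic for each position, then a word from that topic."""
    words = []
    for _ in range(length):
        topic = sample(topic_mixture, rng)
        words.append(sample(TOPICS[topic], rng))
    return words


def document_likelihood(words, topic_mixture):
    """Probability of the words under the mixture, treating words as independent."""
    likelihood = 1.0
    for word in words:
        likelihood *= sum(
            weight * TOPICS[topic].get(word, 0.0)
            for topic, weight in topic_mixture.items()
        )
    return likelihood


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mixture = {"databases": 0.8, "genetics": 0.2}
document = generate_document(mixture, 10, rng)
print(document, document_likelihood(document, mixture))
```

In this sketch a document that draws mostly on database vocabulary receives a higher likelihood under a database-heavy mixture than under a genetics-heavy one, which is the sense in which a fitted model assigns high probability to corpus members and to similar documents.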
Spreading activation is a theory first proposed to model the retrieval characteristics of human memory; it postulates that cognitive units form an interconnected network, and that retrieval is achieved through the spread of activation throughout this network. In recent years, this theory has been successfully applied as a method for associative retrieval in graph-based computer applications.
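As a rough illustration of associative retrieval by spreading activation, the following Python sketch propagates activation from seed nodes through a weighted graph. The function name `spread_activation`, the parameters `decay`, `threshold`, and `max_steps`, and the example graph are assumptions made for this sketch. Many variants of the technique exist, and this one is not taken from the application.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from seed nodes along weighted, directed edges.

    graph: {node: {neighbor: edge_weight}} (hypothetical adjacency mapping)
    seeds: {node: initial_activation}
    """
    activation = defaultdict(float)
    activation.update(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                # Drop contributions that are too weak to matter.
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Hypothetical resource graph linking an expert, a publication, and a data source.
graph = {
    "expert_a": {"paper_1": 1.0},
    "paper_1": {"expert_b": 0.5, "dataset_x": 1.0},
}
scores = spread_activation(graph, {"expert_a": 1.0})
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Nodes that end up with higher activation are treated as more strongly associated with the seeds. The decay factor and threshold keep activation from flooding the whole network.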
Most entity ranking approaches concentrate either on probabilistic models over unstructured textual content, typically using the relationship between experts and their publications, or on graph-theoretic approaches over predetermined relationships between entities. Achieving relevance rankings that better match user expectations appears to require combining the unstructured and structured information within a single framework and enabling the modeling of communities of resources. Accordingly, it is desirable to derive systems and methods that have these characteristics and that overcome existing deficiencies in the state of the art.