Computerized processing of data, and specifically textual data, is useful in a wide variety of applications. One important technique in processing such data is finding relationships between different sets of textual objects. The sets of textual objects are sets of words that may be, for example, sets of topics and sets of names. Processing text documents to find co-occurrences of the objects in the sets can identify relationships between the objects. For instance, if a person's name frequently co-occurs with a topic in documents, it may be concluded that the person is an expert on that topic.
In finding these types of relationships, co-occurrence data between terms extracted from the texts is very useful. The general mining task of identifying associative relationships between objects based on co-occurrence data is referred to as object association mining. Specifically, object association mining is an estimation of a joint probability between objects using the co-occurrence data. The specific task of identifying an expert, or a person's area of expertise, is referred to as expert/expertise mining and is but one example of object association mining, although the present discussion is not limited to that example.
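By way of illustration only, the estimation of a joint probability from co-occurrence data can be sketched as a simple relative-frequency (maximum-likelihood) estimate. The function name `estimate_joint` and the sample topic/person pairs are hypothetical and are not taken from any particular system; other estimators could be used instead.

```python
from collections import Counter


def estimate_joint(pairs):
    """Relative-frequency estimate of p(x, y) from observed (x, y) co-occurrences."""
    counts = Counter(pairs)          # number of times each (x, y) pair was observed
    total = sum(counts.values())     # total number of observed co-occurrences
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical co-occurrence observations: (topic, person)
pairs = [
    ("data mining", "alice"),
    ("data mining", "alice"),
    ("databases", "bob"),
    ("data mining", "bob"),
]
joint = estimate_joint(pairs)
```

In this sketch, a pair that is observed more often receives a proportionally higher joint probability, which is the sense in which frequent co-occurrence suggests an association.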
Another mining technique that can be of help in identifying relations between objects is referred to as mining latent associations between two sets of objects. By latent associations it is meant that the associative relations are represented in clusters of objects. Similar objects in the two sets are grouped into the same cluster. When such latent associations are identified, the associative relationships between the objects can be better understood.
One current approach to identifying latent associations is to use a separable mixture model (SMM) to mine the latent object associations. By using an SMM, the joint probability distribution between objects is defined as a finite mixture model. Each component of the mixture model corresponds to a soft cluster of objects. An SMM works well in situations where the co-occurrence data reflects only one kind of co-occurrence between the two sets of objects. That is, the SMM works well if the data contains only one kind of co-occurrence relationship between objects in a first set of objects X and objects in a second set of objects Y.
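As a minimal sketch of such a finite mixture, and assuming the commonly used factorization in which objects x and y are conditionally independent given a cluster k, the joint probability can be written as p(x, y) = sum over k of p(k) p(x | k) p(y | k). The function name `smm_joint` and all parameter values below are hypothetical; in practice the parameters would be fitted to the co-occurrence data (for example, by expectation-maximization), which is not shown here.

```python
def smm_joint(prior, p_x_given_k, p_y_given_k):
    """Return p(x, y) = sum_k p(k) * p(x | k) * p(y | k) as a nested dict."""
    joint = {}
    for k, pk in enumerate(prior):
        for x, px in p_x_given_k[k].items():
            for y, py in p_y_given_k[k].items():
                row = joint.setdefault(x, {})
                row[y] = row.get(y, 0.0) + pk * px * py
    return joint


# Hypothetical parameters for two soft clusters (assumed, not fitted values)
prior = [0.6, 0.4]                                   # p(k)
p_x_given_k = [                                      # p(topic | k)
    {"data mining": 0.9, "databases": 0.1},
    {"data mining": 0.2, "databases": 0.8},
]
p_y_given_k = [                                      # p(person | k)
    {"alice": 0.7, "bob": 0.3},
    {"alice": 0.1, "bob": 0.9},
]
joint = smm_joint(prior, p_x_given_k, p_y_given_k)
```

Each cluster is "soft" in that every object has some probability under every component, so an object is not assigned to exactly one cluster.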
More specifically, and again using the example of expert/expertise mining, assume a first set of objects is a list of topics and a second set of objects is a list of people. Co-occurrences between the objects in those two sets will be identified from a set of textual inputs, such as a set of documents. For instance, if a topic appears in the title of a document, while an author appears in the author section of the document, then a co-occurrence between those two objects is obtained. If other types of co-occurrences are present (such as the topic and the person's name both appearing in the body of the document), they are not considered by an SMM.
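The single kind of co-occurrence described above (topic in the title, person in the author section) can be sketched as follows. The document layout (dictionaries with `title`, `authors`, and `body` fields), the simple case-insensitive substring match, and the function name `title_author_cooccurrences` are assumptions made for illustration only.

```python
def title_author_cooccurrences(documents, topics, people):
    """Collect (topic, person) pairs where the topic is in the title and the person is an author."""
    pairs = []
    for doc in documents:
        title = doc["title"].lower()
        authors = {name.lower() for name in doc["authors"]}
        for topic in topics:
            if topic.lower() not in title:
                continue  # topic does not appear in this title
            for person in people:
                if person.lower() in authors:
                    pairs.append((topic, person))
    return pairs


# Hypothetical documents; the "body" field is deliberately ignored,
# mirroring the single kind of co-occurrence considered here.
documents = [
    {"title": "Scalable Data Mining", "authors": ["Alice"],
     "body": "Bob discusses databases."},
    {"title": "Notes on Databases", "authors": ["Bob", "Alice"],
     "body": "A survey of storage engines."},
]
topics = ["data mining", "databases"]
people = ["Alice", "Bob"]
pairs = title_author_cooccurrences(documents, topics, people)
```

Note that the mention of Bob and databases in the body of the first document produces no pair, because body co-occurrences are a different kind of co-occurrence and fall outside what this sketch collects.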
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.