This invention relates to software, and more specifically to a methodology for inferring latent or unobserved structure in relational data sets.
Relational data is inherently high dimensional, inherently sparse, and often binary. High dimensional, in the context of consumer action mining (CAM), means that users choose from a very large set of objects, each object representing one dimension in “feature space,” and that objects are chosen by a very large set of consumers, each of which represents one dimension in a “preference space.” Sparse means that, typically, one can observe the interaction of an agent with only a very small number of objects, i.e., only a few objects are actually viewed, bought, and so forth, by any agent out of a catalogue of many more objects. Binary means that, generally, one can observe only whether or not an interaction between an agent and an object occurred. There is often no measure of the strength of an interaction, i.e., of “how much” a user prefers one object over another.
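By way of illustration only, the three properties described above can be made concrete with a small sketch. The interaction log, the catalogue, and the helper names `build_sparse_binary` and `density` are hypothetical and are not part of any described embodiment; the sketch merely shows one plausible way to hold such data, storing only the observed agent-object pairs.

```python
from collections import defaultdict

# Hypothetical interaction log: (agent, object) pairs. Only the fact that an
# interaction occurred is recorded; there is no rating or strength.
interactions = [
    ("alice", "book"), ("alice", "lamp"),
    ("bob", "book"),
    ("carol", "chair"), ("carol", "lamp"),
]
# Hypothetical catalogue: every object is one dimension, chosen or not.
catalogue = ["book", "lamp", "chair", "desk", "rug", "vase", "clock", "sofa"]

def build_sparse_binary(pairs):
    """Map each agent to the set of objects it interacted with."""
    rows = defaultdict(set)
    for agent, obj in pairs:
        rows[agent].add(obj)  # set membership is the binary entry
    return dict(rows)

def density(rows, n_objects):
    """Fraction of agent-object cells that hold an observed interaction."""
    observed = sum(len(objs) for objs in rows.values())
    return observed / (len(rows) * n_objects)

rows = build_sparse_binary(interactions)
shape = (len(rows), len(catalogue))   # agents x objects
fill = density(rows, len(catalogue))  # small value indicates sparsity
print(shape, fill)
```

Even in this toy case most cells of the agent-by-object matrix are empty, and in a realistic catalogue the filled fraction would be far smaller still.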
Previous approaches have employed a two-step strategy. First, reduce the high dimensionality of the dataset and thereby remove the inherent sparsity of the relational dataset, i.e., perform dimensionality reduction by minimizing some error function or distortion between the original data and the dimension-reduced data. Second, employ a clustering algorithm on the dimensionality-reduced data, i.e., find a segmentation of the dimension-reduced data that maximizes some quality function.
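The two steps described above can be sketched on a toy example. The sketch is illustrative only and rests on assumptions not found in the prior approaches themselves: a rank-one approximation computed by power iteration stands in for the dimensionality reduction, squared reconstruction error stands in for the distortion, and a two-cluster Lloyd's algorithm on scalars stands in for the clustering step, where minimizing within-cluster distance plays the role of the quality function. The matrix `A` and all function names are hypothetical.

```python
import math

# Hypothetical binary agent-by-object matrix: 1 means an observed interaction.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
]

def leading_direction(matrix, iters=200):
    """Power iteration on A^T A, giving the leading right singular vector."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        w = [sum(matrix[i][j] * av[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(matrix, v):
    """Step 1: reduce each row to a single coordinate along v."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def distortion(matrix, scores, v):
    """Squared error between the original data and its rank-one rebuild."""
    return sum((matrix[i][j] - scores[i] * v[j]) ** 2
               for i in range(len(matrix)) for j in range(len(v)))

def two_means_1d(values, iters=50):
    """Step 2: two-cluster Lloyd's algorithm, seeded at the min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for k in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

v = leading_direction(A)
scores = project(A, v)          # dense, one-dimensional representation
error = distortion(A, scores, v)
labels = two_means_1d(scores)   # segmentation of the reduced data only
print(labels, error)
```

Note that the clustering step in this sketch never sees the matrix `A` itself, only the reduced scores, and that the reported distortion is nonzero even though the reduction minimizes it for the chosen rank.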
The two-step strategy of dimensionality reduction and subsequent clustering employed by previous approaches can inject bias and noise into the data, because the segmentation is performed not on the raw data but on a transformed and compressed representation that is necessarily distorted. While it is clear that this distortion is detrimental to the accuracy of any subsequent clustering technique, it was assumed to be necessary because state-of-the-art clustering algorithms are unable to handle large, high-dimensional, and sparse data sets.
Therefore, there is a need for improved techniques.