In many applications, it can be very useful to identify groups or clusters of objects such that objects in the same cluster are similar while objects in different clusters are dissimilar. Such identification of groups is referred to as “clustering.” Clustering has been used extensively to identify similar web-based objects. Web-based objects may include web pages, images, scientific articles, queries, authors, news reports, and so on. For example, when a collection of images is identified by an image search engine, the search engine may want to identify clusters of related images. The search engine may use various well-known algorithms including K-means, maximum likelihood estimation, spectral clustering, and so on. These algorithms generate clusters of homogeneous objects, that is, objects of the same type (e.g., clusters of images only or clusters of web pages only).
Recently, attempts have been made to cluster highly interrelated heterogeneous objects such as images and their surrounding text; documents and terms; customers and their purchased items; articles, authors, and conferences; web users, issued queries, and click-through web pages; and so on. The goal of heterogeneous clustering is to identify clusters of each type of object that are in some way based on the clusters of the other type of object. Applying homogeneous clustering to the objects of each type separately may not be an acceptable basis for heterogeneous clustering because the similarities among objects of one type can sometimes be defined only in terms of the objects of the other type. One attempt at co-clustering objects of two types extends traditional spectral clustering algorithms with a bipartite spectral graph clustering algorithm that co-clusters documents and terms simultaneously. Similar attempts have been made at co-clustering heterogeneous objects in the fields of biology and image processing.
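To make the bipartite spectral graph approach mentioned above more concrete, the following is a minimal sketch of one common formulation for the two-cluster case: the document-term matrix is normalized by the square roots of the row and column degrees, and the second singular vectors are used to split documents and terms together. The function name `cocluster_bipartition`, the example matrix, and the simple sign threshold (used here in place of a k-means step on the embedded values) are assumptions made for illustration and are not taken from the attempts described above.

```python
import numpy as np


def cocluster_bipartition(A):
    """Split rows (documents) and columns (terms) of a nonnegative
    co-occurrence matrix into two co-clusters.

    Illustrative sketch only: uses a sign threshold on the second
    singular vectors rather than a full k-means step.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # document degrees (row sums)
    d2 = A.sum(axis=0)  # term degrees (column sums)
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every document and term needs at least one link")

    # Normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

    # The first singular pair is trivial; the second pair carries the
    # information about the two-way partition.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    z_docs = U[:, 1] / np.sqrt(d1)
    z_terms = Vt[1, :] / np.sqrt(d2)

    # Documents and terms that fall on the same side of zero are
    # placed in the same co-cluster.
    doc_labels = (z_docs > 0).astype(int)
    term_labels = (z_terms > 0).astype(int)
    return doc_labels, term_labels


# Hypothetical 4-document by 5-term count matrix with two loose groups.
example = [
    [4, 3, 2, 0, 0],
    [3, 4, 2, 0, 0],
    [0, 0, 1, 4, 3],
    [0, 0, 0, 3, 4],
]
doc_labels, term_labels = cocluster_bipartition(example)
```

Because the sign of a singular vector is arbitrary, which group receives label 0 and which receives label 1 may differ between runs or library versions; only the grouping itself is meaningful. The documents and terms are labeled from the same singular pair, so a document and a term with the same label belong to the same co-cluster.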
Some attempts have been made at high-order co-clustering, that is, co-clustering objects of more than two data types. In the case of objects of three data types, the objects of a first type and the objects of a second type are each related to the objects of a third, or central, type. The relationship between the objects of the first type and the objects of the central type and the relationship between the objects of the second type and the objects of the central type are provided. The goal of the co-clustering is to provide a clustering of the objects of the first type, a clustering of the objects of the second type, and a clustering of the objects of the central type. One technique for such co-clustering is described in Gao, B., Liu, T., Zheng, X., Cheng, Q., and Ma, W., “Consistent Bipartite Graph Co-Partitioning for Star-Structured High-Order Heterogeneous Data Co-Clustering,” Proc. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD'05), 2005, pp. 41-50. Although this technique is very effective, it is computationally expensive, especially with large datasets.
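The three-type arrangement described above can be pictured as two relationship matrices that share the central objects as a common axis. The sketch below shows one possible way to hold that input; the class name `StarRelations` and its field names are hypothetical, and the sketch does not implement the cited co-partitioning technique.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StarRelations:
    """Input for three-type co-clustering (illustrative layout only).

    first_to_central:  shape (n_first, n_central)
    second_to_central: shape (n_second, n_central)
    Column j of both matrices refers to the same central object.
    """

    first_to_central: np.ndarray
    second_to_central: np.ndarray

    def __post_init__(self):
        self.first_to_central = np.asarray(self.first_to_central, dtype=float)
        self.second_to_central = np.asarray(self.second_to_central, dtype=float)
        if self.first_to_central.ndim != 2 or self.second_to_central.ndim != 2:
            raise ValueError("both relationships must be 2-D matrices")
        if self.first_to_central.shape[1] != self.second_to_central.shape[1]:
            raise ValueError("both relationships must cover the same central objects")

    @property
    def sizes(self):
        """Return (n_first, n_second, n_central)."""
        return (
            self.first_to_central.shape[0],
            self.second_to_central.shape[0],
            self.first_to_central.shape[1],
        )


# Hypothetical data: 2 first-type objects, 3 second-type objects,
# and 4 central objects.
relations = StarRelations(
    first_to_central=[[1, 0, 2, 0], [0, 3, 0, 1]],
    second_to_central=[[1, 1, 0, 0], [0, 0, 1, 1], [2, 0, 0, 2]],
)
```

A co-clustering of such data would return three label arrays, one per object type, with lengths matching the three values in `relations.sizes`.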