1. Technical Field
The invention relates to the analysis of information on a computer network. More particularly, the invention relates to a method and apparatus for efficiently solving combinatorial optimization problems on a computer network.
2. Description of the Related Art
There exists continued interest in developing more effective approaches for processing combinatorial optimization problems. Likewise, there is much interest in developing computationally efficient methods for solving co-clustering problems, a special type of combinatorial optimization problem, or even more generally, other classes of problems that involve search through extremely large sets. The co-clustering problem involves simultaneously clustering two finite sets by maximizing the mutual information between the clusterings. Co-clustering has found application in many areas, particularly statistical natural language processing and bio-informatics, among others.
Co-clustering is the problem of clustering the possible values of two variables, such as the input and target variables of a prediction problem, so that the clusters are maximally predictive of each other. For example, terms and documents may be clustered to produce groups of terms that characterize groups of documents. Active genes detected in microarray experiments can be co-clustered with the environmental conditions in which they are active, clarifying the relationship between the two.
Co-clustering maximizes mutual information between the clusterings. This implies that when co-clustering is restricted to the special case of clustering the possible values of a single variable, the resulting clusters form a maximally predictive feature set of any given cardinality for predicting the second variable. Simultaneously clustering the second variable is often of interest in its own right; e.g., for summarizing the relationship between the two variables, but even when it is not, co-clustering is usually much faster computationally than ordinary clustering, and produces similar results for the variable of interest.
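The objective described above can be illustrated with a minimal sketch. It assumes the relationship between the two variables is given as a joint probability table, and it only evaluates the mutual information of a candidate co-clustering; it does not search for the best one. The function names, the cluster assignments, and the toy 4 x 4 table are hypothetical and chosen for illustration only.

```python
import math


def cluster_joint(joint, row_clusters, col_clusters, n_row_clusters, n_col_clusters):
    """Aggregate a joint table p(x, y) into the clustered table p(x_hat, y_hat)."""
    agg = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            agg[row_clusters[i]][col_clusters[j]] += p
    return agg


def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution given as a 2-D list."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero cells contribute nothing to the sum
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


# Hypothetical toy data: 4 terms (rows) by 4 documents (columns) with block structure.
joint = [
    [0.125, 0.125, 0.0, 0.0],
    [0.125, 0.125, 0.0, 0.0],
    [0.0, 0.0, 0.125, 0.125],
    [0.0, 0.0, 0.125, 0.125],
]

# A co-clustering that follows the block structure.
aligned = cluster_joint(joint, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
# A co-clustering that mixes the two blocks.
mixed = cluster_joint(joint, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

mi_original = mutual_information(joint)
mi_aligned = mutual_information(aligned)
mi_mixed = mutual_information(mixed)
print(mi_original, mi_aligned, mi_mixed)
```

In this toy table, the aligned co-clustering retains all of the mutual information of the original table, while the mixed co-clustering retains none. A co-clustering method searches over such assignments for the one with the highest value.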
Given the explosion of data and the need for processing that data, it would be advantageous to provide a fast, yet highly effective and efficient method for operation on a computer network for solving combinatorial optimization problems, including the co-clustering problem.
It would further be advantageous to provide such a method comprising an algorithm on a computer network for solving the generalized combinatorial optimization problem as well as the co-clustering problem, where the results of the process are superior to, and require significantly fewer computations than, other solutions, including the greedy optimization process (“GR”), the simulated annealing process (“SA”), and another algorithmic solution by Dhillon et al. directed to co-clustering (“DC”).