Clustering is a data analysis technique that can assist in extracting knowledge from data sets. Clustering can be thought of generally as a process of organizing objects into groups whose members are similar in some way. A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. In numerous areas, the quantity of data does not lend itself to human analysis. Accordingly, computing systems and clustering algorithms, which are unsupervised learning algorithms, are used to learn about the data and assist in extracting knowledge from it. Examples of clustering algorithms include the K-means algorithm (See, J. B. MacQueen (1967): “Some Methods for Classification and Analysis of Multivariate Observations”, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297); the Fuzzy c-means (FCM) algorithm (See, J. C. Dunn (1973): “A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters”, Journal of Cybernetics 3: 32-57); and model-based algorithms. Clustering is useful for interpreting data because data is being created at a pace that computing systems without clustering cannot keep up with. Moreover, a significant portion of data is not labeled and therefore cannot be used directly by supervised learning methods.
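By way of illustration only, the K-means algorithm named above can be sketched as alternating assignment and update steps. The following is a minimal sketch, not a definitive implementation: the function name k_means, the max_iter parameter, the choice to seed the centroids with the first k points, and the toy data are all assumptions made for this example.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Lloyd-style K-means on a list of equal-length numeric tuples."""
    # Assumption: seed the centroids with the first k points so the
    # example is deterministic; practical systems often seed randomly.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = k_means(points, k=2)
# labels    -> [0, 0, 1, 1]
# centroids -> [(0.0, 0.5), (10.0, 10.5)]
```

No labels are supplied to the function; the grouping is derived from the distances between the points alone, which is what makes the approach unsupervised.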
Clustering has been used in the analysis of large data sets, e.g., high-throughput messenger RNA (mRNA) expression profiling with a microarray, which is enormously promising in the areas of cancer diagnosis and treatment, gene function identification, therapy development and drug testing, and genetic regulatory network inference. However, such a practice is inherently limited because, when samples or conditions are clustered, many genes are uncorrelated with the grouping, and, when genes are clustered, many samples or conditions are unrelated to it.
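The limitation described above can be illustrated with a small, purely hypothetical example in which distances computed over all features are dominated by features unrelated to the grouping. The data values, the function name nearest_neighbor, and the use of Euclidean distance are assumptions chosen for this sketch and do not represent actual expression data.

```python
from math import dist

# Hypothetical toy data: rows are samples, columns are features ("genes").
# Column 0 separates group A (samples 0, 1) from group B (samples 2, 3);
# columns 1-3 are unrelated to that grouping.
samples = [
    (0.0, 5.0, 5.0, 5.0),  # sample 0, group A
    (0.1, 0.0, 0.0, 0.0),  # sample 1, group A
    (1.0, 5.0, 5.0, 5.0),  # sample 2, group B
    (1.1, 0.0, 0.0, 0.0),  # sample 3, group B
]


def nearest_neighbor(index, data, columns):
    """Return the index of the closest other sample, using only `columns`."""
    def project(row):
        return [row[c] for c in columns]

    others = [i for i in range(len(data)) if i != index]
    return min(
        others,
        key=lambda i: dist(project(data[index]), project(data[i])),
    )


# Using only the informative feature, sample 0 is closest to sample 1 (group A).
informative_only = nearest_neighbor(0, samples, columns=[0])

# Using all features, the unrelated columns dominate the distance and
# sample 0 is instead closest to sample 2 (group B).
all_features = nearest_neighbor(0, samples, columns=[0, 1, 2, 3])
```

In this sketch, the three unrelated columns outweigh the single informative column, so a distance-based clustering over all features would tend to group the samples by the unrelated columns rather than by the grouping of interest.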