Clustering is a well known data mining construct used to identify groups of similar objects. Unsupervised clustering splits data purely based on distance between data points, which can make clustering results unreliable.
In many real world applications, some prior knowledge or domain knowledge can be used to constrain or guide a clustering process in order to produce more acceptable data partitions. For example, semi-supervised clustering uses limited prior knowledge together with unlabeled data to achieve better clustering performance. Semi-supervised clustering typically employs two types of prior knowledge, class labels and pairwise constraints, to improve upon results obtained from unsupervised clustering.
One method of performing semi-supervised clustering is nonnegative matrix factorization. Nonnegative Matrix Factorization (NMF) penalizes its objective function using constraints. NMF factorizes an input nonnegative matrix into a product of two new matrices of lower rank. Semi-supervised clustering through matrix factorization has been shown to largely improve clustering accuracy by incorporating prior knowledge into the factorization process.