It is generally desirable that the classes, when modeled with Gaussian densities for example, can be used to construct a probability density for the data. Additional data obtained in the same way as the original set should then be judged highly likely under the constructed density. Clustering is a fundamental data analysis tool and is the basis for many approaches to pattern recognition. Among other things, it facilitates analysis of the regions of the data space that are most densely populated with points, while allowing one to determine which points may be outliers (i.e., data points that result from noise and give no information about the process or system being modeled). It also forms the basis for a compact representation of the data.
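By way of illustration only, the following minimal sketch shows how classes might be turned into a density and how additional data might be scored against it. It assumes one-dimensional data and one Gaussian per class, weighted by class size. The names `gaussian_pdf`, `fit_mixture`, and `avg_log_likelihood`, the variance floor, and the sample values are hypothetical and are not taken from the text above.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture(clusters):
    """Turn clusters (lists of 1-D points) into (weight, mean, variance) triples."""
    total = sum(len(c) for c in clusters)
    components = []
    for c in clusters:
        mean = sum(c) / len(c)
        var = sum((x - mean) ** 2 for x in c) / len(c)
        # Assumed floor on the variance so a tight cluster cannot yield a zero variance.
        components.append((len(c) / total, mean, max(var, 1e-6)))
    return components

def avg_log_likelihood(points, components):
    """Average log density of the points under the mixture."""
    total = 0.0
    for x in points:
        density = sum(w * gaussian_pdf(x, m, v) for w, m, v in components)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total / len(points)

# Illustrative data: two classes, then new points near and far from them.
clusters = [[0.9, 1.0, 1.1, 1.2], [4.8, 5.0, 5.1, 5.3]]
model = fit_mixture(clusters)
typical = [1.05, 5.05]   # resemble the original data
outliers = [3.0, 9.0]    # lie far from both classes

print(avg_log_likelihood(typical, model))
print(avg_log_likelihood(outliers, model))
```

In this sketch, points resembling the original data receive a much higher average log-likelihood than points far from every class, which is the sense in which a low score can flag a candidate outlier.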
Clustering is usually a very time-consuming process requiring many iterative passes over the data. Generally, the clustering problem is handled by a clustering technique such as K-means or LBG (see Y. Linde, A. Buzo, R. M. Gray, “An Algorithm for Vector Quantizer Design,” IEEE Trans. Commun., vol. 28, pp. 84-95, January 1980). K-means starts with an initial seed of classes, then iteratively re-clusters the data and re-estimates the centroids. The effectiveness of this method depends on the quality of the seed. LBG does not require a seed; it starts with one cluster for all of the data. It then uses a random criterion to generate new centroids based on the current set (initially one), applies K-means to the new set of centroids, and repeats the process on the resulting set. Each technique has drawbacks. K-means depends strongly on a good seed, which means that one needs a lot of prior information, and its iterative re-clusterings are time-consuming. LBG has a random component, which makes it potentially unstable in the sense that two independent LBG clusterings of the same data can yield quite different models.
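The two conventional techniques described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the cited reference: the "random criterion" is assumed here to be a small random offset added to and subtracted from each current centroid, and the names `dist2`, `kmeans`, `lbg`, `scale`, and `target_size` are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=20):
    """Refine seed centroids by repeated re-clustering and re-estimation."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Re-cluster: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Re-estimate: each centroid becomes the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(len(old))))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids

def lbg(points, target_size, rng, scale=0.01):
    """LBG-style growth: start from one centroid, split, then run K-means."""
    dim = len(points[0])
    # One cluster for all of the data.
    centroids = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    # The set doubles each round, so target_size is assumed to be a power of two.
    while len(centroids) < target_size:
        split = []
        for c in centroids:
            # Assumed random criterion: perturb each centroid in opposite directions.
            offset = [rng.uniform(-scale, scale) for _ in range(dim)]
            split.append(tuple(x + o for x, o in zip(c, offset)))
            split.append(tuple(x - o for x, o in zip(c, offset)))
        centroids = kmeans(points, split)
    return centroids

# Illustrative data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeded = kmeans(points, [(0, 0), (10, 10)])     # K-means needs a seed
codebook = lbg(points, 2, random.Random(0))     # LBG needs only a random source
print(seeded)
print(sorted(codebook))
```

Every K-means iteration makes a full pass over the data, and the LBG sketch invokes K-means once per doubling, which illustrates why both approaches are costly. On data less cleanly separated than this toy set, a different random source passed as `rng` may lead to a different final set of centroids.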
In view of the foregoing, a need has been recognized in connection with improving upon the shortcomings and disadvantages associated with conventional data clustering methods and arrangements.