1. Field of the Invention
Aspects of the disclosure are directed to a parallel method for agglomerative clustering of non-stationary data.
2. Description of the Related Art
Clustering is the grouping of data points into clusters such that the points within a cluster are more similar to one another than to points in other clusters. Practical applications of clustering include unsupervised classification and taxonomy generation, nearest neighbor searching, scientific discovery, vector quantization, text analysis, and navigation.
One common clustering algorithm is the k-means algorithm. The algorithm assumes that the data “objects” to be clustered are available as points (or vectors) in a d-dimensional Euclidean space. The k-means algorithm seeks a minimum-variance grouping of the data, that is, a partition into k clusters that minimizes the sum of squared Euclidean distances from each point to the centroid of its assigned cluster. The popularity of the k-means algorithm can be attributed to its relative ease of interpretation, implementation simplicity, scalability, convergence speed, adaptability to sparse data, and ease of out-of-core (out of the local memory of a single processor) implementation.
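For illustration only, the k-means procedure described above can be sketched as follows. This is a minimal sketch of the conventional batch (Lloyd-style) iteration, not a method of the present disclosure. The function names `kmeans` and `squared_distance`, the `iterations` and `seed` parameters, and the choice of random initial centroids are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical batch k-means sketch (requires iterations >= 1).

    Returns (centroids, assignment, sse), where sse is the sum of squared
    distances from each point to the centroid of its assigned cluster.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the grouping is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    sse = sum(squared_distance(p, centroids[a])
              for p, a in zip(points, assignment))
    return centroids, assignment, sse
```

Note that this sketch holds every data point in memory for the duration of the computation, since each iteration revisits the full set of points.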
A problem with clustering is that it can require significant memory and processing power. For example, the received data points are typically stored in memory and clustered into a single universe of clusters. To speed up the processing of the stored data, parallel processing techniques can be employed to cluster the data. However, if the clustering is performed by a mobile device and the data is generated by sensors, such as an accelerometer or a microphone, there may be too much data to store in memory before processing. Instead, the data must be processed “on the fly,” that is, as it is received.
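To make the notion of “on the fly” processing concrete, the following sketch folds each incoming data point into a running centroid and then discards the point, so that memory use depends on the number of clusters rather than on the amount of data received. It is an assumed, simplified sequential variant of k-means offered only as an example of stream processing, and it is not the agglomerative method to which this disclosure is directed. The class name `StreamingClusterer` and its `update` method are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class StreamingClusterer:
    """Hypothetical sequential clusterer that stores only k centroids."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # One running mean per cluster.
        self.counts = []     # Number of points absorbed by each cluster.

    def update(self, point):
        """Absorb one point and return the index of its cluster."""
        # Assumption: the first k points received seed the k clusters.
        if len(self.centroids) < self.k:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Find the nearest existing centroid.
        j = min(range(self.k),
                key=lambda i: squared_distance(point, self.centroids[i]))
        # Incremental mean update; the point itself is not retained.
        self.counts[j] += 1
        rate = 1.0 / self.counts[j]
        self.centroids[j] = [c + rate * (x - c)
                             for c, x in zip(self.centroids[j], point)]
        return j
```

A sensor loop would call `update` once per sample, for example `clusterer.update((ax, ay, az))` for each accelerometer reading, where `ax`, `ay`, and `az` are placeholder names for the three axis values.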