The present invention relates generally to the field of data clustering, and more particularly to distributed data clustering.
Clustering techniques are widely used by retail and consumer product companies that need to learn more about their customers in order to apply a 1-to-1 marketing strategy. By application of clustering techniques, customers are partitioned into groups by their buying habits, gender, age, income level, etc. Retail and consumer product companies can then use the cluster information to tailor their marketing and product development strategies to each customer group.
Traditional clustering algorithms can broadly be classified into partitional clustering and hierarchical clustering. Partitional clustering algorithms divide data cases into clusters by optimizing a certain criterion function. A well-known representative of this class is k-means clustering. Hierarchical clustering algorithms proceed by stages that produce a sequence of partitions, with each partition being nested into the next partition in the sequence. Notice that no initial values are needed for hierarchical clustering. Hierarchical clustering can be agglomerative or divisive. Agglomerative clustering starts with singleton clusters (i.e. clusters that each contain one data case only) and proceeds by successively merging clusters. In contrast, divisive clustering starts with one single cluster that contains all data cases and proceeds by successively separating the cluster into smaller ones.
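By way of illustration only, the contrast between the two classes may be sketched in Python as follows. The sketch is not taken from the present disclosure; the function names (k_means, agglomerative), the use of squared Euclidean distance, and the choice of centroid linkage for merging are all assumptions made for the example. It is intended to show that the partitional routine requires initial cluster centers, whereas the agglomerative routine starts from singleton clusters and needs no initial values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, initial_centers, max_iter=100):
    """Partitional clustering sketch: caller must supply initial centers.

    Alternates between assigning each point to its nearest center and
    recomputing each center as the mean of its assigned points, which
    reduces the within-cluster sum of squared distances.
    Returns one cluster label per point.
    """
    centers = list(initial_centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: stop iterating
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels


def agglomerative(points, k):
    """Hierarchical agglomerative sketch: no initial values are needed.

    Starts with one singleton cluster per point and repeatedly merges the
    two clusters whose centroids are closest (centroid linkage is an
    assumption of this example) until k clusters remain.
    Returns a list of clusters, each a list of points.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


# Hypothetical data: two well-separated groups of three data cases each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
print(k_means(points, initial_centers=[(1.0, 1.0), (8.0, 8.0)]))
print(agglomerative(points, k=2))
```

In this sketch, each merge performed by the agglomerative routine yields one partition in the nested sequence; stopping at k clusters simply selects one partition from that sequence. The result of the k-means routine, by contrast, may depend on the initial centers that are supplied.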