Clustering is one of the most useful tasks in data mining applications for discovering groups and identifying interesting distributions and patterns in the underlying data. Clustering algorithms are utilized to partition a given dataset into groups, or clusters, such that the data points in a cluster are more similar to each other than points in different clusters. Partitioning a given dataset into several groups with similar attributes is of interest for various applications. In the retail environment, examples include partitioning a range of stores into groups with similar gross margin dollars, or grouping products based on their weekly sales. Partitioning datasets can simplify sales analysis and forecasting, particularly when data is missing or contains inaccuracies. It is often easier to obtain good forecasts for the aggregate sales from all items in a store or from all items in a product line than for each individual item in the store.
It is generally, desirable to form clusters of similar size containing items with similar attributes. Thus, the main concern in the clustering process is to reveal the organization of patterns into “sensible” groups, which allow the user to discover similarities and differences, as well as to derive useful conclusions about them.
Various clustering algorithms have been developed for partitioning datasets. However, these clustering algorithms typically require iterative optimization techniques. Consequently these algorithms tend to be computationally intensive, and their application not feasible for many practical cases, where suboptimal but fast algorithms are preferred.