During the course of business activities, an entity can collect and store large amounts of data related to those activities. Data mining is used to extract patterns of importance to the business entity from this data, thereby transforming the data into information useful to the business entity. As the amount of data collected by the business entity increases, the efficiency of automated techniques for analyzing that data should also increase to allow for timely analysis.
Data mining in a customer relationship management application can contribute significantly to the bottom line of the business entity. For example, rather than randomly contacting prospects or customers through a call center or by mail, the business entity can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. Data mining techniques, including data clustering, can be used to automatically identify segments or groups within a customer data set that have higher likelihoods of responding to offers.
Cluster analysis techniques for data mining generally partition a set of observations into subsets, or clusters, so that observations in the same cluster are similar in some sense. For a multi-dimensional data set (e.g., a database table in which each entry has a plurality of columns, or dimensions), spatial analysis techniques can be used to determine where clusters are located. Techniques such as k-means clustering can be used to determine the centroid of each cluster in the multi-dimensional space. The location of a centroid in that space indicates the general characteristics of the data points that form the cluster. The business entity can then use such information in making decisions related to the data mining task.
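As an illustration of how k-means locates centroids, the following is a minimal sketch of the standard Lloyd's iteration in plain Python. It is not presented as the method of this disclosure; the function names (`kmeans`, `squared_distance`, `mean`), the random initialization, and the iteration cap are hypothetical choices made for the example. Each row of the input stands for one table entry, and each tuple element stands for one column (dimension).

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of squared per-dimension differences (one term per column).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    # Component-wise average of a group of points: its centroid.
    d = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(d))


def kmeans(points: List[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical sketch of Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize with k randomly chosen input rows.
    centroids: List[Point] = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # Converged: assignments no longer move any centroid.
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)], k=2)` separates the three low-valued rows from the three high-valued rows, and the two returned centroids summarize each group's typical column values.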
As data sets grow in the number of entries, the number of dimensions per entry (e.g., the number of columns), or the number of clusters present, the time required to perform a clustering analysis such as k-means clustering also increases because of the greater computational complexity. Longer computation times consume more computing resources and can leave the data set unavailable for other tasks while the analysis runs. It is therefore desirable to improve the efficiency of clustering analysis, thereby reducing those resource demands.
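For the standard (Lloyd's) k-means procedure, a common textbook cost estimate makes this growth concrete; it is given here only as an illustration, not as a characterization of any particular implementation. Each iteration computes the distance from each of the n entries to each of the k centroids, and each distance involves d per-column differences, so over I iterations:

```latex
T_{\text{k-means}} = O(I \cdot n \cdot k \cdot d)
```

For instance, a table with n = 10^6 entries, d = 50 columns, and k = 20 clusters requires on the order of 10^6 × 20 × 50 = 10^9 coordinate operations per iteration, and the cost grows linearly in each of these factors.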