The invention relates to data analysis on databases, in particular to data mining in commercial databases, and more particularly to clustering the customers of businesses under different application scenarios.
In commercial practice, businesses wish to identify the characteristics of their customers in order to manage them and to conduct business effectively. It is therefore desirable for businesses to mine and analyze the characteristics of customers, and clustering techniques are frequently used to analyze the customer data stored in databases.
Clustering techniques are widely applied in fields such as statistics, pattern recognition, machine learning, and telecom services. Using computers and clustering techniques, a large number of data records in a database may be divided into K groups, or clusters (where K is a positive integer), such that the similarity of two data records in the same cluster is greater than the similarity of two data records in different clusters. Commonly used clustering algorithms include the K-means algorithm and the PAM (partitioning around medoids) algorithm.
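For illustration only, the K-means algorithm named above may be sketched as follows. This is a minimal sketch and not the method of the invention. It assumes that each data record is a tuple of numeric attributes, that similarity is measured by Euclidean distance, and that the initial centroids are records sampled at random. The function names and the sample records are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(records, k, max_iterations=100, seed=0):
    """Partition `records` into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k records sampled without replacement.
    centroids = [list(r) for r in rng.sample(records, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical customer records: (visits per month, monthly spend).
records = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0),
           (10.0, 20.0), (11.0, 22.0), (12.0, 24.0)]
labels, centroids = k_means(records, k=2)
```

With these sample records, the first three customers and the last three customers end up in separate clusters. Each iteration compares every record with every centroid, which is one reason the running time grows with the number of records, the number of attributes, and K.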
With a clustering technique, businesses may cluster the customer records in a database, that is, divide the customer records (or customers) into different groups, and then summarize the characteristics shared by the customers in each group. Corresponding services may thereby be provided for the different customer clusters.
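As a hedged illustration of summarizing the characteristics of each group, the sketch below computes the size of each cluster and the average of each attribute within it. It assumes that cluster labels have already been produced by some clustering process. The function name, the attribute names, the records, and the labels are all hypothetical.

```python
from collections import defaultdict

def summarize_clusters(records, labels, attribute_names):
    """Return {cluster label: {"size": n, "mean": {attribute: average}}}."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    summary = {}
    for label, members in groups.items():
        # Average each attribute (column) over the members of the cluster.
        means = [sum(column) / len(members) for column in zip(*members)]
        summary[label] = {"size": len(members),
                          "mean": dict(zip(attribute_names, means))}
    return summary

# Hypothetical customer records: (monthly spend, visits per month).
records = [(20.0, 1.0), (30.0, 2.0), (25.0, 3.0), (200.0, 10.0), (220.0, 12.0)]
labels = [0, 0, 0, 1, 1]  # assumed output of an earlier clustering run
summary = summarize_clusters(records, labels, ["monthly_spend", "visits"])
```

Here cluster 0 holds three customers with an average monthly spend of 25.0, and cluster 1 holds two customers with an average monthly spend of 210.0. A business might use such a summary to decide which service to offer to each cluster.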
In some applications, the number of groups to be generated by clustering (usually denoted as an integer K) is assumed to be known before the clustering process is executed. In other practical applications, the number of clusters is not known in advance. In that case, the clustering process is run with several different values of K, the best value of K is determined according to certain clustering criteria, and the clustering result is the one obtained by running the clustering process with that value of K.
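The trial of different K values may be sketched as follows. The text does not name the clustering criteria, so this sketch assumes the mean silhouette coefficient as the criterion and a compact K-means routine as the clustering process. Both are stand-ins chosen for illustration, and all function names and sample records are hypothetical.

```python
import math
import random

def k_means(records, k, iterations=50, seed=0):
    """Compact Lloyd-style K-means; returns one cluster label per record."""
    centroids = random.Random(seed).sample(records, k)
    labels = []
    for _ in range(iterations):
        # Assign each record to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c]))
                  for r in records]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

def mean_silhouette(records, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = {}
    for r, lab in zip(records, labels):
        clusters.setdefault(lab, []).append(r)
    if len(clusters) < 2:
        return -1.0  # assumption: treat the undefined single-cluster case as worst
    total = 0.0
    for r, lab in zip(records, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a singleton contributes a score of 0
        # a: mean distance to the other members of the record's own cluster.
        a = sum(math.dist(r, o) for o in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(r, o) for o in members) / len(members)
                for other, members in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(records)

def choose_k(records, candidate_ks):
    """Run the clustering once per candidate K and keep the best-scoring run."""
    best = None
    for k in candidate_ks:
        labels = k_means(records, k)
        score = mean_silhouette(records, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best

# Hypothetical customer records: (visits per month, monthly spend).
records = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0),
           (10.0, 20.0), (11.0, 22.0), (12.0, 24.0)]
best_k, best_score, best_labels = choose_k(records, [2, 3, 4])
```

For these sample records the sketch selects K = 2. Because the whole clustering process is repeated once per candidate value, the total cost is roughly the sum of the costs of the individual runs, which is why an unknown K makes clustering more expensive.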
The complexity of a clustering algorithm depends on the number of data records in the database, the number of attributes contained in each data record, the number of clusters K, and whether the value of K is known in advance. Clustering the customer records in a database with clustering techniques often takes hours or even days. It is therefore desirable for businesses to improve the efficiency of clustering customer records in a database for specific applications.