Cluster analysis, or clustering, refers to grouping similar data. Whether two pieces of data are similar depends on a similarity measure that is defined in advance. When each piece of data is represented as a vector, a geometric distance is typically used to determine similarity, and one of the most representative examples is the Euclidean distance. K-means clustering is a technique for partitioning a total of n pieces of d-dimensional data into k groups. For example, given two-dimensional input data, k-means clustering assigns a cluster index ranging from 1 to k to each piece of the two-dimensional input data.
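The index assignment described above can be sketched as follows. This is a minimal illustration only: the function names `euclidean_distance` and `assign_clusters`, the sample points, and the two fixed centers are assumptions made for the example and do not come from the original text.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_clusters(points, centers):
    """Return, for each point, the 1-based index of its nearest center."""
    labels = []
    for p in points:
        distances = [euclidean_distance(p, c) for c in centers]
        # list.index returns the first minimum, so ties go to the lower index.
        labels.append(distances.index(min(distances)) + 1)
    return labels


# Hypothetical two-dimensional input data and k = 2 centers.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.8)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_clusters(points, centers)
```

Here each of the four points receives an index of 1 or 2 according to which center is closer in Euclidean distance.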
When k-means clustering is used, k is determined directly by the user, and the clustering result may change significantly depending on k. Because the k value is chosen arbitrarily, without prior information or knowledge about an appropriate value, it is very difficult to determine, and a poorly chosen k value may produce an undesirable result. Since k-means clustering is an iterative algorithm, a large n (the number of pieces of data) or a high dimensionality d may require a great amount of execution time. Even with the same k value, the time taken to converge (that is, the total running time) and the result itself may change depending on the initially chosen center values. As such, the efficiency of conventional k-means clustering varies with the input k value, so the technique is not easy to generalize and requires control by a skilled operator, and even a skilled operator is unlikely to obtain consistent results every time.
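The sensitivity to the initial center values can be illustrated with a small sketch. It assumes the standard Lloyd-style iteration (alternating an assignment step and a center update step), which the text above does not spell out. The function name `kmeans`, the `max_iter` parameter, and the four sample points are hypothetical choices for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd-style k-means sketch; k is the number of initial centers.

    Returns (1-based labels, final centers, iterations used, sum of squared errors).
    """
    centers = [tuple(c) for c in initial_centers]
    labels = None
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return [lab + 1 for lab in labels], centers, iterations, sse


# Four corners of a wide rectangle, clustered with k = 2 from two different starts.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])  # starts on opposite short sides
bad = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])   # both starts on the same short side
```

With the same data and the same k, the first start groups the left and right pairs (sum of squared errors 1.0), while the second converges to a top and bottom split (sum of squared errors 16.0). Both runs stop after two iterations on this tiny input; on larger data the iteration count generally differs between starts as well.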