The present invention relates to computer-implemented data processing and, more particularly, to data clustering techniques for data processing applications.
In the era of big data, data processing applications such as data mining benefit both commercial activity and people's daily lives. Clustering, in which a set of data is organized into multiple subsets (also referred to as data clusters) based on one or more data characteristics or attributes, plays a critical role in many data mining applications. In general, the larger the set of data, the greater the volume of computation and the greater the data-transfer bandwidth required to execute a data clustering algorithm.
It is known to implement a conventional data clustering algorithm, such as the K-means data clustering algorithm, on a heterogeneous platform having multiple processors of different types operating in parallel, such as a central processing unit (CPU) and multiple graphics processing units (GPUs), in an attempt to perform data clustering in a reasonable amount of time and at a reasonable cost. Unfortunately, for large sets of data, such solutions still take prohibitively long to execute. Thus, it would be advantageous to have a more efficient data clustering method.
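For illustration only, the following is a minimal single-threaded sketch of the conventional K-means iteration (Lloyd's algorithm) named above, written in Python. It is not the heterogeneous CPU/GPU implementation referred to above and is not the method of the present invention. The function and parameter names (`kmeans`, `max_iters`, `seed`, `distance_evals`) are hypothetical. The `distance_evals` counter shows that each iteration performs n × k distance computations for n points and k clusters, which is why the computational load grows with the size of the data set.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iters: int = 100, seed: int = 0):
    """Illustrative Lloyd's-style K-means (hypothetical helper, not the claimed method).

    Returns (centroids, labels, distance_evals), where distance_evals counts
    every point-to-centroid distance computed across all iterations.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    # Initialize centroids by sampling k distinct input points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    distance_evals = 0
    dim = len(points[0])

    for _ in range(max_iters):
        # Assignment step: every point is compared against every centroid,
        # so this step costs n * k distance evaluations per iteration.
        changed = False
        for i, p in enumerate(points):
            best = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            distance_evals += k
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # Converged: no point changed cluster.

        # Update step: move each centroid to the mean of its assigned points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, lbl in zip(points, labels):
            counts[lbl] += 1
            for d in range(dim):
                sums[lbl][d] += p[d]
        centroids = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)
        ]

    return centroids, labels, distance_evals
```

For example, calling `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the two groups of three points; doubling the number of points would double the distance evaluations per iteration, consistent with the scaling concern described above.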