The following exemplary embodiments relate generally to improved operational efficiency of graphics processing units (GPUs), including GPUs employed in systems for data searching, retrieval, organization, interpretation and implementation. The embodiments more particularly find application in connection with systems and methods for conducting searches of electronically stored data, including but not limited to datasets.
Graphics processing units (GPUs) are becoming increasingly popular for solving computationally challenging problems, including but not limited to data mining problems and machine learning operations. Among such computationally complex problems, the k-means algorithm is a well-known clustering algorithm with many applications, including its use in the area of data mining. It is an iterative algorithm in which a single iteration consists of two phases:
1. Assignment phase: Each data point is assigned to the one of the k clusters whose centroid is closest to the point in terms of Euclidean distance;
2. Update phase: All k cluster centroids are updated by computing the arithmetic mean of the points in each cluster.
The assignment and update phases are then repeated until an acceptable clustering of the data points is accomplished.
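A minimal, CPU-only Python sketch of the two-phase iteration described above may help make it concrete. It is not a GPU or CUDA implementation and is not the method of the present embodiments. The function names (`assign`, `update`, `kmeans`), the use of squared Euclidean distance (which selects the same nearest centroid as Euclidean distance), the convention that an empty cluster keeps its previous centroid, and the stopping rule (stop when no point changes cluster, or after `max_iter` iterations) are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks centroids the same way as Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment phase: return the index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points: List[Point], labels: List[int], centroids: List[Point]) -> List[Point]:
    """Update phase: each centroid becomes the arithmetic mean of its member points.

    Assumption: an empty cluster keeps its previous centroid.
    """
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
        for j in range(k)
    ]


def kmeans(
    points: List[Point], centroids: List[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Repeat the assignment and update phases until no point changes cluster."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # converged: the assignment phase changed nothing
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    final_centroids, final_labels = kmeans(pts, [pts[0], pts[2]])
    print(final_centroids, final_labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

In this sketch each pass over the data performs one full assignment phase followed by one full update phase, so the cost per iteration is dominated by the point-to-centroid distance computations in `assign`.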
To find the initial centroids before the assignment phase of the first iteration can begin, two common initialization methods can be used: (a) the Forgy method, which randomly picks k points from the data and uses them as the initial centroids; and (b) the Random Partition method, which randomly assigns each point to a cluster, updates the cluster centroids, and uses the updated centroids as initial centroids. Both initialization methods are supported by the teachings of the present disclosure.
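The two initialization methods can be sketched in Python as follows. This is an illustrative sketch only; the function names `forgy_init` and `random_partition_init`, the use of a seeded `random.Random` instance for reproducibility, and the fallback of drawing a random data point when a randomly formed cluster ends up empty are assumptions, not details taken from the present disclosure.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def forgy_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Forgy method: choose k data points at random (without replacement) as centroids."""
    return [tuple(p) for p in rng.sample(list(points), k)]


def random_partition_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Random Partition method: randomly assign each point to one of k clusters,
    then use the arithmetic mean of each cluster as its initial centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = rng.randrange(k)  # random cluster label for this point
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    centroids: List[Point] = []
    for j in range(k):
        if counts[j] == 0:
            # Assumption: an empty random cluster falls back to a random data point.
            centroids.append(tuple(rng.choice(points)))
        else:
            centroids.append(tuple(s / counts[j] for s in sums[j]))
    return centroids
```

Either function returns a list of k centroids that can serve as the initial centroids for the assignment phase of the first iteration. Because each Random Partition centroid is a mean over a random subset of the data, those initial centroids typically lie close to the overall data mean, whereas Forgy centroids are actual data points and are spread out like the data itself.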
A number of data mining algorithms have been developed as GPU-based k-means clustering algorithms. Most of these existing GPU-based k-means implementations have been built on top of a parallel computing platform and programming model developed by NVIDIA Corporation of Santa Clara, Calif., called CUDA, which supports general-purpose GPU computing. While the embodiments to be described herein (including k-means clustering) also use CUDA, the present focus is on the more recent NVIDIA GPUs designed with the Fermi architecture, rather than on earlier models such as the GeForce GTX 280, which has been used as the test GPU in a number of studies, such as that discussed in the article by Ren Wu, Bin Zhang, and Meichun Hsu, GPU-Accelerated Large Scale Analytics, HP Laboratories, Hewlett-Packard Development Company, L.P. (Mar. 6, 2009).
The introduction of the Fermi architecture has resulted in several new features being added to CUDA that were not previously available. This creates new opportunities for improving existing GPU-based algorithms that were implemented for and tested on pre-Fermi GPUs. For example, the article by R. Nath, S. Tomov, and J. Dongarra, An Improved MAGMA GEMM for Fermi Graphics Processing Units, International Journal of High Performance Computing Applications 2010, shows how to improve general matrix-matrix multiplication (GEMM) on Fermi GPUs, an improvement that requires non-trivial implementation changes to fully exploit newly introduced hardware features such as the increased shared memory and the larger number of registers.
The present inventor has determined that additional features made available by the Fermi architecture permit improvements in the operational efficiency of Fermi-based and post-Fermi GPUs.