A sparse matrix is a matrix in which a majority of matrix elements have a value of zero. Sparse matrices are extensively used in such applications such as linear algebra, many engineering disciplines, physics, data mining, and graph analytics. Many of these applications rely on multiplication of a vector by a sparse matrix to yield a new vector. There are thus demands for efficient methods of performing such operations. Parallel processors, such as graphics processors may be used to meet such demands.
Graphics processors, or graphics processing units (GPUs), are highly parallel computation devices. As the name implies, they were originally developed for fast and efficient processing of visual information, such as video. More recently, however, they have been engineered to be more general purpose massively parallel devices. Current GPUs may execute thousands of computations concurrently, and this number is bound to increase with time. Such parallel computations are referred to as threads. In order to reduce hardware complexity (and thus allow more parallel compute units in a chip), GPUs bundle numerous threads together and require them to execute in a single-instruction-multiple-data (SIMD) fashion. That is, the same instructions are executed simultaneously on many distinct pieces of data. Such a bundle of threads is called a wavefront, a warp, or other names. In addition, a given processor, such as a GPU, may include multiple real or virtual processors that may run in parallel, each with its own threads and wavefronts. Each of such virtual processors is called a workgroup, thread block, or other names. A collection of virtual processors or workgroups may share a single real processor.