Parallel processors, such as graphics processing units (GPUs), are highly parallel computation devices. As the name implies, GPUs were originally developed for fast and efficient processing of visual information, such as video. More recently, however, they have been engineered to serve as more general-purpose, massively parallel devices. Current GPUs may execute thousands of computations concurrently, and this number is expected to increase over time. Such parallel computations are referred to as threads. To reduce hardware complexity (and thus allow more parallel compute units on a chip), GPUs bundle numerous threads together and require them to execute in a single-instruction-multiple-data (SIMD) fashion. That is, the same instructions are executed simultaneously on many distinct pieces of data. Such a bundle of threads is called a wavefront or a warp, among other names.
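The SIMD behavior of a wavefront described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions, not a description of any particular hardware: the names `WAVEFRONT_SIZE` and `execute_wavefront` are hypothetical, the width of 8 lanes is chosen for readability, and a Python list stands in for the per-thread registers.

```python
# Hypothetical wavefront width, kept small for readability.
WAVEFRONT_SIZE = 8


def execute_wavefront(instructions, lane_data):
    """Run a list of instructions over one wavefront in SIMD fashion.

    Each instruction is applied to every lane (thread) before the next
    instruction is issued, so all lanes share a single instruction stream
    while operating on distinct pieces of data.
    """
    if len(lane_data) != WAVEFRONT_SIZE:
        raise ValueError("one data element is expected per lane")
    registers = list(lane_data)
    for instruction in instructions:
        # Same instruction, many data elements: one step of the wavefront.
        registers = [instruction(value) for value in registers]
    return registers


# A two-instruction program: multiply by 2, then add 1.
program = [lambda x: x * 2, lambda x: x + 1]
result = execute_wavefront(program, list(range(WAVEFRONT_SIZE)))
print(result)  # [1, 3, 5, 7, 9, 11, 13, 15]
```

In this model the lanes cannot diverge: every thread in the bundle follows the same instruction sequence, which mirrors the hardware simplification the SIMD requirement is meant to provide.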
A kernel is a program, or a portion of a program, that contains multiple threads and executes on a computing device. The multiple threads may be bundled into one or more workgroups, also known as thread blocks, among other names.
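The grouping of a kernel's threads can be sketched as follows. The sketch assumes that each workgroup is further subdivided into wavefronts of consecutive thread identifiers, which is a common arrangement but is an assumption here rather than something stated above. The function name `partition_kernel` and the sizes used are hypothetical.

```python
def partition_kernel(num_threads, workgroup_size, wavefront_size):
    """Group global thread IDs into workgroups, then into wavefronts.

    Returns a list of workgroups; each workgroup is a list of wavefronts,
    and each wavefront is a list of consecutive global thread IDs.
    """
    workgroups = []
    for wg_start in range(0, num_threads, workgroup_size):
        # The last workgroup may be partial if the sizes do not divide evenly.
        wg_end = min(wg_start + workgroup_size, num_threads)
        wavefronts = [
            list(range(wf_start, min(wf_start + wavefront_size, wg_end)))
            for wf_start in range(wg_start, wg_end, wavefront_size)
        ]
        workgroups.append(wavefronts)
    return workgroups


# A kernel of 16 threads, with 8 threads per workgroup and 4 per wavefront.
layout = partition_kernel(num_threads=16, workgroup_size=8, wavefront_size=4)
for wg_index, wavefronts in enumerate(layout):
    print(f"workgroup {wg_index}: {wavefronts}")
```

With these illustrative sizes, the kernel is split into two workgroups of two wavefronts each.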