A single instruction, multiple data (SIMD) processing system is a class of parallel computing systems that includes multiple processing elements which execute the same instruction on multiple pieces of data. A SIMD system may be a standalone computer or a sub-system of a computing system. For example, one or more SIMD execution units may be used in a graphics processing unit (GPU) to implement a programmable shading unit that supports programmable shading. A SIMD processing system allows multiple threads of execution for a program to execute synchronously on the multiple processing elements in a parallel manner, thereby increasing the throughput for programs where the same set of operations needs to be performed on multiple pieces of data A particular instruction executing on a particular SIMD processing element is referred to as a thread or a fiber. A group of threads may be referred to as a wave or warp.
Processing units, such as GPUs, include processing elements and a general purpose register (GPR) that stores data for the execution of an instruction. In some examples, a processing element executes instructions for processing one item of data, and respective processing elements store the data of the item or the resulting data of the item from the processing in the GPR. An item of data may be the base unit on which processing occurs. For instance, in graphics processing, a vertex of a primitive is one example of an item, and a pixel is another example of an item. There is graphics data associated with each vertex and pixel (e.g., coordinates, color values, etc.).
There may be multiple processing elements within a processor core of the processing element allowing for parallel execution of an instruction (e.g., multiple processing elements execute the same instruction at the same time). In some cases, each of the processing elements stores data of an item in the GPR and reads the data of the item from the GPR even if the data is the same for multiple items.