Field of the Invention
The present invention relates generally to optimizing the sharing of data between execution threads in a graphics processing unit.
Background
A graphics processing unit (GPU) is a special-purpose integrated circuit optimized for graphics processing operations. A GPU is often incorporated into computing devices (e.g., personal computers, rendering farms or servers, handheld devices, digital televisions, etc.) that execute applications with demanding graphics processing needs, such as video game applications.
To improve processing efficiency, a GPU will commonly execute parallel threads using Single Instruction, Multiple Data (“SIMD”, or “vector”) instructions to achieve data-level parallelism. This enables a SIMD processor to execute the same instruction on multiple pixels of data, for example, by running a separate thread of operation for each pixel on an individual SIMD lane. However, the data generated within any one SIMD lane is typically inaccessible to other SIMD lanes without the execution of computationally complex and costly data storage and retrieval instructions.
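By way of illustration only, the conventional store-and-retrieve approach described above may be modeled in software as follows. This is a minimal sketch and not an implementation of any particular GPU; the names `lane_compute`, `simd_step`, `neighbor_sum_via_memory`, and `shared_buffer` are hypothetical and are introduced solely for this example. Each list element stands in for one SIMD lane, and a lane can obtain a neighboring lane's result only after that result has been written to, and read back from, a shared buffer.

```python
# Illustrative software model of SIMD lanes; all names here are hypothetical.

def lane_compute(pixel):
    """Per-lane work: the same instruction is applied to each lane's own pixel."""
    return pixel * 2


def simd_step(pixels):
    """Run one thread per pixel; each result is private to the lane that made it."""
    return [lane_compute(p) for p in pixels]


def neighbor_sum_via_memory(pixels):
    """Add each lane's result to its right-hand neighbor's result (wrapping around).

    Because a lane cannot read another lane's private value directly, every
    lane first stores its value to a shared buffer and then loads its
    neighbor's value from that buffer.
    """
    lanes = len(pixels)
    private = simd_step(pixels)

    # Store phase: each lane writes its private value out to shared storage.
    shared_buffer = [0] * lanes
    for lane in range(lanes):
        shared_buffer[lane] = private[lane]

    # (On real hardware, synchronization would be assumed here so that all
    # stores complete before any load begins.)

    # Load phase: each lane reads its neighbor's value back from storage.
    return [
        private[lane] + shared_buffer[(lane + 1) % lanes]
        for lane in range(lanes)
    ]


if __name__ == "__main__":
    print(neighbor_sum_via_memory([1, 2, 3, 4]))
```

In this model, the store phase and the load phase are the extra work that a direct lane-to-lane data path would avoid.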
Accordingly, what is needed is an improved technique for allowing the sharing of data between SIMD lanes.