Parallel processing can be implemented in a computer system to execute applications faster than traditional sequential processing. For example, a single instruction multiple data (SIMD) instruction is a parallel process in which a single instruction is performed simultaneously on multiple data elements. Such SIMD instructions can help speed up data processing in applications including multimedia, video, audio encoding/decoding, three-dimensional (3-D) graphics, and image processing.
In a computer system that supports parallel processing, however, some of the same data elements may be re-used across several iterations of a signal processing operation (e.g., a graphics operation such as a filtering or convolution operation). For example, to process an image or part of an image, the same graphics instruction and input data may be iteratively applied to a plurality of pixels in the image, and each iteration may use some of the same pixels as the previous iteration. Nevertheless, the data elements used by the graphics instruction may be re-loaded from memory for each iteration, which can reduce the efficiency of the parallel processing architecture executing the graphics instruction.
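The cost of re-loading overlapping data can be illustrated with a minimal sketch. It is not taken from any particular architecture: it assumes a one-dimensional filter, models each read of an input element as one memory load, and uses hypothetical names (`fir_reload`, `fir_reuse`, `loads`) chosen only for this example. The first function re-reads every window element on each iteration; the second keeps the overlapping elements in a local window and reads only the one new element per iteration.

```python
from collections import deque


def fir_reload(samples, taps):
    """Apply the filter, re-loading every window element on each iteration."""
    n = len(taps)
    loads = 0  # modeled count of memory loads (an assumption of this sketch)
    out = []
    for i in range(len(samples) - n + 1):
        acc = 0
        for k in range(n):
            acc += taps[k] * samples[i + k]  # each element read counts as a load
            loads += 1
        out.append(acc)
    return out, loads


def fir_reuse(samples, taps):
    """Apply the same filter, keeping overlapping elements in a local window."""
    n = len(taps)
    if len(samples) < n:
        return [], 0
    window = deque(samples[:n], maxlen=n)  # initial fill costs n loads
    loads = n
    out = [sum(t * x for t, x in zip(taps, window))]
    for i in range(n, len(samples)):
        window.append(samples[i])  # one new load; the oldest element is dropped
        loads += 1
        out.append(sum(t * x for t, x in zip(taps, window)))
    return out, loads


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5, 6]
    taps = [1, 1, 1]
    print(fir_reload(samples, taps))  # ([6, 9, 12, 15], 12)
    print(fir_reuse(samples, taps))   # ([6, 9, 12, 15], 6)
```

With six input elements and a three-element window, both functions produce the same four outputs, but the re-loading version performs 12 modeled loads while the re-using version performs 6. The gap widens as the window grows, since the first approach loads roughly the window length per output and the second loads one element per output after the initial fill.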