Parallel processing is often implemented by a processor to optimize processing applications, for example, by a digital signal processor (DSP) to optimize digital signal processing applications. A processor can operate as a single instruction, multiple data (SIMD), or data parallel, processor to achieve parallel processing. In SIMD operations, a single instruction is sent to a number of processing elements of the processor, where each processing element can perform the same operation on different data.
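As an illustration only, the SIMD behavior described above can be modeled in plain C as a single "add" operation applied across several lanes, with each lane standing in for one processing element. The lane count `NUM_LANES` and the function name `simd_add` are assumptions chosen for this sketch and do not correspond to any particular processor. Real SIMD hardware would perform the lane operations concurrently rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of processing elements (lanes); chosen for illustration. */
#define NUM_LANES 4

/*
 * Models one SIMD "add" instruction: every lane performs the same
 * operation (an addition) on its own pair of data values.
 * The loop is a sequential stand-in for work that SIMD hardware
 * would carry out in parallel.
 */
void simd_add(const int32_t a[NUM_LANES],
              const int32_t b[NUM_LANES],
              int32_t out[NUM_LANES])
{
    assert(a != NULL && b != NULL && out != NULL);

    for (size_t lane = 0; lane < NUM_LANES; ++lane) {
        out[lane] = a[lane] + b[lane];  /* same operation, different data */
    }
}
```

For example, calling `simd_add` with inputs {1, 2, 3, 4} and {10, 20, 30, 40} leaves {11, 22, 33, 44} in the output array.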
In vector processing, “stride” refers to the step size between successively accessed elements, which may or may not be the same as the element size. For example, an array of 32-bit (4-byte) elements may have a stride of 4 bytes, particularly on a processor with a 32-bit data word size. This is referred to as a unity stride, because each access advances by exactly one element. A non-unity stride occurs when one item is accessed for every N elements. For example, with a stride of four elements, every fourth word is accessed.
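A minimal sketch of strided access in C follows, assuming 32-bit elements and a stride expressed in elements rather than bytes. The function names `gather_strided` and `stride_in_bytes` are hypothetical and are introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copies `count` elements from `src` into `dst`, advancing `stride`
 * elements in `src` between accesses. A stride of 1 is a unity stride
 * (consecutive elements); a stride of N reads one element out of every N.
 * The caller must ensure `src` holds at least (count - 1) * stride + 1
 * elements when count is nonzero.
 */
void gather_strided(const int32_t *src, size_t stride, size_t count,
                    int32_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(stride >= 1);

    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i * stride];
    }
}

/* Converts a stride in elements to the equivalent stride in bytes. */
size_t stride_in_bytes(size_t stride_elems)
{
    return stride_elems * sizeof(int32_t);
}
```

With a stride of 1, the sketch reads `src[0]`, `src[1]`, `src[2]`, and so on, which corresponds to a 4-byte step for 32-bit elements. With a stride of 4, it reads `src[0]`, `src[4]`, `src[8]`, and so on, which corresponds to a 16-byte step.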