SIMD processors are best suited to performing the same operation on multiple pieces of data simultaneously. Any processor clock cycle not spent performing data processing operations underutilizes the processing resources of such a costly parallel processor. Typical memory addressing schemes, however, require a conventional SIMD processor to spend clock cycles on address calculations and data formatting.
For example, SIMD processors typically access (i.e., read or write) data within memory only on memory unit boundaries (e.g., byte, 16-bit word, 32-bit word, or 128-bit word). Any need to access data in units smaller than the conventional data unit size, or on boundaries that are not aligned with memory unit boundaries, typically requires manipulation of the data. Although this can often be accomplished using conventional data manipulation techniques (byte shifts, AND-masking, etc.), such manipulation consumes processor resources.
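By way of illustration only, the following C sketch shows the kind of shift-and-mask manipulation referred to above. It assumes a memory that can be accessed only in aligned 32-bit words and a little-endian byte order within each word; the function names load_unaligned_u32 and load_u8 are hypothetical and are not taken from any particular processor or library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value that starts at an arbitrary
 * byte address, using only aligned 32-bit word accesses.
 * Assumes little-endian byte order within a word, and that the word
 * following the addressed one is readable when the address is unaligned. */
uint32_t load_unaligned_u32(const uint32_t *mem, size_t byte_addr)
{
    assert(mem != NULL);

    size_t word_index = byte_addr / 4;               /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_addr % 4) * 8u; /* bit offset inside that word */

    uint32_t lo = mem[word_index];
    if (shift == 0)
        return lo;                         /* aligned: one access, no manipulation */

    uint32_t hi = mem[word_index + 1];     /* unaligned: a second access is needed */
    return (lo >> shift) | (hi << (32u - shift));
}

/* Hypothetical helper: read a single byte (smaller than the 32-bit
 * access unit) by shifting the containing word and AND-masking. */
uint8_t load_u8(const uint32_t *mem, size_t byte_addr)
{
    assert(mem != NULL);

    uint32_t word = mem[byte_addr / 4];
    unsigned shift = (unsigned)(byte_addr % 4) * 8u;
    return (uint8_t)((word >> shift) & 0xFFu);
}
```

In this sketch, each unaligned 32-bit read costs two memory accesses, two shifts, and an OR, in addition to the division and remainder used to locate the word. These are operations that perform no useful data processing.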
Similarly, SIMD processors often need to access data at uniformly spaced addresses. For example, data often needs to be accessed along the columns of a matrix. Again, calculating each subsequent address requires the use of processor resources.
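As a hedged illustration of uniformly spaced access, the following C sketch reads one column of a matrix. It assumes the matrix is stored in row-major order, so that consecutive elements of a column are separated by a stride of one full row; the function name gather_column and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy column `col` of a row-major matrix with
 * `rows` rows and `cols` columns into `out` (which must hold `rows`
 * elements). */
void gather_column(const int32_t *matrix, size_t rows, size_t cols,
                   size_t col, int32_t *out)
{
    assert(matrix != NULL && out != NULL);
    assert(col < cols);

    size_t index = col;             /* first element of the column */
    for (size_t r = 0; r < rows; ++r) {
        out[r] = matrix[index];
        index += cols;              /* address calculation: advance by one row */
    }
}
```

For a 3-by-4 matrix holding the values 0 through 11 in row-major order, gathering column 1 yields 1, 5, and 9. The addition that advances the index is performed once per element and is pure address arithmetic.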
Accordingly, there is a need for a SIMD processor that is capable of flexibly addressing memory while consuming few processor resources.