1. Field of the Invention
The present invention relates generally to processing systems and, more specifically, to multi-core processing systems.
2. Background Art
Single instruction multiple data (SIMD) architectures, for example SIMD digital signal processor architectures, have an arithmetic logic unit (ALU) for performing operations on arrays stored in memory. SIMD architectures can also have a plurality of ALUs that perform the same or similar operations in parallel to accelerate the execution of an instruction. For example, in a SIMD architecture using M ALUs, if an instruction calls for two arrays of N elements to be added together, the instruction can execute M operations per iteration, so that the N additions require about N/M iterations rather than N. Thus, an instruction can be performed up to M times faster than with a single ALU.
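The iteration arithmetic can be illustrated with a small software model. The following is a minimal sketch, not an implementation of any particular SIMD architecture: the function name simd_add is hypothetical, and each pass of the outer loop is assumed to stand in for one SIMD iteration in which up to M ALUs (lanes) each add one pair of elements. It returns the sums together with the number of lanes active in each iteration, which makes it visible that N elements take ceil(N/M) iterations.

```python
def simd_add(a, b, m):
    """Add two equal-length arrays using a simulated m-lane SIMD unit.

    Hypothetical model: each outer-loop pass represents one SIMD
    iteration in which up to m ALUs each add one pair of elements.
    Returns (sums, lanes_used_per_iteration).
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    n = len(a)
    result = [0] * n
    lanes_used = []
    for start in range(0, n, m):
        # When n is not a multiple of m, the final iteration has fewer
        # than m elements left, so some lanes have nothing to do.
        active = min(m, n - start)
        for lane in range(active):
            result[start + lane] = a[start + lane] + b[start + lane]
        lanes_used.append(active)
    return result, lanes_used


sums, lanes = simd_add(list(range(10)), [1] * 10, 4)
print(sums)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(lanes)  # [4, 4, 2]
```

With N = 10 and M = 4, the model performs three iterations, and only two of the four lanes are used in the last one.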
However, when N is not an integer multiple of M, one iteration will require fewer than M ALUs, leaving the remaining ALUs idle during that iteration. Conventional SIMD architectures rely on code generated by a compiler or written in assembly language to handle the case in which N is not an integer multiple of M. For example, such code can pad the arrays with additional elements so that every iteration operates on exactly M elements. However, such code is complex and introduces additional overhead. These problems become more significant as more ALUs are provided. For example, as SIMD architectures provide more ALUs, it becomes more likely that N is not an integer multiple of M. Furthermore, the code must account for the additional remainder cases introduced by the additional ALUs.
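The padding workaround described above can be sketched in the same software-model style. This is an illustrative assumption rather than the code any particular compiler emits: the names pad_to_multiple and padded_simd_add are hypothetical, and the fill value 0 is an arbitrary choice, since results computed on padding elements are discarded. The second return value counts the padding elements, that is, the lane operations spent on data that is thrown away, which is one concrete form of the overhead mentioned above (the extra array copies are another).

```python
def pad_to_multiple(arr, m, fill=0):
    """Return a copy of arr padded with fill values so its length is a multiple of m."""
    remainder = len(arr) % m
    if remainder == 0:
        return list(arr)
    return list(arr) + [fill] * (m - remainder)


def padded_simd_add(a, b, m):
    """Hypothetical padded addition: every simulated iteration uses all m lanes.

    Returns (sums, wasted_lane_operations).
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    n = len(a)
    pa, pb = pad_to_multiple(a, m), pad_to_multiple(b, m)
    result = []
    for start in range(0, len(pa), m):
        # After padding, each iteration operates on exactly m elements.
        result.extend(x + y for x, y in zip(pa[start:start + m], pb[start:start + m]))
    # Discard results computed on padding; they represent wasted work.
    return result[:n], len(pa) - n


print(padded_simd_add(list(range(10)), [1] * 10, 4))
# ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2)
```

For N = 10 and M = 4, two padding elements are added per array, so two lane operations in the final iteration produce results that are never used.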
Thus, there is a need in the art for a means of performing operations in SIMD architectures having a plurality of ALUs that can, for example, handle cases where N is not an integer multiple of M without the complex code and additional overhead associated with conventional approaches.