Parallel processing is often implemented by a processor to optimize processing applications, for example, by a digital signal processor (DSP) to optimize digital signal processing applications. A processor can operate as a single instruction, multiple data (SIMD), or data parallel, processor to achieve parallel processing. In SIMD operations, a single instruction is sent to a number of processing elements of the processor, where each processing element can independently perform the same operation on different data. A growing demand for continually higher throughput and increased performance has also led to SIMD-within-a-register (SWAR), where each processing element can operate on multiple data elements packed within its associated registers. For example, a single 32-bit register may hold four 8-bit data elements, eight 4-bit data elements, or three 10-bit data elements (leaving two bits unused), each of which can be operated on in parallel by a single processing element.
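By way of illustration only, the packing of four 8-bit data elements into one 32-bit register can be sketched in portable C as follows. The function name swar_add_u8x4 is hypothetical and is not drawn from any particular processor or library; the sketch assumes unsigned 8-bit lanes with wrap-around arithmetic and emulates in software what SWAR hardware would perform in a single operation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: add four unsigned 8-bit lanes packed into 32-bit
 * words. Each lane wraps modulo 256 and no carry crosses a lane boundary.
 */
static uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u; /* top bit of each 8-bit lane */

    /* Add only the low 7 bits of every lane, so a carry out of bit 6
     * lands in bit 7 of the same lane and never reaches the next lane. */
    uint32_t low_sum = (a & ~high) + (b & ~high);

    /* Rebuild bit 7 of each lane as (a7 XOR b7 XOR carry into bit 7). */
    return low_sum ^ ((a ^ b) & high);
}
```

In this sketch, a single call operates on all four lanes at once, so that, for example, a lane holding 0xFF added to a lane holding 0x01 wraps to 0x00 without disturbing the neighboring lanes.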
Although SWAR is relatively inexpensive to implement in a processor's hardware, SWAR poses challenges from a programming perspective. For example, SWAR programming in a high-level language such as C/C++ typically necessitates intrinsics, inline assembly, and/or specialized vector data types (such as float2, int4, and short4), none of which are part of the ISO C or C++ standards. Because such programming options (specialized vector data types, intrinsics, and/or inline assembly) are processor specific, code written with them is difficult to port to a different processor, and legacy code written in standard C/C++ must be rewritten to benefit from SWAR. Further, since SWAR programming adds another level of parallel processing on a vector processor, conventional processors burden the programmer with ensuring that the processor recognizes the two levels of parallel operation (two-way parallelism): one level of parallel processing within the processing elements (utilizing SWAR) and another level of parallel processing across the processing elements of a vector unit of the processor. Accordingly, although existing processor architectures for performing parallel processing, and associated methods, have been generally adequate for their intended purposes, they have not been entirely satisfactory in all respects.
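The two levels of parallel operation noted above can be pictured with the following minimal sketch in standard C. It is a software emulation under stated assumptions, not a description of any particular processor: NUM_PE, lane_add_u8x4, and vector_add_u8 are hypothetical names, the vector unit is assumed to have four processing elements, and the outer loop runs sequentially here, whereas on a vector processor the processing elements would execute the same instruction concurrently.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PE 4 /* assumed number of processing elements (hypothetical) */

/* Inner level: one processing element adds four packed unsigned 8-bit
 * lanes within a 32-bit register, with no carry between lanes. */
static uint32_t lane_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u; /* top bit of each lane */
    uint32_t low_sum = (a & ~high) + (b & ~high);
    return low_sum ^ ((a ^ b) & high);
}

/* Outer level: the same operation is applied by every processing
 * element to its own registers. */
static void vector_add_u8(uint32_t dst[NUM_PE],
                          const uint32_t a[NUM_PE],
                          const uint32_t b[NUM_PE])
{
    for (size_t pe = 0; pe < NUM_PE; ++pe) {
        dst[pe] = lane_add_u8x4(a[pe], b[pe]);
    }
}
```

Under these assumptions, one call to vector_add_u8 processes NUM_PE x 4 = 16 eight-bit data elements. The programmer must keep track of both the lane layout within each register and the distribution of registers across the processing elements, which is the burden described above.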