1. Field of the Invention
This invention pertains in general to a processor architecture and in particular to a processor architecture adapted to perform data-level parallelism.
2. Description of Background Art
Processor architectures have certain mechanisms for exploiting data- and instruction-level parallelism in order to scale processor performance. In short, data-level parallelism occurs when a single processor instruction performs simultaneous operations on multiple data elements. Instruction-level parallelism, in contrast, occurs when the processor simultaneously executes more than one instruction. These mechanisms have the disadvantage of requiring complex hardware and software for extending processor performance. In addition, the mechanisms generally concentrate more on the instruction level than the data level. Many media operations, however, including those that perform finite impulse response filtering, discrete cosine transforms, motion estimation, and motion compensation, require substantial amounts of data-level parallelism.
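To illustrate why a media operation such as motion estimation lends itself to data-level parallelism, the following minimal sketch computes a sum of absolute differences (SAD) between pixel blocks. SAD is assumed here as the matching metric because it is commonly used for block matching; the text above does not specify one. The function names and the flat-list block layout are hypothetical and chosen only for illustration.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Return the SAD of two equally sized pixel blocks given as flat lists."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must contain the same number of pixels")
    # The same subtract-and-absolute-value operation is applied to every
    # pixel pair, and no pair depends on another. This independence is the
    # data-level parallelism that a single wide instruction could exploit.
    diffs = [abs(a - b) for a, b in zip(block_a, block_b)]
    # Only the final reduction combines the per-pixel results.
    return sum(diffs)


def best_match(reference, candidates):
    """Return the index of the candidate block with the lowest SAD."""
    scores = [sum_of_absolute_differences(reference, c) for c in candidates]
    return min(range(len(scores)), key=scores.__getitem__)
```

In this sketch the per-pixel work is written as a sequential list comprehension. A data-parallel processor would instead apply the operation to many pixel pairs in one step.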
Single-instruction-multiple-data (SIMD) processor instructions can be used to exploit data-level parallelism. A single SIMD instruction operates on multiple data simultaneously. Typical processors have 32- or 64-bit datapaths while typical media operations operate on data requiring only 8- to 16-bit precision. Accordingly, some processors support SIMD through instruction set extensions and datapaths that simultaneously operate on 2 to 4 packed words.
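The packed-word idea can be sketched in software by treating one 32-bit word as four independent 8-bit lanes. The example below is a minimal illustration under assumed widths (four unsigned 8-bit lanes with wraparound arithmetic). It does not reproduce any particular processor's SIMD extension, and all names in it are hypothetical.

```python
LANES = 4                 # assumed: four lanes per word
LANE_BITS = 8             # assumed: 8-bit unsigned data in each lane
WORD_MASK = 0xFFFFFFFF    # keep results within a 32-bit word
LOW7 = 0x7F7F7F7F         # low seven bits of every lane
HIGH1 = 0x80808080        # top bit of every lane


def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add_u8(a, b):
    """Add all four lanes with a single word-wide addition, modulo 256 per lane."""
    # Adding only the low seven bits of each lane guarantees that a carry
    # never crosses from one lane into its neighbor.
    low = (a & LOW7) + (b & LOW7)
    # XOR restores the top bit of each lane as a carry-free addition.
    return (low ^ ((a ^ b) & HIGH1)) & WORD_MASK
```

For example, `unpack(packed_add_u8(pack([250, 1, 128, 127]), pack([10, 2, 128, 1])))` yields `[4, 3, 0, 128]`: each lane wraps independently, just as four separate 8-bit additions would. Hardware SIMD datapaths achieve the same effect by breaking the carry chain at lane boundaries rather than by masking.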
Thus, a certain amount of data-level parallelism can be gained through the use of SIMD extensions to processor instruction sets. These extensions, however, are very hardware intensive. In addition, exploiting the SIMD architecture requires aggressive and complex assembly coding techniques to achieve the data-level parallelism and to manage the resulting convoluted memory hierarchies and instruction scheduling requirements. Accordingly, the SIMD extensions cannot be used in the normal high-level language flow of program development because the data parallelism must be coded at a low level. Also, the functionality of the SIMD extensions is limited by the width of the existing processor datapaths and by the existing programming models.
Processors supporting superscalar instruction scheduling exercise parallel functional units by dynamically extracting instruction-level parallelism from the instruction stream. When combined with SIMD data-level parallelism, superscalar processors can perform control flow operations in the instruction stream in parallel. Although these parallel operations can give very high performance, the processors must have very complex hardware. Moreover, such parallelism makes certain assumptions about the temporal and spatial locality of data that do not hold true when processing media data, thus reducing the effectiveness of these techniques in media applications. Finally, a programmer must use very complex programming techniques in order to fully utilize the hardware.
Processors supporting very long instruction word (VLIW) formats explicitly encode instruction parallelism into a very long instruction word. Basically, the VLIW format moves the complexity of extracting instruction-level parallelism from hardware to software. Thus, the use of a VLIW format makes the already complex task of coding data-level parallelism even harder. Another disadvantage of VLIW formats is that code must often be rewritten to support newer versions of the processors.
Accordingly, there is a need for a processor architecture that supports data-level parallelism in order to efficiently execute media operations. Such a processor should also include single scalar processor control to maintain simplicity in both hardware and software.