1. Field of the Invention
The present invention relates to the field of electronic hardware used for processing multimedia content such as digitally encoded signals. More specifically, the present invention relates to a data path architecture that can be used for a multimedia processor and is capable of performing high speed operations on operands of various data types.
2. Related Art
Multimedia processors (often called "coprocessors") are increasingly becoming indispensable components of computer systems and electronic devices that process multimedia content. Multimedia content can be audio/visual material that is digitally encoded using any number of different encoding standards, such as MPEG (Moving Picture Experts Group) or DV (Digital Video). Multimedia processors are used to digitally encode multimedia content in order to reduce the amount of computer resources required to both store and transmit the content. Multimedia processors are also used to digitally decode the encoded multimedia content for rendering on a display screen and/or a speaker system so that the content can be interpreted by a user or viewer. In addition to being used in a computer system, a multimedia processor can also be used in an embedded system within an electronic device, such as within a digital video disk (DVD) player, a compact disk (CD) player or other consumer electronic device that can process audio/visual content.
Multimedia processors, in addition to being useful for processing multimedia content, can also be used to support other processes such as real-time applications (e.g., flight simulators, speech recognition, video teleconferencing, computer games, streaming audio/video, etc.). It is appreciated that the overall system performance of the multimedia processor is heavily dependent on the speed and architecture of the internal data path of the processor. Typically, the faster the data path can process instructions, and thereby process data, the more desirable the multimedia processor. For instance, processing digital images at 30 frames/second requires the processor to perform nearly 2.2 million multiply operations per second. Therefore, it would be advantageous to design a fast data path architecture that occupies less area on the integrated circuit (IC) chip and that consumes less power.
To achieve real-time processing of media signals, architectural enhancements are necessary in order to alleviate the pressure for performance that is demanded of modern systems and technology. Enhancements to the existing instruction set first came as a result of performance demand that originated from specific computer applications such as graphics applications. Soon after, the enhancements appeared in general purpose processors such as the Intel MMX processor, and this event reflected a change in the computational environment and, specifically, a shift towards media processing. These extensions operate on multiple data values under the control of a single instruction (single instruction, multiple data, or SIMD). In most of these processors, data is packed into 64-bit registers in one of the general-purpose register files, reflecting their adherence to the 64-bit architectural world. However, this 64-bit architecture is limited in data width and therefore not well suited for high performance graphics processing environments.
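The packed-data idea described above can be illustrated in software. The following is a minimal sketch, assuming four 16-bit lanes in one 64-bit word and wraparound (non-saturating) arithmetic; the function names pack16, unpack16 and packed_add16 are hypothetical and do not correspond to any particular processor's instructions.

```python
MASK16 = 0xFFFF  # one 16-bit lane

def pack16(lanes):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & MASK16) << (16 * i)
    return word

def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & MASK16 for i in range(4)]

def packed_add16(a, b):
    """Lane-wise add with wraparound: a carry never crosses a lane boundary."""
    sums = [(x + y) & MASK16 for x, y in zip(unpack16(a), unpack16(b))]
    return pack16(sums)
```

A single call to packed_add16 stands in for one SIMD instruction that produces four independent 16-bit sums; the last lane of pack16([1, 2, 3, 0xFFFF]) added to pack16([10, 20, 30, 1]) wraps to zero without disturbing its neighbors.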
In multimedia applications, processor data paths use multiplier circuits to perform a wide range of functions such as Inverse Discrete Cosine Transforms (IDCT), Fast Fourier Transforms (FFT), and Multiply Accumulate (MAC) operations on 8-bit, 16-bit, and 32-bit signed and unsigned operands. However, multipliers that are able to process wide data formats typically consume extra processing cycles to perform the multiplication operation. Therefore, prior art data paths that include multiplier circuits typically have more pipestages in their execution phase to accommodate the wide data format multiply operations. Multiply instructions of these prior art processors require additional execution time to complete, thereby consuming valuable processing time. The longer execution phase also acts to reduce the efficiency of other operations that only require one or two execution pipestages for completion. It would be advantageous to provide a more efficient data path that is also able to efficiently perform wide data format multiply operations.
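As an illustration of the multiply accumulate (MAC) operation mentioned above, the following is a minimal sketch for signed 16-bit operands. The names to_signed16, mac16 and dot16 are hypothetical, and the accumulator is an unbounded Python integer, whereas a hardware accumulator would have a fixed width.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mac16(acc, a, b):
    """One multiply-accumulate step: acc + a * b on signed 16-bit operands."""
    return acc + to_signed16(a) * to_signed16(b)

def dot16(xs, ys):
    """Dot product built from repeated MAC steps, as in transform or filter kernels."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac16(acc, x, y)
    return acc
```

Kernels such as the IDCT and FFT reduce largely to sequences of such steps, which is why multiplier latency weighs so heavily on data path throughput.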
One particular prior art multiplication circuit exists within the Intel MMX processor. This multiplication circuit performs 32-bit multiplication using a 16-bit multiplication circuit that is required to perform two iterations. If larger bit multiplication operations are required, then more iterations are performed. The tradeoff selected in this multiplier design requires that 8-bit multiplication not be supported; otherwise, too many iterations would be required to support larger bit operations. Since two iterations are required for 32-bit multiplication, this multiplication circuit is not able to accept new operands each clock cycle, but rather accepts new operands only every other cycle, thereby drastically reducing its data throughput capacity. In another particular example, the AltiVec processor of Motorola provides two separate multiplier circuits for large-bit multiply operations, e.g., one circuit for 8-bit and a second circuit for 16-bit. However, this approach is disadvantageous because it includes redundant hardware that increases the area and power requirements of the processor. It would be advantageous to provide a circuit capable of large-bit multiply operations having high data throughput that does not have substantial hardware redundancy.
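The general idea of building a wide multiply from a narrower multiplier can be sketched as follows. This is a generic schoolbook decomposition of an unsigned 32-bit by 32-bit product into 16-bit by 16-bit partial products; it is an assumption made for illustration and does not model the specific prior art circuit or its iteration count. The names mul16 and mul32_from_16 are hypothetical.

```python
def mul16(a, b):
    """Stand-in for a 16x16 -> 32-bit unsigned hardware multiplier."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    return a * b

def mul32_from_16(a, b):
    """Unsigned 32x32 -> 64-bit product assembled from 16-bit partial products."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    low = mul16(a_lo, b_lo)                        # weight 2**0
    mid = mul16(a_lo, b_hi) + mul16(a_hi, b_lo)    # weight 2**16
    high = mul16(a_hi, b_hi)                       # weight 2**32
    return low + (mid << 16) + (high << 32)
```

Each partial product that cannot be formed in the same pass costs another trip through the narrow multiplier, which is the source of the throughput penalty described above.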
Moreover, in multimedia applications there are several specially adapted multimedia instructions that are useful for processing packed data types, such as those that represent encoded pixels or encoded audio data. Like the multiply operations, these specially adapted multimedia instructions often require the data path of a media processor to have extra pipestages to accommodate the instruction execution. It would be advantageous to provide a more efficient data path that is also able to efficiently process these specially adapted multimedia instructions.