1. Field of the Invention
This invention relates to computer systems and microprocessors, and more particularly to a multimedia execution unit incorporated within a microprocessor for accommodating high-speed multimedia applications. The invention further relates to an accumulate function and vector processing implemented within microprocessor-based systems.
2. Description of the Related Art
Microprocessors typically achieve increased performance by partitioning processing tasks into multiple pipeline stages. In this manner, a microprocessor may independently execute different portions of multiple instructions during a single clock cycle. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of the clock cycle, the resulting values are moved to the next pipeline stage.
Microprocessor-based computer systems have historically been used primarily for business applications, including word processing and spreadsheets, among others. Increasingly, however, computer systems have evolved toward the use of more real-time applications, including multimedia applications such as video and audio processing, video capture and playback, telephony and speech recognition. Since these multimedia applications are computationally intensive, various enhancements have been implemented within microprocessors to improve multimedia performance. For example, some general-purpose microprocessors have been enhanced with multimedia execution units configured to execute certain special instructions particularly tailored for multimedia computations. These instructions are often implemented as "vectored" instructions, wherein the operands for the instructions are partitioned into separate sections or vectors which are independently operated upon in accordance with the instruction definition. For example, a vectored add instruction may include a pair of 32-bit operands, each of which is partitioned into four 8-bit sections. Upon execution of such a vectored add instruction, corresponding 8-bit sections of each operand are independently and concurrently added to obtain four separate and independent addition results. Implementation of such vectored instructions in a computer system furthers the use of parallelism, and typically leads to increased performance for certain applications.
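By way of illustration only, the vectored add described above may be modeled in software as follows. This is a minimal sketch and not a description of any particular hardware: the function name `vectored_add_8x4` is hypothetical, and the sketch assumes unsigned sections whose sums wrap around modulo 256, since the text above does not specify overflow behavior. The loop processes the sections one at a time, whereas a multimedia execution unit would add them concurrently.

```python
def vectored_add_8x4(a: int, b: int) -> int:
    """Model a vectored add on two 32-bit operands, each split into four 8-bit sections.

    Assumption: unsigned sections with wraparound (modulo 256) on overflow.
    """
    result = 0
    for section in range(4):
        shift = 8 * section
        x = (a >> shift) & 0xFF          # section of the first operand
        y = (b >> shift) & 0xFF          # corresponding section of the second operand
        total = (x + y) & 0xFF           # the carry is discarded, not passed to the next section
        result |= total << shift
    return result


# Each pair of corresponding sections is added independently.
print(hex(vectored_add_8x4(0x01020304, 0x10203040)))
```

Under the stated wraparound assumption, an overflow in one section does not disturb its neighbor, which is the property that distinguishes a vectored add from an ordinary 32-bit add.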
Vectored arithmetic operations such as add and subtract are useful for a variety of multimedia operations. As mentioned above, however, these operations are performed on corresponding portions of different operands. It would also be desirable to have an instruction which performs an addition operation using portions of the same operand. Such an instruction (referred to herein as an "accumulate instruction") would be useful in the sum-of-products calculations that form part of the matrix multiply operation commonly used in multimedia applications. It would further be desirable to provide a multimedia execution unit with an efficient hardware implementation of the accumulate instruction.
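The role of such an accumulate operation in a sum-of-products calculation may be sketched in software as follows. The sketch is illustrative only and rests on several assumptions not stated above: the helper names `split_sections`, `accumulate`, and `sum_of_products` are hypothetical, the sections are treated as unsigned, and the section widths (8-bit inputs, 16-bit products) are chosen merely for the example.

```python
def split_sections(value: int, section_bits: int, count: int) -> list[int]:
    """Split an operand into `count` unsigned sections, lowest-order section first."""
    mask = (1 << section_bits) - 1
    return [(value >> (section_bits * i)) & mask for i in range(count)]


def accumulate(operand: int, section_bits: int, count: int) -> int:
    """Add together the sections of a single operand (hypothetical semantics)."""
    return sum(split_sections(operand, section_bits, count))


def sum_of_products(a: int, b: int, section_bits: int = 8, count: int = 4) -> int:
    """Multiply corresponding sections of a and b, then accumulate the products."""
    xs = split_sections(a, section_bits, count)
    ys = split_sections(b, section_bits, count)
    # Vectored multiply: pack each product into a section twice as wide as the inputs.
    packed = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        packed |= (x * y) << (2 * section_bits * i)
    # Accumulate: the additions use sections of the same operand.
    return accumulate(packed, 2 * section_bits, count)


# Sections (1, 2, 3, 4) and (5, 6, 7, 8): 1*5 + 2*6 + 3*7 + 4*8
print(sum_of_products(0x01020304, 0x05060708))
```

In this model the vectored multiply produces the partial products in one operand, and the accumulate step reduces them to the single sum needed for one element of a matrix product.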