1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to superscalar microprocessors that include multiple execution units optimized for performing digital signal processing (DSP) functions.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. Memory elements (such as registers and arrays within the microprocessor) capture data values according to a clock signal which defines the clock cycle. For example, memory elements may capture their data values based upon a rising or falling edge of the clock signal.
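As a software illustration only, the edge-triggered capture described above can be sketched in Python. The class `EdgeTriggeredRegister`, the 0/1 encoding of the clock level, and the assumption that the clock starts low are hypothetical modeling choices, not part of the described microprocessor.

```python
class EdgeTriggeredRegister:
    """Toy model of a memory element that captures its input on a clock edge.

    Hypothetical sketch: the clock is a 0/1 level sampled once per step,
    and the clock is assumed to start low (0).
    """

    def __init__(self, rising_edge: bool = True, initial: int = 0) -> None:
        self.rising_edge = rising_edge  # capture on 0->1 if True, on 1->0 if False
        self.value = initial            # currently stored data value
        self._prev_clk = 0              # clock level observed at the previous step

    def step(self, clk: int, data: int) -> int:
        """Observe the clock level and data input; return the stored value."""
        if self.rising_edge:
            edge = self._prev_clk == 0 and clk == 1
        else:
            edge = self._prev_clk == 1 and clk == 0
        if edge:
            self.value = data  # data is sampled only at the active edge
        self._prev_clk = clk
        return self.value


clock = [0, 1, 1, 0, 1, 0]
data = [5, 7, 9, 3, 4, 8]
rise = EdgeTriggeredRegister(rising_edge=True)
fall = EdgeTriggeredRegister(rising_edge=False)
rising_trace = [rise.step(c, d) for c, d in zip(clock, data)]
falling_trace = [fall.step(c, d) for c, d in zip(clock, data)]
print(rising_trace)   # [0, 7, 7, 7, 4, 4]
print(falling_trace)  # [0, 0, 0, 3, 3, 8]
```

Between active edges the stored value holds steady even though the data input keeps changing, which is what allows each pipeline stage to work on a stable value for a full clock cycle.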
Superscalar microprocessor manufacturers often design microprocessors according to the x86 microprocessor architecture. Due to the widespread acceptance of the x86 microprocessor architecture in the computer industry, superscalar microprocessors designed to execute x86 instructions may be suitable for use in many computer system configurations. The x86 instruction set is an example of a complex instruction set computer (CISC) instruction set. Certain CISC instructions are defined to perform complex operations which may require multiple clock cycles to complete. For example, a CISC instruction may utilize a memory operand (i.e. an operand value stored in a memory location as opposed to a register). Fetching the operand from memory may require several clock cycles before the instruction can operate upon the operand value. Additionally, a CISC instruction may specify several results to be stored in several different storage locations. Since the execution units within a superscalar microprocessor can convey only a finite number of results during a clock cycle, an instruction that specifies several results adds complexity: the number of results an instruction specifies may affect the number of clock cycles required to execute it. Finally, certain mathematical x86 instructions, such as divide and multiply instructions, may take numerous clock cycles to execute, particularly if they involve memory operands.
Computer systems employing x86 microprocessors also often employ discrete digital signal processors (DSPs). The DSPs are typically included within multimedia devices such as sound cards, speech recognition cards, video capture cards, etc. The DSPs function as coprocessors, performing the complex mathematical computations demanded by multimedia devices and other signal processing applications more efficiently than general-purpose microprocessors. Microprocessors, by contrast, are typically optimized for performing integer operations upon values stored within the main memory of a computer system. While DSPs perform many of the multimedia functions, the microprocessor manages the operation of the computer system and executes the application programs.
Digital signal processors include execution units comprising one or more arithmetic logic units (ALUs) coupled to hardware multipliers, which implement complex mathematical algorithms in a pipelined manner. The DSP instruction set primarily comprises DSP-type instructions (i.e. instructions optimized for performing complex mathematical operations) and also includes a small number of non-DSP instructions. The non-DSP instructions are in many ways similar to instructions executed by microprocessors and are necessary to allow the DSP to function independently of the microprocessor.
The DSP is typically optimized for mathematical algorithms such as correlation, convolution, finite impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix correlations, and inner products, among other operations. Implementations of these algorithms generally comprise long sequences of systematic arithmetic and multiplicative operations, interrupted on various occasions by decision-type commands. In general, DSP sequences are a repetition of a very small set of instructions that are executed 70% to 90% of the time. The remaining 10% to 30% of the instructions are primarily boolean/decision operations.
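The repetitive multiply-accumulate pattern behind an FIR filter can be sketched as below. This is a minimal direct-form FIR in Python, assuming integer samples and treating samples before the start of the input as zero; the function name `fir_filter` is hypothetical. Each output is an inner product of the coefficient vector with a window of recent samples, so the same multiply and add operations dominate the work, with only an occasional bounds decision.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0.

    Hypothetical illustration of a DSP-style multiply-accumulate loop.
    """
    outputs = []
    for n in range(len(samples)):
        acc = 0  # accumulator, analogous to a DSP multiply-accumulate (MAC) register
        for k, h in enumerate(coeffs):
            if n - k < 0:
                break  # decision-type step: earlier samples are treated as zero
            acc += h * samples[n - k]  # one multiply and one add per filter tap
        outputs.append(acc)
    return outputs


impulse_response = fir_filter([1, 0, 0, 0], [3, 2, 1])
moving_sum = fir_filter([1, 2, 3, 4], [1, 1])
```

For an impulse input the output reproduces the coefficients, so `impulse_response` is `[3, 2, 1, 0]`; the two-tap all-ones filter produces the running pairwise sum `[1, 3, 5, 7]`.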
As computer systems include more multimedia devices and capabilities, the amount of mathematical computation performed within the computer system also increases. While computer systems have evolved to include multimedia functions, microprocessor performance has continued to increase. Still further, the number of transistors included within microprocessor designs continues to grow as semiconductor fabrication technology improves. It is therefore desirable to integrate DSP functionality into the microprocessor, both to handle the increased computational demands of modern computer systems and to simplify programming.
However, as stated previously, DSP functions tend to require extensive mathematical computations, and the instructions involved may each require numerous clock cycles to execute. If a general-purpose superscalar microprocessor is employed to handle the DSP functionality, and particularly if the superscalar microprocessor employs distributed reservation stations (i.e. a separate reservation station associated with each execution unit), bottlenecks can occur when one of the execution units is burdened with a majority of the instructions that require numerous cycles to complete. This condition can further cause other execution units to stall.
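The load-imbalance effect can be illustrated with a deliberately simplified model. The sketch assumes (hypothetically) that each instruction is bound at dispatch to one execution unit's private reservation station, that each unit executes one independent instruction at a time with no pipelining, and that a multiply takes 4 cycles and an add takes 1; none of these figures come from the description above, and the function name `drain_cycles` is illustrative only.

```python
from collections import defaultdict


def drain_cycles(assignments):
    """Return, per execution unit, the cycle at which its reservation station empties.

    Toy assumptions (hypothetical): each (unit, latency) pair is an independent
    instruction bound to that unit, and a unit executes one instruction at a time.
    """
    busy_until = defaultdict(int)
    for unit, latency in assignments:
        busy_until[unit] += latency  # the unit is occupied for `latency` cycles
    return dict(busy_until)


MUL, ADD = 4, 1  # hypothetical latencies in clock cycles

# All multi-cycle multiplies land in unit 0's reservation station.
skewed = [(0, MUL)] * 4 + [(1, ADD)] * 4
# The same eight instructions spread evenly across both units.
balanced = [(0, MUL), (1, MUL), (0, MUL), (1, MUL),
            (0, ADD), (1, ADD), (0, ADD), (1, ADD)]

for name, plan in (("skewed", skewed), ("balanced", balanced)):
    cycles = drain_cycles(plan)
    total = max(cycles.values())                     # cycle at which all work is done
    idle = {u: total - c for u, c in cycles.items()}  # cycles each unit sits unused
    print(name, cycles, "total:", total, "idle:", idle)
```

Under these assumptions the skewed assignment drains after 16 cycles with unit 1 idle for 12 of them, whereas spreading the same instructions finishes both units after 10 cycles. A real design would also have to account for pipelined units and data dependencies, which this toy model ignores.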