1. Field of the Invention
This invention relates to superscalar microprocessors and, more particularly, to a superscalar microprocessor in which data dependencies between instructions are reduced.
2. Description of the Relevant Art
Microprocessors can be implemented on one or more semiconductor chips. Advances in semiconductor chip technology continue to increase circuit densities. Microprocessor speeds have increased by virtue of improvements in scalar computation, and superscalar technology is the next logical step in the evolution of microprocessors. The term "superscalar" describes a computer implementation that improves performance through the concurrent execution of scalar instructions. Scalar instructions are the type of instructions typically found in general purpose microprocessors. Using today's semiconductor processing technology, a single microprocessor chip can incorporate high performance techniques that were once applicable only to large scale scientific processors.
Microprocessors run application programs. An application program comprises a group of instructions. In running an application program, a microprocessor fetches and executes the instructions in some sequence. Executing a single instruction involves several steps: fetching the instruction, decoding it, assembling the necessary operands, performing the operation specified by the instruction, and writing the result of the instruction to storage. These steps are controlled by a periodic clock signal, whose period is the processor cycle time.
The time taken by a microprocessor to complete a program is determined by three factors: the number of instructions required to execute the program; the average number of processor cycles required to execute an instruction; and the processor cycle time. Microprocessor performance is improved by reducing the time taken by the microprocessor to complete the application program, which dictates reducing one or more of these factors.
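The three factors combine multiplicatively: execution time = (instruction count) × (average cycles per instruction, CPI) × (cycle time). The short Python sketch below simply evaluates that product; the function name and the example figures (one million instructions, a CPI of 4.0 versus 2.0, a 10 ns cycle) are hypothetical and chosen only for illustration.

```python
def execution_time_s(instruction_count: int,
                     cycles_per_instruction: float,
                     cycle_time_s: float) -> float:
    """Program execution time = instructions x average CPI x cycle time (seconds)."""
    return instruction_count * cycles_per_instruction * cycle_time_s


# Hypothetical workload: 1,000,000 instructions on a 10 ns (100 MHz) clock.
baseline = execution_time_s(1_000_000, 4.0, 10e-9)
improved = execution_time_s(1_000_000, 2.0, 10e-9)  # only the CPI is halved
print(f"baseline: {baseline:.3f} s, halved CPI: {improved:.3f} s")
```

Because the factors multiply, halving any one of them (here the CPI) halves the execution time, provided the other two stay fixed.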
One way to improve the performance of microprocessors is to overlap the steps of different instructions, using a technique called pipelining. In pipelining, the various steps of instruction execution are performed by independent units called pipeline stages. Pipeline stages are generally separated by clocked registers, and the steps of different instructions are executed independently in different pipeline stages. By overlapping instructions, pipelining allows a processor to handle more than one instruction at a time; this reduces the average number of cycles required to execute an instruction, though not the total time required to execute any single instruction. Pipelining can reduce the average number of cycles per instruction by as much as a factor of three.
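The distinction between average cycles per instruction and the time taken by a single instruction can be made concrete with an idealized cycle count. This sketch assumes a hypothetical five-stage pipeline (mirroring the fetch, decode, operand, execute, and write steps listed earlier), one cycle per stage, and no stalls of any kind. Because it ignores stalls such as those caused by data dependencies, the reduction it shows is an upper bound rather than a typical figure.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through every step before the next one begins.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Ideal pipeline: the first instruction needs n_stages cycles to fill the
    # pipe; after that one instruction completes per cycle (no stalls assumed).
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


STAGES = 5  # assumed: fetch, decode, assemble operands, execute, write result
N = 100     # hypothetical program length

for name, cycles in (("unpipelined", unpipelined_cycles(N, STAGES)),
                     ("pipelined", pipelined_cycles(N, STAGES))):
    print(f"{name}: {cycles} cycles, average CPI = {cycles / N:.2f}")
```

Note that `pipelined_cycles(1, STAGES)` still equals `unpipelined_cycles(1, STAGES)`: a lone instruction takes just as long as before, and the gain comes entirely from overlapping successive instructions.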
A typical pipelined scalar microprocessor executes one instruction per processor cycle. A superscalar microprocessor reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar processor by allowing concurrent execution of instructions in the same pipeline as well as concurrent execution of instructions in different pipelines. Although superscalar processors are simple in theory, achieving increased performance requires more than simply increasing the number of pipelines. Increasing the number of pipelines makes it possible to execute more than one instruction per cycle, but there is no guarantee that any given sequence of instructions can take advantage of this capability. Instructions are not always independent of one another; they are often interrelated, and these interrelationships prevent some instructions from occupying the same pipeline stage. For example, certain instructions are data dependent, which means, in one sense, that the data result of one instruction (the data dependent instruction) depends upon the data result of another instruction (the data independent instruction). To add together two 64-bit words using a 32-bit ALU, for instance, the normal practice is first to add the least significant 32 bits of the two addends, and then to add the most significant 32 bits of the two addends together with any carry generated by the first addition. Performing the addition in this way requires that the result of the first addition instruction (the data independent instruction), in particular its carry, be known before the second addition instruction (the data dependent instruction) may be started.
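The 64-bit addition example can be modeled in a few lines of Python. The sketch below assumes a hypothetical helper `add32` that behaves like a 32-bit ALU add, returning a 32-bit sum and a carry-out bit; the second call consumes the carry produced by the first, which is exactly the data dependency described above. It is an illustration of the arithmetic, not a description of any particular ALU.

```python
MASK32 = 0xFFFF_FFFF


def add32(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Hypothetical 32-bit ALU add: returns (32-bit sum, carry-out bit)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32  # carry-out is 0 or 1


def add64_with_32bit_alu(x: int, y: int) -> int:
    # First instruction (data independent): add the least significant halves.
    lo, carry = add32(x, y)
    # Second instruction (data dependent): add the most significant halves plus
    # the carry, so it cannot start until the first instruction's carry is known.
    hi, _ = add32(x >> 32, y >> 32, carry)
    # Reassemble the halves; any carry out of bit 63 is discarded (mod 2**64).
    return (hi << 32) | lo


print(hex(add64_with_32bit_alu(0x0000_0001_FFFF_FFFF, 0x1)))
```

In the printed case the low halves overflow (0xFFFFFFFF + 1), so the high-half addition must incorporate a carry of 1 to produce the correct result.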
There is a penalty for executing instructions which are data dependent upon one another. In the example, execution of the second instruction for adding the most significant 32 bits of the two addends must be delayed until the carry of the first addition instruction is available. This delay may degrade processor performance. What is needed is a mechanism for removing dependencies between instructions in order to avoid delays in executing instructions which are dependent upon the data results of other instructions.
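The penalty can be illustrated with a toy in-order issue model. In this sketch the machine issues up to two instructions per cycle, every result becomes available one cycle after issue, and an instruction may not issue before its predecessor or before its producers' results are ready. The issue width, the one-cycle latency, and the names `Instr`, `schedule`, and `completion_cycles` are all hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    deps: list[int] = field(default_factory=list)  # indices of producer instructions


def schedule(program: list[Instr], width: int = 2, latency: int = 1) -> list[int]:
    """Return each instruction's issue cycle on a toy in-order, `width`-wide machine."""
    issue: list[int] = []
    per_cycle: dict[int, int] = {}  # cycle -> number of instructions issued in it
    for i, ins in enumerate(program):
        # In-order: never issue earlier than the preceding instruction.
        earliest = issue[i - 1] if i > 0 else 0
        # Data dependency: wait until every producer's result is available.
        for d in ins.deps:
            earliest = max(earliest, issue[d] + latency)
        # Structural limit: at most `width` instructions issue per cycle.
        while per_cycle.get(earliest, 0) >= width:
            earliest += 1
        per_cycle[earliest] = per_cycle.get(earliest, 0) + 1
        issue.append(earliest)
    return issue


def completion_cycles(issue: list[int], latency: int = 1) -> int:
    # Cycles until the last result is available.
    return max(issue) + latency if issue else 0


independent = [Instr("add low halves"), Instr("unrelated add")]
dependent = [Instr("add low halves"), Instr("add high halves + carry", deps=[0])]
for label, prog in (("independent", independent), ("dependent", dependent)):
    issue = schedule(prog)
    print(label, issue, "cycles:", completion_cycles(issue))
```

Under these assumptions the two independent additions issue together and finish in one cycle, while the carry-dependent pair must issue in consecutive cycles and finishes a cycle later, even though a second pipeline is idle.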