1. Field of the Invention
This invention relates to superscalar microprocessors and, more particularly, to a superscalar microprocessor in which speculatively executed instructions are tagged and invalidated.
2. Description of the Relevant Art
Microprocessors can be implemented on one or a very small number of semiconductor chips. Advances in semiconductor chip technology continue to increase circuit densities. Speeds within microprocessors are increasing with the use of scalar computation, with superscalar technology being the next logical step in the evolution of microprocessors. The term superscalar describes a computer implementation that improves performance by concurrent execution of scalar instructions. Scalar instructions are the type of instructions typically found in general purpose microprocessors. Using today's semiconductor processing technology, a single microprocessor chip can incorporate high performance techniques that were once applicable only to large scale scientific processors.
Microprocessors run application programs. An application program comprises a group of instructions. In running application programs, microprocessors fetch and execute the instructions in some sequence. There are several steps involved in executing a single instruction, including fetching the instruction, decoding it, assembling the necessary operands, performing the operations specified by the instruction, and writing the results of the instruction to storage. These steps are controlled by a periodic clock signal. The period of the clock signal is the processor cycle time.
The time taken by a microprocessor to complete a program is determined by three factors: the number of instructions required to execute the program; the average number of processor cycles required to execute an instruction; and the processor cycle time. Microprocessor performance is improved by reducing the time taken by the microprocessor to complete the program, which dictates reducing one or more of these factors.
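By way of illustration only, the relationship among the three factors above may be sketched as a simple product. The function name, parameter names, and numeric values below are hypothetical and are chosen solely to show the arithmetic; they do not describe any particular microprocessor.

```python
def execution_time(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Return program execution time in nanoseconds.

    Execution time is modeled as the product of the three factors:
    instructions executed, average cycles per instruction, and cycle time.
    """
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical figures: the same program and clock, with the average
# number of cycles per instruction halved.
baseline = execution_time(1_000_000, 3.0, 10.0)
improved = execution_time(1_000_000, 1.5, 10.0)
speedup = baseline / improved
```

In this sketch, halving the average number of cycles per instruction while holding the other two factors constant halves the execution time.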
One way to improve the performance of the microprocessor is by overlapping the steps of different instructions, using a technique called pipelining. In pipelining, the various steps of instruction execution are performed by independent units called pipeline stages. Pipeline stages are generally separated by clock registers, and the steps of different instructions are executed independently in different pipeline stages. Pipelining reduces the average number of cycles required to execute an instruction, though not the total amount of time required to execute an instruction, by overlapping instructions and thus permitting the processor to handle more than one instruction at a time. Pipelining reduces the average number of cycles per instruction by as much as a factor of 3. However, when executing a conditional branch instruction, the pipeline may sometimes stall until the result of the conditional branch operation is known (resolved) and the correct next instruction is fetched for execution. This stall is known as the branch delay penalty.
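The effect of pipelining on cycle count may be illustrated with an idealized model. The sketch below assumes, for illustration only, that every pipeline stage takes exactly one processor cycle and that branch stalls simply add cycles to the total; the function names and the stall figure are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction takes n_stages cycles to complete; each later
    instruction completes one cycle after its predecessor. Stall cycles
    (for example, from unresolved conditional branches) are added on top.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


# Hypothetical workload: 100 instructions on a 3-stage pipeline.
no_overlap = unpipelined_cycles(100, 3)
ideal = pipelined_cycles(100, 3)
with_stalls = pipelined_cycles(100, 3, stall_cycles=20)
```

Under these assumptions, the average number of cycles per instruction approaches one as the instruction count grows, and each stall cycle directly erodes that gain.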
A typical pipelined scalar microprocessor executes one instruction per processor cycle. A superscalar microprocessor reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar processor by allowing concurrent execution of instructions in the same pipeline as well as concurrent execution of instructions in different pipelines.
While superscalar processors are simple in theory, there is more to achieving increased performance than simply increasing the number of pipelines. Increasing the number of pipelines makes it possible to execute more than one instruction per cycle, but there is no guarantee that any given sequence of instructions can take advantage of this capability. Instructions are not always independent of one another, but are often interrelated. These interrelationships prevent some instructions from occupying the same pipeline stage. Furthermore, the processor's mechanisms for decoding and executing instructions can make a difference in its ability to discover instructions that can be executed simultaneously.
A program counter (PC), also called an instruction pointer (IP), identifies the memory address of instructions to be fetched from memory and executed. The program counter mechanism for maintaining and updating the program counter value includes an incrementer, a selector, and a register. As each instruction is fetched and decoded, the address of the next sequential instruction is formed by adding the byte length of the current instruction to the value of the program counter using the incrementer, and this next sequential address is placed in the register. When a branch is taken, the address of the target instruction is selected by the selector instead of the incremented value, and this target address is placed in the register.
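The incrementer, selector, and register described above may be modeled in software as follows. This is a minimal sketch for illustration only; the class name, method name, and addresses are hypothetical and do not correspond to any particular hardware implementation.

```python
class ProgramCounter:
    """Software model of a program counter with incrementer, selector, and register."""

    def __init__(self, start_address=0):
        # The register holds the address of the instruction to be fetched.
        self.register = start_address

    def advance(self, instruction_length, branch_taken=False, target_address=None):
        """Update the register after one instruction and return the new address."""
        # Incrementer: current address plus the byte length of the instruction.
        incremented = self.register + instruction_length

        # Selector: choose the branch target when a branch is taken,
        # otherwise the next sequential address.
        if branch_taken:
            if target_address is None:
                raise ValueError("a taken branch requires a target address")
            selected = target_address
        else:
            selected = incremented

        self.register = selected
        return self.register


# Hypothetical instruction stream: a 4-byte instruction, then a 2-byte
# taken branch to 0x200, then a 3-byte instruction at the target.
pc = ProgramCounter(0x100)
after_first = pc.advance(4)
after_branch = pc.advance(2, branch_taken=True, target_address=0x200)
after_target = pc.advance(3)
```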
Branch prediction mechanisms are often employed in superscalar microprocessors to predict the outcome of a conditional branch and to have the processor pursue the likely execution path prior to decode and subsequent execution of the conditional branch instruction. At any point within the path of execution, if the processor determines that a prediction was incorrect, the microprocessor backs up in the instruction stream and proceeds down the correct path. There is a penalty for employing branch prediction mechanisms within a microprocessor. The penalty relates to instructions completed after the conditional branch is predicted but before the branch outcome is actually determined. After a branch misprediction, these completed instructions are discarded, and the time that the processor spent executing them is wasted.
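The misprediction penalty described above may be illustrated with a simplified model in which results completed along the predicted path are either retained or discarded once the branch is resolved. The function name and the sample results are hypothetical, and the sketch deliberately ignores how a real processor backs up in the instruction stream.

```python
def resolve_branch(speculative_results, predicted_taken, actual_taken):
    """Return (kept_results, discarded_count) once the branch outcome is known.

    speculative_results holds the results of instructions completed after
    the prediction was made but before the branch was resolved.
    """
    if predicted_taken == actual_taken:
        # Correct prediction: the speculative work is retained.
        return list(speculative_results), 0
    # Misprediction: all speculative work is discarded and therefore wasted.
    return [], len(speculative_results)


# Hypothetical results from three speculatively completed instructions.
results = ["r1=5", "r2=7", "r3=12"]
kept_ok, wasted_ok = resolve_branch(results, predicted_taken=True, actual_taken=True)
kept_bad, wasted_bad = resolve_branch(results, predicted_taken=True, actual_taken=False)
```

In the mispredicted case, the count of discarded results stands in for the processor time that was spent on the incorrect path.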
The prior art has suggested a method for limiting or completely avoiding the penalty associated with branch misprediction. More particularly, the prior art suggests that the superscalar microprocessor pursue both paths (i.e., the sequence of instructions if the conditional branch instruction is taken or the sequence of instructions if the conditional branch instruction is not taken) to ensure that the microprocessor always executes the correct instruction sequence when it determines the outcome of the branch. In this scenario, the microprocessor will simply discard the results of the incorrect path.
To sufficiently pursue both instruction paths at a branch, the microprocessor must have enough resources to pursue both paths, i.e., double the number of decoders and functional units, and so on. While the prior art suggests pursuing both instruction paths at a conditional branch instruction, the prior art fails to teach or fairly suggest a means for implementing and managing dual path instruction execution at a conditional branch.