1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to canceling speculatively executed instructions within reorder buffers of superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to increase performance, superscalar microprocessors often employ out of order execution. The instructions within a program are ordered, such that a first instruction is intended to be executed before a second instruction, etc. When the instructions are executed in the order specified, the intended functionality of the program is realized. However, instructions may be executed in any order as long as the original functionality is maintained. For example, a second instruction which does not depend upon a first instruction may be executed prior to the first instruction, even if the first instruction is prior to the second instruction in program order. A second instruction depends upon a first instruction if a result produced by the first instruction is employed as an operand of the second instruction. The second instruction is said to have a dependency upon the first instruction. As used herein, the term "program execution sequence" refers to the sequence of instructions which were intended to be executed. An instruction is earlier in the program execution sequence than another instruction if it was intended to be executed before that instruction. An instruction is later in the program execution sequence if it was intended to be executed after the other instruction.
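The dependency definition above can be illustrated with a small model. This is a minimal sketch that assumes a hypothetical instruction format with one destination register and a tuple of source registers; the names `Instr` and `depends_on` are illustrative only and do not come from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: one destination and some source operands."""
    name: str
    dest: str
    sources: tuple


def depends_on(second: Instr, first: Instr) -> bool:
    """True if `second` employs the result of `first` as an operand."""
    return first.dest in second.sources


# Program execution sequence: i0 is earlier than i1 and i2.
i0 = Instr("i0", dest="r1", sources=("r2", "r3"))
i1 = Instr("i1", dest="r4", sources=("r1",))  # reads r1, produced by i0
i2 = Instr("i2", dest="r5", sources=("r6",))  # does not use any result of i0

print(depends_on(i1, i0))  # True: i1 has a dependency upon i0
print(depends_on(i2, i0))  # False: i2 may execute before i0
```

Because `i2` has no dependency upon `i0`, executing it first preserves the program's intended functionality, whereas `i1` must wait for the result of `i0`.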
In addition to dependencies, another hazard of out of order execution occurs when two instructions update the same destination storage location. If the instruction which is second in the original program sequence executes first, then that instruction must not update the destination until the first instruction has done so; otherwise the destination would be left holding the older value. Often, superscalar microprocessors employ a reorder buffer in order to correctly handle dependency checking and multiple updates to a destination, among other things. Instructions are stored into the reorder buffer in program order, typically as the instructions are dispatched to execution units (perhaps being stored in reservation stations associated therewith). The results of the instructions are stored into the destinations from the reorder buffer in program order. However, results may be provided to the reorder buffer in any order. The reorder buffer stores each result with the instruction which generated the result until that instruction is selected for storing its result into the destination.
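The reorder buffer behavior described above (in-order allocation, out-of-order result delivery, in-order update of destinations) can be sketched as follows. This is a simplified software model under assumed names (`ReorderBuffer`, `dispatch`, `write_result`, `retire`); it is not presented as the hardware organization of this disclosure.

```python
from collections import deque


class ReorderBuffer:
    """Minimal model: entries are allocated in program order, results may arrive
    in any order, and destinations are updated strictly in program order."""

    def __init__(self):
        self.entries = deque()  # oldest entry at the left
        self.next_tag = 0

    def dispatch(self, dest):
        """Allocate an entry for an instruction writing `dest`; return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "result": None, "done": False})
        return tag

    def write_result(self, tag, value):
        """Store a result with the instruction that generated it (any order)."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["result"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self, regfile):
        """Update destinations from the oldest completed entries only."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regfile[entry["dest"]] = entry["result"]
            retired.append(entry["tag"])
        return retired


# Two instructions update the same destination r1.
regs = {}
rob = ReorderBuffer()
a = rob.dispatch("r1")   # first update of r1 in program order
b = rob.dispatch("r1")   # second update of r1
rob.write_result(b, 20)  # the later instruction executes first
print(rob.retire(regs), regs)  # [] {}: b may not update r1 before a
rob.write_result(a, 10)
print(rob.retire(regs), regs)  # [0, 1] {'r1': 20}
```

Because results leave the buffer in program order, r1 ends up holding the value from the later instruction, as the original program intended.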
A reorder buffer is configured to store a finite number of instructions, defining a maximum number of instructions which may be concurrently outstanding within the superscalar microprocessor. Generally speaking, out of order execution occurs more frequently as the finite number is increased. For example, the execution of an instruction which is foremost within the reorder buffer in program order may be delayed. Instructions subsequently dispatched into the reorder buffer which are not dependent upon the delayed instruction may execute and store results in the buffer. Out of order execution may continue until the reorder buffer becomes full, at which point dispatch is suspended until instructions are deleted from the reorder buffer. Therefore, a larger number of storage locations within the reorder buffer generally leads to increased performance by allowing more instructions to be outstanding before instruction dispatch (and out of order execution) stalls. Increasing the number of storage locations in the reorder buffer is referred to as increasing the speculative state of the reorder buffer.
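The effect of buffer size on dispatch stalls can be estimated with a toy cycle count. The model below is an assumption made for illustration: the foremost instruction takes `head_latency` cycles, every later instruction is independent, and no entry can leave the buffer until the foremost one completes. The function name and parameters are hypothetical.

```python
def stall_cycles(rob_size: int, head_latency: int, dispatch_width: int) -> int:
    """Count cycles, during the foremost instruction's latency, in which dispatch
    is suspended because the reorder buffer is full.

    Hypothetical model: up to `dispatch_width` instructions are dispatched per
    cycle while free entries remain, and nothing is deleted from the buffer
    until the delayed foremost instruction completes.
    """
    occupied = 0
    stalls = 0
    for _ in range(head_latency):
        free = rob_size - occupied
        if free == 0:
            stalls += 1  # buffer full: dispatch (and out of order execution) stalls
        else:
            occupied += min(free, dispatch_width)
    return stalls


for size in (8, 16, 32):
    print(size, stall_cycles(size, head_latency=10, dispatch_width=3))
# 8 7
# 16 4
# 32 0
```

Under these assumptions, each increase in buffer size lets more independent instructions be dispatched behind the delayed one before dispatch stalls.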
Unfortunately, larger reorder buffers complicate recovery from exceptions, such as branch mispredictions. For the purposes of this disclosure, a mispredicted branch instruction will be used to illustrate an exception. It is understood that other types of exceptions can be handled in a similar manner. In the case of an exception or a mispredicted branch, the reorder buffer may restore the architecture state of the microprocessor and cancel speculatively executed instructions later in the program execution sequence than the instruction that caused the exception. Each instruction in the reorder buffer has an associated cancel status bit that indicates whether the instruction has been canceled. When a mispredicted branch is detected, the mispredicted branch instruction is located in the reorder buffer, and the mispredicted branch instruction and any speculative instructions that occur after it in the program execution sequence are canceled by setting the cancel status bits for those instructions. In a reorder buffer with a large speculative state, the circuitry necessary to set the cancel status bits based on a mispredicted branch instruction is relatively slow, because a large number of instructions within the reorder buffer must be evaluated to determine which instructions must be canceled. For example, in a forty-five entry reorder buffer, the last entry will be canceled if any of the forty-four entries before it is a mispredicted branch instruction. The circuitry to detect this condition requires a forty-four input OR function, which is relatively slow.
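The cancel condition described above is a prefix OR over the mispredict indications of the entries in program order. The sketch below computes it in software, assuming a linear view of the buffer in which entry 0 is the oldest (a simplification of a circular buffer); the function names are hypothetical. Hardware evaluates every entry's OR in parallel, which is why the widest OR (the one feeding the last entry) sets the delay.

```python
from __future__ import annotations


def cancel_bits(mispredicted: list[bool]) -> list[bool]:
    """Entry i is canceled if it, or any earlier entry, is a mispredicted branch.

    Assumes entry 0 is the oldest in program order. The running OR below yields
    the same values that per-entry OR gates would produce in parallel.
    """
    cancel = []
    seen = False
    for flag in mispredicted:
        seen = seen or flag  # OR over entries 0..i
        cancel.append(seen)
    return cancel


def earlier_or_inputs(i: int) -> int:
    """Number of earlier entries whose mispredict indications feed entry i's OR."""
    return i


flags = [False] * 45
flags[10] = True               # entry 10 holds a mispredicted branch
bits = cancel_bits(flags)
print(bits[9], bits[10], sum(bits))  # False True 35
print(earlier_or_inputs(44))         # 44: the forty-four input OR for the last entry
```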
Restoring the state of the microprocessor requires the status bits of the mispredicted branch instruction to be read from the reorder buffer. The status bits indicate, among other things, whether an instruction is part of a microcode sequence and the type of branch instruction. The actions taken by the reorder buffer to recover from a mispredicted branch instruction depend upon the status information contained in these status bits. To read the status bits from the reorder buffer, the mispredicted branch instruction must first be identified, and then the status bits must be read from the reorder buffer instruction storage position allocated to that instruction. The circuitry necessary to multiplex the status bits from a reorder buffer instruction storage position is relatively slow. For example, in a forty-five entry reorder buffer, a 45-to-1 multiplexer is required.
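Reading the status bits amounts to an N-to-1 selection. The sketch below models the 45-to-1 multiplexer as an AND-OR structure driven by a one-hot select, one line per reorder buffer storage position. The status bit encoding (a microcode flag in bit 0 and a two-bit branch type in bits 1 and 2) and all names are hypothetical, chosen only to make the example concrete.

```python
from __future__ import annotations

MICROCODE_BIT = 0x1      # hypothetical: instruction is part of a microcode sequence
BRANCH_TYPE_SHIFT = 1    # hypothetical: two-bit branch type field in bits 1..2
BRANCH_TYPE_MASK = 0x3


def one_hot_mux(select: list[bool], words: list[int]) -> int:
    """N-to-1 multiplexer as an AND-OR tree: each status word is gated by its
    select line and the gated words are ORed together. With exactly one select
    line asserted, the output is that storage position's status word."""
    if len(select) != len(words):
        raise ValueError("one select line per storage position is required")
    if sum(select) != 1:
        raise ValueError("select must be one-hot")
    out = 0
    for sel, word in zip(select, words):
        if sel:
            out |= word
    return out


status = [0] * 45
status[12] = MICROCODE_BIT | (2 << BRANCH_TYPE_SHIFT)  # entry 12: mispredicted branch
select = [i == 12 for i in range(45)]                  # 45 select lines, one asserted

word = one_hot_mux(select, status)
print(bool(word & MICROCODE_BIT))                        # True
print((word >> BRANCH_TYPE_SHIFT) & BRANCH_TYPE_MASK)    # 2
```

Every status bit of every position passes through this selection, so its delay grows with the number of storage positions.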
In microprocessors that return multiple results in one clock cycle, multiple exceptions may occur in one clock cycle. It is necessary to prioritize the multiple exceptions to determine which occurred earliest in the program execution sequence. This prioritization can begin only after the exceptions have been detected. In high frequency microprocessors, the combined time required to detect and prioritize exceptions may exceed the period of a clock cycle.
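Prioritization selects, among the entries that reported an exception in the same clock cycle, the one earliest in the program execution sequence. The sketch below assumes a circular reorder buffer whose oldest entry is indicated by a `head` pointer, so an entry's age is its distance from `head` modulo the buffer size; the function name and this organization are assumptions for illustration.

```python
from __future__ import annotations

from typing import Optional


def oldest_exception(exception_tags: list[int], head: int, rob_size: int) -> Optional[int]:
    """Return the reorder buffer tag, among those reporting an exception in the
    same clock cycle, that is earliest in program order.

    Assumes a circular buffer in which `head` is the oldest entry, so the
    distance (tag - head) mod rob_size increases with program order.
    """
    if not exception_tags:
        return None
    return min(exception_tags, key=lambda tag: (tag - head) % rob_size)


# Two results with exceptions arrive in the same clock cycle.
print(oldest_exception([3, 42], head=40, rob_size=45))  # 42 (the buffer has wrapped)
print(oldest_exception([3, 42], head=0, rob_size=45))   # 3
```

The comparison can only begin once every exception in the cycle has been identified, so its delay adds to the detection delay.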
It is desirable to reduce the delay for: identifying a mispredicted branch instruction, canceling instructions subsequent to the mispredicted branch instruction, obtaining the status bits from the mispredicted branch instruction, and prioritizing multiple mispredicted branch instructions.