1. Field of the Invention
This invention relates to the field of superscalar microprocessors and, more particularly, to the storage of speculative register states before those states are stored into a register file. The speculative register states are made available to subsequent instructions so that instruction dispatch need not stall.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions concurrently and by specifying the shortest possible clock cycle consistent with the design. As used herein, a "clock cycle" is an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor to complete their various functions. Memory elements (such as registers and arrays) capture their values according to a clock signal defining the clock cycle.
The number of instructions which may be executed concurrently in many ways defines the performance of a superscalar microprocessor. A superscalar microprocessor is configured to execute up to a maximum number of instructions during each clock cycle. However, the maximum number of instructions may not be executed during a particular clock cycle for a variety of reasons. For example, the number of instructions available for execution may be a limiting factor: subsequent instructions may still be in the process of being transferred from main memory. Alternatively, a branch instruction may be detected within the fetched instructions, causing some of the fetched instructions to be discarded. Maximizing the number of instructions executed during each clock cycle is often a high priority in designing a superscalar microprocessor.
Many techniques have been developed to increase the number of instructions executed per clock cycle. For example, result forwarding is often employed, in which the result of executing an instruction is forwarded directly to instructions awaiting that result (i.e. the instructions awaiting the result are "dependent" upon the instruction producing the result). Result forwarding avoids first storing the result in a specified destination and then transferring the stored result to a subsequent instruction. The clock cycle saved by forwarding the result directly reduces the time penalty of the dependency between the instructions. Unfortunately, the dependency still prevents the two instructions from executing simultaneously.
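As an illustration only, the effect of result forwarding on a chain of dependent instructions can be sketched with a toy timing model. The function `chain_schedule` and its timing (one extra cycle to write a result to its destination and read it back when forwarding is not used) are hypothetical assumptions, not details taken from this document.

```python
def chain_schedule(n, forwarding):
    """Return the execution cycle of each instruction in a chain of n
    single-cycle instructions, each dependent on the one before it."""
    # Hypothetical timing model: a forwarded result can be consumed in the
    # very next cycle; without forwarding, one extra cycle is spent writing
    # the result to its destination and reading it back out.
    delay = 1 if forwarding else 2
    cycles = []
    for i in range(n):
        # The first instruction has no producer and executes in cycle 0.
        cycles.append(0 if i == 0 else cycles[-1] + delay)
    return cycles


print(chain_schedule(4, forwarding=True))   # [0, 1, 2, 3]
print(chain_schedule(4, forwarding=False))  # [0, 2, 4, 6]
```

Even with forwarding, no two instructions in the chain share a cycle, which is the limitation noted above.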
Out of order execution is often employed by superscalar microprocessors to increase the number of instructions executed concurrently in the presence of instruction dependencies. If a microprocessor employs out of order execution, a second instruction subsequent to a first instruction within a particular instruction sequence may be executed prior to the first instruction. For example, the first instruction may be unable to execute because it depends upon another instruction which has not yet executed. If the second instruction does not depend upon any instruction which has not yet executed, the second instruction may execute before the first instruction. Unfortunately, the maximum number of instructions executed per clock cycle is often not achieved even with out of order execution. For example, the number of instructions which may be fetched and dispatched to execution units per clock cycle may limit the number of instructions which may be executed out of order. Additionally, the number of dependencies within a given instruction sequence may limit the number of instructions which may execute out of order. It is noted that the sequence of instructions within a program defines the "program order" of the program. If a first instruction is prior to a second instruction within the instruction sequence, the first instruction is prior to the second instruction in program order. Similarly, if a first instruction is subsequent to a second instruction within the instruction sequence, the first instruction is subsequent to the second instruction in program order.
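The difference between in order and out of order issue can be illustrated with a small scheduling sketch. Everything here is a hypothetical assumption for illustration: the `Instr` record, the `schedule` function, fixed per-instruction latencies, an issue width of two, and the simplification that every instruction writes a distinct register (so only true data dependencies are modeled).

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int = 1


def schedule(program, width, out_of_order):
    """Return {instruction index: issue cycle} for a toy machine that issues
    up to `width` instructions per cycle (each dest register is distinct)."""
    issue = {}

    def operand_ready(idx, reg, cycle):
        # The producer is the nearest earlier instruction (in program order)
        # writing `reg`; with no producer, the value is already available.
        for j in range(idx - 1, -1, -1):
            if program[j].dest == reg:
                return j in issue and issue[j] + program[j].latency <= cycle
        return True

    pending = list(range(len(program)))  # kept in program order
    cycle = 0
    while pending:
        issued_now = 0
        for idx in list(pending):
            if issued_now == width:
                break
            if all(operand_ready(idx, r, cycle) for r in program[idx].srcs):
                issue[idx] = cycle
                pending.remove(idx)
                issued_now += 1
            elif not out_of_order:
                break  # in order: a stalled instruction blocks younger ones
        cycle += 1
    return issue


program = [
    Instr("load", "r1", (), latency=3),   # long-latency producer
    Instr("add", "r2", ("r1", "r0")),     # depends on the load
    Instr("sub", "r3", ("r4", "r5")),     # independent of both
]
print(schedule(program, width=2, out_of_order=False))  # {0: 0, 1: 3, 2: 3}
print(schedule(program, width=2, out_of_order=True))   # {0: 0, 2: 0, 1: 3}
```

With out of order issue, the independent `sub` (later in program order) executes in cycle 0, ahead of the dependent `add`.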
For microprocessors employing the x86 microprocessor architecture, the problem of instruction dependencies is magnified. The x86 microprocessor architecture specifies a relatively small number of general registers (eight, as opposed to as many as 32 in other microprocessor architectures). Additionally, several of the registers have defined interpretations which prevent their use for storing arbitrary operands. For example, the ESP register defines the top of a stack data structure within main memory. Therefore, the ESP register may not be used to store an arbitrary operand without losing access to the stack data structure. Because the number of registers available for storing operands is small, many instructions use a value stored in memory as an operand. For example, the stack data structure may store many of the operands used by a particular instruction sequence. The x86 microprocessor architecture specifies a pair of registers, ESP and EBP, which reference the stack data structure.
As will be appreciated by those skilled in the art, a stack is a data storage structure implementing a last-in, first-out storage mechanism. Data is "pushed" onto a stack (i.e. the data is stored into the stack data structure) and "popped" from the stack (i.e. the data is removed from the stack data structure). When the stack is popped, the data removed is the data that was most recently pushed. The ESP register defines the "top" of the stack, upon which the next value is pushed and from which the next value is popped. The EBP register specifies the base of the stack for a particular instruction sequence (e.g. a subroutine or a computer program). Operand values for instructions within the instruction sequence lie between the addresses in memory defined by the values in the ESP and EBP registers.
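The push and pop behavior described above can be sketched with a simplified model of a 32-bit x86 stack, in which a push decrements ESP by four and stores a doubleword, and a pop reads the doubleword at ESP and increments ESP by four. The `Stack32` class is hypothetical, and setting EBP equal to the initial ESP is a simplifying assumption rather than a description of any particular calling convention.

```python
class Stack32:
    """Simplified model of a 32-bit x86 stack: ESP points at the most recently
    pushed doubleword and the stack grows toward lower addresses."""

    def __init__(self, base):
        self.mem = {}     # sparse memory: address -> 32-bit value
        self.esp = base   # empty stack: ESP starts at the base
        self.ebp = base   # simplification: EBP marks the base of this frame

    def push(self, value):
        self.esp -= 4                          # make room for one doubleword
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]             # read the most recent push
        self.esp += 4                          # memory is not cleared; only ESP moves
        return value

    def frame(self):
        """Values between ESP and EBP, from the top of the stack down."""
        return [self.mem[a] for a in range(self.esp, self.ebp, 4)]


s = Stack32(base=0x1000)
s.push(10)
s.push(20)
print(hex(s.esp), s.frame())  # 0xff8 [20, 10]
print(s.pop(), s.pop())       # 20 10  (last in, first out)
```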
Instructions which use a value in the stack as an operand or use the stack as a destination may modify the ESP register. In particular, an instruction which pushes a value onto the stack or pops a value from the stack modifies the ESP register, since the top of the stack is changed. Exemplary x86 instructions which push values onto the stack include the PUSH and CALL instructions. Exemplary x86 instructions which pop values from the stack include the POP and RET instructions. Because push and pop instructions modify the ESP register, subsequent instructions which use the ESP register to locate an operand within the stack, as well as subsequent push and pop instructions, depend upon the most recent preceding push or pop instruction for the value stored in the ESP register. A mechanism is desired for removing these dependencies so that multiple instructions which reference the ESP register may be executed concurrently.
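The serial chain of ESP dependencies can be made concrete with a short sketch that finds, for each instruction, the earlier instruction supplying its ESP value. The `ESP_USE` table, its two pseudo-mnemonics (`LOAD_FROM_STACK`, standing for an instruction such as `MOV EAX, [ESP+4]`, and `REG_ALU`, standing for one such as `ADD EAX, EBX`), and the functions `esp_dependencies` and `chain_depth` are all hypothetical simplifications.

```python
# Hypothetical, simplified table of ESP usage: (reads ESP, writes ESP).
# The two pseudo-mnemonics at the end stand for representative instructions.
ESP_USE = {
    "PUSH": (True, True),
    "POP": (True, True),
    "CALL": (True, True),
    "RET": (True, True),
    "LOAD_FROM_STACK": (True, False),  # e.g. MOV EAX, [ESP+4]
    "REG_ALU": (False, False),         # e.g. ADD EAX, EBX
}


def esp_dependencies(sequence):
    """For each instruction, return the index of the most recent earlier
    instruction that wrote ESP, if this instruction reads ESP (else None)."""
    last_writer = None
    deps = []
    for i, op in enumerate(sequence):
        reads, writes = ESP_USE[op]
        deps.append(last_writer if reads else None)
        if writes:
            last_writer = i
    return deps


def chain_depth(deps):
    """Length of the ESP dependency chain ending at each instruction."""
    depth = []
    for d in deps:
        depth.append(1 if d is None else depth[d] + 1)
    return depth


seq = ["PUSH", "PUSH", "LOAD_FROM_STACK", "REG_ALU", "POP"]
deps = esp_dependencies(seq)
print(deps)               # [None, 0, 1, None, 1]
print(chain_depth(deps))  # [1, 2, 3, 1, 3]
```

Only the register-only instruction escapes the chain; each stack-referencing instruction must wait for the preceding push or pop to produce ESP.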
In addition to the above-mentioned techniques, superscalar microprocessors often employ speculative execution of instructions to further increase performance. As used herein, the term "speculative" refers to execution of an instruction before it is known to be required by the sequential execution of instructions. Speculatively executed instruction results may be discarded if the instruction execution is not needed. For example, if a branch instruction is mispredicted, the instructions subsequent to the branch should not have been executed, and their results are therefore discarded. It is desirable that the dependency removal mechanism be capable of recovering from branch mispredictions.
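A generic way to keep speculative results separate from the register file, so that they can be discarded after a misprediction, is sketched below. This is an illustration under stated assumptions, not the mechanism described in this document: the class `SpeculativeRegisterState`, its methods, and the use of integer sequence numbers to tag results in program order are all hypothetical.

```python
class SpeculativeRegisterState:
    """Hypothetical buffer holding register results, tagged with their
    program-order sequence number, until they are retired to the register file."""

    def __init__(self, register_file):
        self.register_file = dict(register_file)  # architectural (non-speculative) state
        self.entries = []                         # (seq, reg, value), any order

    def record(self, seq, reg, value):
        self.entries.append((seq, reg, value))

    def lookup(self, reg):
        # The newest speculative value in program order wins; otherwise
        # fall back to the register file.
        for seq, r, v in sorted(self.entries, key=lambda e: e[0], reverse=True):
            if r == reg:
                return v
        return self.register_file.get(reg)

    def discard_after(self, branch_seq):
        # Branch mispredicted: drop every result younger than the branch.
        self.entries = [e for e in self.entries if e[0] <= branch_seq]

    def retire_through(self, seq):
        # Commit results up to and including `seq`, oldest first, so a later
        # write to the same register overwrites an earlier one.
        for s, r, v in sorted(self.entries, key=lambda e: e[0]):
            if s <= seq:
                self.register_file[r] = v
        self.entries = [e for e in self.entries if e[0] > seq]


state = SpeculativeRegisterState({"ESP": 0x1000})
state.record(1, "ESP", 0x0FFC)   # PUSH before the branch (seq 2)
state.record(3, "ESP", 0x0FF8)   # speculative PUSH after the branch
print(hex(state.lookup("ESP")))  # 0xff8
state.discard_after(2)           # the branch at seq 2 was mispredicted
print(hex(state.lookup("ESP")))  # 0xffc
state.retire_through(2)
print(hex(state.register_file["ESP"]), state.entries)  # 0xffc []
```

Subsequent instructions read the newest speculative value without waiting for retirement, and recovery from the misprediction only requires discarding the younger entries.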