Historically, implementations of data processors were restricted to "in-order" instruction execution. Generally, the restrictions imposed by an "in-order" execution scheme limit the performance attainable using the data processor. In contrast, allowing the use of "out-of-order" instruction execution may increase data processor performance by a factor of three (3) or more. Attaining this performance increase is hampered, however, by the requirement of maintaining precise exceptions. Exception conditions, such as an attempt to divide by zero, are detectable during the execution of an instruction. In a processor implementing precise exceptions, the effects of the excepting instruction are undone and a trap is taken, such that it appears to the program that the instruction never began. Typically, known data processors which maintain precise exceptions are either expensive or restrictive with respect to the number of instructions executable per clock cycle.
In an "out-of-order" data processor, only the instruction execution occurs out of program order (sequence). Thus, instructions are still "issued" in program order, and "retirement" (the act of completing an instruction and allowing its side effects to become visible) also occurs in program order. A data dependency is said to exist between two instructions when the first instruction produces a result consumed by the second instruction. Fundamentally, only the data dependencies between instructions limit the order of instruction execution. Naturally, there are several factors which may limit the performance of an "out-of-order" data processor. The primary factor is the rate at which instructions are issued by the processor. A "conservation of instructions" property states that the average rate of instructions executed per clock is limited by the average rate of instructions issued per clock.
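The notion of a data dependency described above can be illustrated with a minimal sketch. The `Instr` type, the register names, and the `data_dependencies` helper are hypothetical names introduced here for illustration only; the sketch covers only the producer/consumer relationship defined in the text and is not a description of any particular processor.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    # Hypothetical instruction record, for illustration only.
    name: str
    dest: Optional[str]     # register written, or None if no register result
    srcs: Tuple[str, ...]   # registers read


def data_dependencies(program):
    """Return (producer_index, consumer_index) pairs, scanning in program order."""
    last_writer = {}  # register -> index of the most recent instruction writing it
    deps = []
    for i, ins in enumerate(program):
        for reg in ins.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return deps


program = [
    Instr("div", "r1", ("r2", "r3")),
    Instr("add", "r4", ("r1", "r5")),  # consumes r1 produced by div
    Instr("sub", "r6", ("r7", "r8")),  # independent of both earlier instructions
]
print(data_dependencies(program))
```

In this example only the second instruction depends on the first, so the third instruction could begin execution before either of the others finishes.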
A secondary factor which may limit the performance of an "out-of-order" data processor is the rate of "retirement" of instructions. Initially, the processor is not executing any instructions. The processor begins fetching and issuing instructions. As previously indicated, issuance and "retirement" must occur in program order. Accordingly, until the execution of the first instruction issued by the processor is completed and retired, no subsequently issued instruction may be retired. While the processor waits to retire the first instruction, it continues to issue instructions. Thus, the processor may complete the execution of a number of subsequently issued instructions; however, the "retirement" of these subsequently issued instructions is deferred until the first instruction is completed and retired. Assuming the processor issues instructions at a constant rate, the net number of instructions in the processor is a monotonically increasing function for as long as the first instruction remains unretired.
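The effect of in-order retirement on the number of instructions held in the processor can be sketched with a small simulation. The model below rests on assumptions that are not in the text: exactly one instruction issues per cycle, each instruction has a fixed latency, and any number of completed instructions at the head of the window may retire in one cycle. The function name `simulate` is hypothetical.

```python
from collections import deque


def simulate(latencies, cycles):
    """Return the count of issued-but-unretired instructions after each cycle.

    Assumed model: one instruction issues per cycle, and retirement happens
    strictly in program order from the head of the window.
    """
    window = deque()  # completion cycles of in-flight instructions, oldest first
    occupancy = []
    for cycle in range(cycles):
        if cycle < len(latencies):
            window.append(cycle + latencies[cycle])
        # Only the oldest instruction may retire; finished younger ones must wait.
        while window and window[0] <= cycle:
            window.popleft()
        occupancy.append(len(window))
    return occupancy


# A slow first instruction (latency 5) followed by five fast ones (latency 1).
print(simulate([5, 1, 1, 1, 1, 1], 8))  # [1, 2, 3, 4, 5, 1, 0, 0]
```

The occupancy grows by one every cycle while the slow first instruction is outstanding, even though the younger instructions have already finished, and drops only once the first instruction retires.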
A final factor which may limit the performance of an "out-of-order" processor is the branch prediction recovery time. When an instruction issuer encounters a conditional branch instruction, it has at least two possible courses of action. First, the instruction issuer can stall the issuance of the instruction. This action is undesirable in light of the primary factor affecting the processor's performance (rate of instruction issuance). Second, the instruction issuer can predict the direction of the branch and continue issuance of instructions down the predicted path. If the predicted path is incorrect, however, the register(s) affected by the incorrectly issued instructions must be restored to their original values. Thus, using the second approach maintains a high effective issue rate only if (1) the prediction is right most of the time, and (2) when the prediction is wrong, it does not take too long to start issuing the correct instructions. There are a significant number of algorithms capable of correctly predicting branches most of the time; however, once an algorithm is selected, the system designer has little control over what actually happens in the system at run time. When the prediction is incorrect, it is necessary to minimize the branch repair time, since as long as the branch repair is occurring, the processor cannot issue instructions. Consequently, the frequency and duration of the stalls attributable to branch repair will adversely affect overall machine performance.
Implementing a register file in an out-of-order machine presents additional problems. The retirement restriction dictates that no side effects appear out of program order. Known processors employ reorder buffers to ensure that register side effects occur in program order. Initially, an issuing instruction reads its operands from one of a fixed number of addressable registers (e.g. general purpose registers) in a register file. Next, the instruction is issued and the associated operations are performed. If the instruction has a register destination, the specified register in the register file is not modified. Instead, a slot in the reorder buffer is allocated for the result. When the instruction completes execution, the reorder buffer is modified instead of the specified register in the register file. As new instructions issue, they read modified registers from the reorder buffer; however, unmodified registers are still read from the register file. Thus, the process of retirement entails taking the modified registers from the reorder buffer and writing them back to the register file. The rate of retirement is limited by the rate at which the processor can read the reorder buffer (the number of read ports on the reorder buffer), and write the register file. Thus, by using a reorder buffer, the system designer may achieve zero time branch repair (by throwing away the section of the reorder buffer that is no longer valid); however, the retirement rate is limited.
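The reorder buffer behavior described above can be sketched as follows. The class and method names (`ReorderBuffer`, `issue`, `complete`, `read`, `retire`, `squash_from`) and the `max_per_cycle` parameter are hypothetical, chosen for illustration; returning `None` for a pending result is also a simplification of this sketch, not something the text specifies.

```python
class ReorderBuffer:
    """Illustrative model: results wait in slots until retired in program order."""

    def __init__(self, registers):
        self.regfile = dict(registers)  # register file, updated only at retirement
        self.slots = []                 # oldest first

    def issue(self, dest):
        # Allocate a slot for the result instead of touching the register file.
        slot = {"dest": dest, "value": None, "done": False}
        self.slots.append(slot)
        return slot

    def complete(self, slot, value):
        # On completion the reorder buffer is modified, not the register file.
        slot["value"] = value
        slot["done"] = True

    def read(self, reg):
        # Modified registers come from the newest matching slot;
        # unmodified registers come from the register file.
        for slot in reversed(self.slots):
            if slot["dest"] == reg:
                return slot["value"] if slot["done"] else None  # None: still pending
        return self.regfile[reg]

    def retire(self, max_per_cycle):
        # Retirement moves data; max_per_cycle stands in for the port limit.
        retired = 0
        while self.slots and self.slots[0]["done"] and retired < max_per_cycle:
            slot = self.slots.pop(0)
            self.regfile[slot["dest"]] = slot["value"]
            retired += 1
        return retired

    def squash_from(self, slot):
        # Branch repair: discard this slot and everything younger, moving no data.
        index = next(i for i, s in enumerate(self.slots) if s is slot)
        del self.slots[index:]


rob = ReorderBuffer({"r1": 10, "r2": 20})
a = rob.issue("r1")
rob.complete(a, 11)
b = rob.issue("r2")        # assume this lies on a mispredicted path
rob.complete(b, 99)
rob.squash_from(b)         # repair discards the slot; r2 reads as 20 again
rob.retire(max_per_cycle=1)
print(rob.regfile)
```

In this model branch repair is a single truncation of the slot list, while retirement performs one register file write per slot and is capped per cycle, which mirrors the trade-off stated in the text.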
One solution employed to cure the ills of the reorder buffer's limited retirement rate is the implementation of a "history buffer". As instructions are issued, the registers requiring modification are copied into the history buffer before they are modified. Accordingly, the values in the history buffer represent "old" register values. As instructions complete execution, their results are stored directly into the register file. Thus, the process of retirement entails deciding when an instruction is complete. Typically, the processor "decides" the results of any number of instructions during a clock period; therefore, the rate of retirement is unlimited. Slots in the history buffer storing old values for "decided" instructions are simply discarded. In contrast, when a branch repair is necessary, the rate at which the system restores the old register values will limit the branch repair time. Thus, using a history buffer, the system designer may achieve a retirement time of zero; however, the branch repair rate is limited.
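The history buffer scheme can be sketched in the same style. The names `HistoryBuffer`, `issue`, `complete`, `retire_through`, and `repair_from` are hypothetical, as is the use of a sequence number to identify entries. The sketch also assumes, for simplicity, that an instruction writing a given register completes before the next instruction writing that same register issues, so the value saved at issue is the correct "old" value.

```python
class HistoryBuffer:
    """Illustrative model: results go straight to the register file; old values are saved."""

    def __init__(self, registers):
        self.regfile = dict(registers)
        self.history = []   # oldest first: (seq, dest, old_value)
        self.next_seq = 0

    def issue(self, dest):
        # Copy the old register value into the history buffer before it is modified.
        seq = self.next_seq
        self.next_seq += 1
        self.history.append((seq, dest, self.regfile[dest]))
        return seq

    def complete(self, dest, value):
        # Results are stored directly into the register file.
        self.regfile[dest] = value

    def retire_through(self, seq):
        # Retirement only discards entries for decided instructions; no data moves.
        self.history = [entry for entry in self.history if entry[0] > seq]

    def repair_from(self, seq):
        # Branch repair restores old values, newest first; each restore moves data.
        restored = 0
        while self.history and self.history[-1][0] >= seq:
            _, dest, old = self.history.pop()
            self.regfile[dest] = old
            restored += 1
        return restored


hb = HistoryBuffer({"r1": 10, "r2": 20})
a = hb.issue("r1")
hb.complete("r1", 11)
b = hb.issue("r2")          # assume b and c lie on a mispredicted path
hb.complete("r2", 99)
c = hb.issue("r1")
hb.complete("r1", 77)
hb.retire_through(a)        # discards one entry, writes nothing
print(hb.repair_from(b), hb.regfile)
```

Here retirement costs nothing beyond dropping entries, whereas repair performs one register write per squashed instruction, so the repair time grows with the number of instructions issued past the mispredicted branch.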
Accordingly, both the reorder buffer and the history buffer perform well in accomplishing one objective, but fail to perform well in accomplishing the other. Essentially, the problem is that both approaches (reorder buffer and history buffer) require the physical movement of data. The reorder buffer moves data in the act of retirement, whereas the history buffer moves data in the act of branch repair. Thus, it is desirable to provide a mechanism which accomplishes the foregoing objectives without physically moving data.