High performance processors used in data processing systems today may be capable of "superscalar" operation and may have "pipelined" elements. A superscalar processor has multiple elements which operate in parallel to process multiple instructions in a single processing cycle. Pipelining involves processing instructions in stages, so that the pipelined stages may process a number of instructions concurrently.
In a typical first stage, referred to as an "instruction fetch" stage, an instruction is fetched from memory. Then, in a "decode" stage, the instruction is decoded into different control bits, which in general designate i) a type of functional unit for performing the operation specified by the instruction, ii) source operands for the operation and iii) destinations for results of operations. Next, in a "dispatch" stage, the decoded instruction is dispatched per the control bits to a unit having an "execution" stage. This stage processes the operation as specified by the instruction. Executing an operation specified by an instruction includes accepting one or more operands and producing one or more results.
A "completion" stage deals with program order issues that arise from concurrent execution, wherein multiple, concurrently executed instructions may deposit results in a single register. It also handles issues arising from instructions subsequent to an interrupted instruction depositing results in their destination registers. In the completion stage an instruction waits for the point at which there is no longer a possibility of an interrupt so that depositing its results will not violate the program order, at which point the instruction is considered "complete", as the term is used herein. Associated with a completion stage, there are buffers to hold execution results before results are deposited into the destination register, and buffers to back up the content of registers at specified checkpoints in case an interrupt needs to revert the register content to its pre-checkpoint value. Either or both types of buffers can be employed in a particular implementation. At completion, the results of execution in the holding buffer will be deposited into the destination register and the backup buffer will be released.
While instructions for the above described processor may originally be prepared for processing in some programmed, logical sequence, it should be understood that they may be processed, in some respects, in a different sequence. However, since instructions are not totally independent of one another, complications arise. That is, the processing of one instruction may depend on a result from another instruction. For example, the processing of an instruction which follows a branch instruction will depend on the branch path chosen by the branch instruction. In another example, the processing of an instruction which reads the contents of some memory element in the processing system may depend on the result of some preceding instruction which writes to that memory element.
As these examples suggest, if one instruction is dependent on a first instruction and the instructions are to be processed concurrently or the dependent instruction is to be processed before the first instruction, an assumption must be made regarding the result produced by the first instruction. The "state" of the processor, as defined at least in part by the content of registers the processor uses for execution of instructions, may change from cycle to cycle. If an assumption used for processing an instruction proves to be incorrect then, of course, the result produced by the processing of the instruction will almost certainly be incorrect, and the processor state must recover to a state with known correct results up to the instruction for which the assumption is made. (Herein, an instruction for which an assumption has been made is referred to as an "interruptible instruction", and the determination that an assumption is incorrect, triggering the need for the processor state to recover to a prior state, is referred to as an "interruption" or an "interrupt point".) In addition to incorrect assumptions, there are other causes of such interruptions requiring recovery of the processor state. Such an interruption is generally caused by an unusual condition arising in connection with instruction execution, an error, or a signal external to the processor.
The use of a history buffer ("HB") is known for saving a processor state before an interruptible instruction, so that if an interrupt occurs, HB control logic may recover the processor state to the interrupt point by restoring the content of registers. This use of a history buffer has the known advantage of reducing the timing penalty in register lookup during instruction dispatch as compared to a register renaming scheme.
According to the terminology used herein, when an instruction performs an operation affecting the contents of a register, the operation is said to "target" that register, the instruction may be referred to as a "targeting instruction", and the register is referred to as a "target register" or a "targeted register". For example, the instruction "ld r3, . . . " targets register r3, and r3 is the target register for the instruction "ld r3, . . . ".
If multiple instructions with the same target register have been dispatched, the last one dispatched writes the architected register. Each such instruction is assigned a unique result tag associated with the target register at dispatch. When an instruction with target registers is dispatched, the result tag will be written into a tag field associated with the target register, and either the prior target register content or the prior result tag is retrieved from the register and stored in a history buffer entry (HBE) allocated for it. When it becomes known that the speculatively executed instruction will not be aborted, the entry is retired (deallocated). However, if the speculatively executed instruction needs to be aborted, register contents or result tags saved in HBE's are copied back to the register and the entries are retired.
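As a minimal sketch of the save-and-restore behavior just described, the following Python model allocates an HBE at dispatch and then either retires it or copies it back. The class name `Machine`, its method names, and the register values are hypothetical. The model also simplifies the scheme in the text: it writes the result into the register at dispatch and omits result tags.

```python
class Machine:
    """Toy register file with a history buffer (hypothetical model)."""

    def __init__(self, regs):
        self.regs = dict(regs)   # register name -> current content
        self.hb = {}             # entry number -> (target register, prior content)
        self.next_entry = 0

    def dispatch(self, target, result):
        """Allocate an HBE holding the prior content, then write the target."""
        entry = self.next_entry
        self.next_entry += 1
        self.hb[entry] = (target, self.regs[target])
        self.regs[target] = result
        return entry

    def retire(self, entry):
        """The instruction will not be aborted: deallocate the entry."""
        del self.hb[entry]

    def abort(self, entry):
        """The instruction is aborted: copy the saved content back, then deallocate."""
        target, prior = self.hb.pop(entry)
        self.regs[target] = prior


# Case 1: the speculative instruction is aborted.
aborted = Machine({"r3": 10})
entry = aborted.dispatch("r3", 99)
aborted.abort(entry)

# Case 2: the speculative instruction is known to be safe.
kept = Machine({"r3": 10})
entry = kept.dispatch("r3", 99)
kept.retire(entry)
```

In the first case r3 returns to its prior content; in the second it keeps the new result. In both cases the entry is deallocated.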
FIG. 1 illustrates the above with an example showing a traditional history buffer 100 as applied to the processing of representative instructions 102 shown. The instructions 102 reside in a memory device (not shown) in a sequence of lines 101 which are depicted in FIG. 1 as line numbers X+0, X+1, etc. The instruction 102 at line X+0 is depicted as "branch!", signifying that the instruction is representative of a conditional branch type instruction, such as "branch target_addr", for example. The instruction 102 at line X+1 is depicted as "add r3, . . . ", signifying that the instruction is representative of an instruction such as "add r3, r6, r7" (i.e., r6+r7→r3), for example, which alters the content of register r3.
According to the prior art application of this history buffer 100, upon speculative prediction that the branch type instruction at line X+0 is not taken, instruction "add r3, . . . ", at line X+1, is dispatched and the value of target register r3 before the branch instruction at X+0 is saved in a history buffer entry ("HBE") 104. (Herein, a history buffer entry may be referred to by its entry number 103. That is, a first entry 104 in a history buffer is referred to as HBE0, a second entry as HBE1, etc.) Instructions "add r2, . . . ", "ld r3, . . . ", and "add r4, . . . " result in history buffer entries HBE1, HBE2, and HBE3 respectively. Notice that HBE2 has the contents of register r3 produced by instruction "add r3, . . . ", because "ld r3, . . . " is dispatched after "add r3, . . . ". There is no instruction dispatched with target r4 except "add r4, . . . "; therefore, HBE3 has the content of r4 produced before the branch.
If the prediction that the branch at line X+0 is not taken proves to be correct, and the instruction "ld r3, . . . " at line X+3 in this context causes no exception, then the HB 100 entries HBE0, HBE1, etc. are deallocated in the order of completion. But, if the instruction "ld r3, . . . " causes an exception, the recovery mechanism will restore register content for r3 and r4 from HBE2 and HBE3, and deallocate those HB entries. The processor will thus be restored to the state immediately before the "ld r3, . . . " instruction was dispatched. The state at that point includes register r3 with contents produced by "add r3, . . . ", and the content of r4 before the branch (which is the same as its content before the "ld r3, . . . " instruction).
If the prediction that the branch is not taken proves to be incorrect, then the results produced by speculatively executing instructions after the branch instruction must be abandoned. The registers written by these instructions need to be restored to their contents prior to the branch instruction. For example, if the branch is resolved after writing into HBE3, the recovery mechanism must copy register content in HBE0, HBE1 and HBE3 back to registers r3, r2 and r4 in order to recover the processor state that existed before the branch. Also, in connection with completing the recovery, all four HBE's are deallocated.
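The two recovery cases of the FIG. 1 example can be traced with a short Python sketch. The register values are hypothetical, since FIG. 1 gives no concrete contents, and the function names are illustrative only. The selection rule used here (for each register, take the oldest entry at or after the recovery point) is an assumption generalized from the two cases in the text, not a rule the text states.

```python
# History buffer after "add r4, . . . " is dispatched: (target, saved content).
# All numeric values are hypothetical.
HB = [
    ("r3", 30),  # HBE0: r3 before the branch, saved when "add r3" dispatched
    ("r2", 20),  # HBE1: r2 before the branch, saved when "add r2" dispatched
    ("r3", 31),  # HBE2: r3 produced by "add r3", saved when "ld r3" dispatched
    ("r4", 40),  # HBE3: r4 before the branch, saved when "add r4" dispatched
]
REGS_NOW = {"r2": 21, "r3": 32, "r4": 41}  # speculative register contents


def entries_to_restore(hb, first):
    """Map each register to its oldest entry numbered `first` or later."""
    chosen = {}
    for number in range(first, len(hb)):
        target, _ = hb[number]
        chosen.setdefault(target, number)  # keep only the first (oldest) match
    return chosen


def recover(regs, hb, first):
    """Return the register contents after recovering back to entry `first`."""
    restored = dict(regs)
    for target, number in entries_to_restore(hb, first).items():
        restored[target] = hb[number][1]
    return restored


ld_exception = recover(REGS_NOW, HB, 2)  # "ld r3" causes an exception
mispredict = recover(REGS_NOW, HB, 0)    # branch at X+0 was mispredicted
```

With these values, the exception case restores r3 and r4 from HBE2 and HBE3 and leaves r2 untouched. The misprediction case restores r3, r2 and r4 from HBE0, HBE1 and HBE3.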
In many cases, such as in the above example, it is problematic to implement this mechanism because the HB 100 contains multiple values of a given register. For example, as shown in FIG. 1 the HB 100 has values of r3 in HBE0 and HBE2. The HB 100 contains both these values because in different contexts either value of r3 may need to be recovered. Therefore, the need exists to select between multiple values of a register in recovering the processor state. One possible solution is to exhaustively reverse the order of speculative execution back to the interrupted instruction. This way, if recovery is required all the way back to line X+0, for example, the r3 content from HBE0 will overwrite the content from HBE2, and the processor will have recovered back to the known state before the branch at X+0.
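The exhaustive reverse-order recovery can be sketched in Python as a walk from the newest entry back to the recovery point. The register values and the name `reverse_walk` are hypothetical. The sketch restores one entry per step; whether a step corresponds to one processor cycle is an assumption, not something the text specifies.

```python
def reverse_walk(regs, hb, first):
    """Undo entries newest-first, down to and including entry `first`.

    Returns the recovered registers and the list of restore steps taken.
    """
    regs = dict(regs)
    steps = []
    for number in range(len(hb) - 1, first - 1, -1):
        target, prior = hb[number]
        regs[target] = prior          # an older entry overwrites a newer one
        steps.append((number, target, prior))
    return regs, steps


# Hypothetical contents for the four entries of the FIG. 1 example.
hb = [("r3", 30), ("r2", 20), ("r3", 31), ("r4", 40)]  # HBE0 .. HBE3
speculative = {"r2": 21, "r3": 32, "r4": 41}

recovered, steps = reverse_walk(speculative, hb, first=0)
r3_writes = [prior for _, target, prior in steps if target == "r3"]
```

Here r3 is written twice, first from HBE2 and then from HBE0, so the final content is the pre-branch value. The walk takes one step per entry, four in this example.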
It is a disadvantage of this mechanism that the processor is stalled for a number of cycles while this iterative process recovers the processor state. Because branch misprediction may occur frequently, the multi-cycle stall penalty is not acceptable in a high performance processor, such as a superscalar processor. Consequently, the history buffer approach is regarded by some as poorly suited for superscalar implementations. See, for example, Mike Johnson, Superscalar Microprocessor Design, 92 (1991) (discussing disadvantages of using a history buffer, and the relative advantage of using a reorder buffer and a future file). If, in spite of this teaching to the contrary, a history buffer is used for recovering a processor state, a need exists for improving the efficiency of recovering the processor state from information stored in the history buffer, including improving the history buffer multi-cycle stall penalty.