High-performance processors currently used in data processing systems may be capable of "superscalar" operation and may have "pipelined" elements. A superscalar processor has multiple elements which operate in parallel to process multiple instructions in a single processing cycle. Pipelining involves processing instructions in stages, so that the pipelined stages may process a number of instructions concurrently.
In a typical first stage, referred to as an "instruction fetch" stage, an instruction is fetched from memory. Then, in a "decode" stage, the instruction is decoded into different control bits, which in general designate i) a type of functional unit for performing the operation specified by the instruction, ii) source operands for the operation and iii) destinations for results of operations. Next, in a "dispatch" stage, the decoded instruction is dispatched per the control bits to a unit having an "execution" stage. This stage processes the operation as specified by the instruction. Executing an operation specified by an instruction includes accepting one or more operands and producing one or more results.
A "completion" stage deals with program order issues that arise from concurrent execution, wherein multiple, concurrently executed instructions may deposit results in a single register. It also handles issues arising from instructions subsequent to an interrupted instruction depositing results in their destination registers. In the completion stage an instruction waits for the point at which there is no longer a possibility of an interrupt, so that depositing its results will not violate the program order, at which point the instruction is considered "complete", as the term is used herein. Associated with a completion stage, there may be buffers to hold execution results before the results are deposited into the destination register, and/or buffers to back up the content of registers at specified checkpoints in case an interrupt requires the register content to be reverted to its pre-checkpoint value. Either or both types of buffers can be employed in a particular implementation. At completion, the results of execution in the holding buffer will be deposited into the destination register and the backup buffer will be released.
While instructions for the above described processor may originally be prepared for processing in some programmed, logical sequence, it should be understood that they may be processed, in some respects, in a different sequence. However, since instructions are not totally independent of one another, complications arise. That is, the processing of one instruction may depend on a result from another instruction. For example, the processing of an instruction which follows a branch instruction will depend on the branch path chosen by the branch instruction. In another example, the processing of an instruction which reads the contents of some memory element in the processing system may depend on the result of some preceding instruction which writes to that memory element.
As these examples suggest, if one instruction is dependent on a first instruction and the instructions are to be processed concurrently or the dependent instruction is to be processed before the first instruction, an assumption must be made regarding the result produced by the first instruction. The "state" of the processor, as defined at least in part by the content of registers the processor uses for execution of instructions, may change from cycle to cycle. If an assumption used for processing an instruction proves to be incorrect then, of course, the result produced by the processing of the instruction will almost certainly be incorrect, and the processor state must recover to a state with known correct results up to the instruction for which the assumption is made. (Herein, an instruction for which an assumption has been made is referred to as an "interruptible instruction", and the determination that an assumption is incorrect, triggering the need for the processor state to recover to a prior state, is referred to as an "interruption". The point in the instruction stream at which the interruptible instruction occurs is referred to as the "interrupt point".) In addition to incorrect assumptions, there are other causes of such interruptions requiring recovery of the processor state. Such an interruption is generally caused by an unusual condition arising in connection with instruction execution, by an error, or by a signal external to the processor.
The use of a history buffer ("HB") is known for saving a processor state before an interruptible instruction, so that if an interrupt occurs, HB control logic may recover the processor state to the interrupt point by restoring the content of registers.
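The history buffer principle described above can be illustrated with a minimal sketch. It is not taken from any particular processor: the class name HistoryBuffer, its methods, and the use of increasing integers as instruction identifiers in program order are all assumptions made for illustration. Each register write first saves the value being overwritten, and recovery undoes the saved updates from newest to oldest, back to the interrupt point.

```python
class HistoryBuffer:
    """Toy history buffer (hypothetical model, not an actual hardware design)."""

    def __init__(self, registers):
        self.registers = registers  # register file: name -> value
        self.entries = []           # (instr_id, register, old_value), in dispatch order

    def write(self, instr_id, reg, value):
        # Save the value about to be overwritten, then perform the update.
        self.entries.append((instr_id, reg, self.registers[reg]))
        self.registers[reg] = value

    def recover(self, interrupt_id):
        # Undo, newest first, every update made at or after the interrupt point.
        while self.entries and self.entries[-1][0] >= interrupt_id:
            _, reg, old_value = self.entries.pop()
            self.registers[reg] = old_value

    def complete(self, instr_id):
        # Release saved values for instructions that can no longer be interrupted.
        self.entries = [e for e in self.entries if e[0] > instr_id]


regs = {"r1": 10, "r2": 20}
hb = HistoryBuffer(regs)
hb.write(1, "r1", 11)  # instruction 1
hb.write(2, "r2", 22)  # instruction 2, assumed to be the interruptible instruction
hb.write(3, "r1", 33)  # instruction 3, executed after the interrupt point
hb.recover(2)          # interruption: restore the state that existed before instruction 2
print(regs)
```

In this sketch the update made by instruction 1 survives the recovery, because it precedes the interrupt point, while the updates made by instructions 2 and 3 are backed out.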
History buffer schemes suffer from perceived difficulties in providing efficient mechanisms to back out the speculative updates which are required for exception recovery. As a result, the dominant mechanisms employed in current processors involve various rename register schemes. However, register rename techniques also provide considerable challenges for high-end processor designers.
For example, with renaming, when an instruction is dispatched the processor must perform a lookup in the rename register table to determine which rename register holds the current version of the specified architected register. This two level register access (one into the rename table and one into the physical register file using the rename index) often is a cycle time limiting path. Moreover, the number of instructions which may be issued out-of-order depends on the number of rename registers available. When no rename registers are available, dispatch must be halted until rename registers again become available through the completion of instructions currently in the pipe.
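The two level access and the dispatch stall described above may be sketched as follows. This is a simplified model under stated assumptions: the class RenameFile, its method names, and the policy of reading the architected register when no rename mapping exists are hypothetical and chosen only for illustration.

```python
class RenameFile:
    """Toy rename register scheme (hypothetical model for illustration)."""

    def __init__(self, num_arch, num_rename):
        self.arch = [0] * num_arch           # architected registers
        self.phys = [0] * num_rename         # rename registers
        self.table = {}                      # architected index -> rename index of newest version
        self.free = list(range(num_rename))  # rename registers not currently allocated

    def read_source(self, arch_reg):
        # Level 1: rename table lookup. Level 2: access using the rename index.
        idx = self.table.get(arch_reg)
        return self.arch[arch_reg] if idx is None else self.phys[idx]

    def allocate_dest(self, arch_reg):
        # Returns a rename index, or None when dispatch must be halted.
        if not self.free:
            return None
        idx = self.free.pop(0)
        self.table[arch_reg] = idx
        return idx

    def complete(self, arch_reg, idx):
        # Copy the result to the architected register and free the rename register.
        self.arch[arch_reg] = self.phys[idx]
        if self.table.get(arch_reg) == idx:
            del self.table[arch_reg]
        self.free.append(idx)


rf = RenameFile(num_arch=4, num_rename=2)
a = rf.allocate_dest(1)
rf.phys[a] = 7                  # result of the instruction writing register 1
b = rf.allocate_dest(2)
stalled = rf.allocate_dest(3)   # no rename register is free, so dispatch halts
value = rf.read_source(1)       # found through the rename table
rf.complete(1, a)               # completion frees a rename register
retry = rf.allocate_dest(3)     # dispatch can now proceed
```

With only two rename registers in this example, the third dispatch attempt returns None until an earlier instruction completes, which is the limit on out-of-order issue noted above.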
Furthermore, most existing rename register based schemes incorporate a completion table to allow in-order completion of instructions. Instruction completion includes updating the architected register set with the "future file" copy of the register maintained in the rename register. The size of the completion table often forms a hard limit on the number of instructions which can be live (i.e., dispatched but not yet completed). Furthermore, the lifetime of a rename register basically consists of the interval from dispatch to in-order completion. Therefore, the number of rename registers often forms another hard limit for the number of live instructions for a given block of code.
Additionally, while rename registers are useful for maintaining future state results for speculatively executed instructions, additional mechanisms are often required to allow detection of exceptions and recovery from exceptions. For example, to allow recovery of speculative instructions beyond a predicted conditional branch, one solution is to tag instructions with a 2-bit tag identifying the basic block which contains the given instruction. When a branch is found to be mispredicted, its tag is broadcast, and instructions with tags for subsequent blocks are purged from the machine.
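The tag-based purge described above may be sketched as follows. The function name, the representation of in-flight instructions as (name, tag) pairs, and the assumption that no more than four basic blocks are in flight at once (so that a 2-bit tag is unambiguous) are all hypothetical and made only for illustration.

```python
TAG_BITS = 2
TAG_MOD = 1 << TAG_BITS  # four distinct basic-block tags


def purge_after_mispredict(in_flight, branch_tag, newest_tag):
    """Return the instructions that survive a mispredicted branch.

    in_flight  -- list of (name, tag) pairs in program order
    branch_tag -- tag of the basic block ending in the mispredicted branch
    newest_tag -- tag of the youngest basic block currently in the machine
    """
    # Number of blocks younger than the mispredicted one, counted modulo 4.
    span = (newest_tag - branch_tag) % TAG_MOD
    doomed = {(branch_tag + k) % TAG_MOD for k in range(1, span + 1)}
    return [(name, tag) for name, tag in in_flight if tag not in doomed]


in_flight = [("add", 3), ("beq", 3), ("load", 0), ("bne", 0), ("sub", 1)]
survivors = purge_after_mispredict(in_flight, branch_tag=3, newest_tag=1)
print(survivors)
```

In this example the tags wrap from 3 to 0, so the blocks tagged 0 and 1 are the ones subsequent to the mispredicted branch in block 3, and only the instructions of block 3 remain.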
Tagging basic blocks (blocks of code delineated by branches) with unique tags allows flushing and refetching the instruction stream only at these branch points. Page faults which occur for loads and stores are often considered to occur much less frequently, so "cheaper", less responsive solutions are often employed. One common solution in schemes which incorporate a completion table is simply to wait until the offending instruction is the next instruction to be completed, flush all instructions from the machine, and take the interrupt at the location of the faulting instruction.
In systems which restrict the degree of out-of-order execution, especially for loads and stores, other mechanisms are used to maintain storage consistency. For example, in many systems, loads and stores are executed strictly on an in-order basis. As a result, if a load requires data from a location which is stored into by a previous store, the store will have already executed and a simple tracking mechanism can indicate whether it is safe for the load to proceed or if it should be held in execute waiting for the store to write the data into the cache. This simple mechanism is not easily adapted to handle more aggressive designs which allow out-of-order execution of loads and stores.
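The simple tracking mechanism for strictly in-order loads and stores may be sketched as follows. The class StoreTracker and its method names are hypothetical, and tracking by exact address match is an assumption made for brevity.

```python
from collections import Counter


class StoreTracker:
    """Toy tracker for stores that have executed but not yet written the cache."""

    def __init__(self):
        self.pending = Counter()  # address -> number of stores still to be written

    def store_executed(self, addr):
        self.pending[addr] += 1

    def store_written(self, addr):
        # Assumes a matching store_executed call was made earlier for this address.
        self.pending[addr] -= 1
        if self.pending[addr] == 0:
            del self.pending[addr]

    def load_may_proceed(self, addr):
        # Because execution is in order, every earlier store has already executed,
        # so checking the pending set is sufficient.
        return self.pending[addr] == 0


tracker = StoreTracker()
tracker.store_executed(0x100)
held = not tracker.load_may_proceed(0x100)  # load is held in execute
other = tracker.load_may_proceed(0x200)     # unrelated address may proceed
tracker.store_written(0x100)
released = tracker.load_may_proceed(0x100)  # data is now in the cache
```

The sketch relies on in-order execution: if a load could execute before an earlier store, the store would not yet appear in the pending set and the check would wrongly allow the load to proceed, which is why the mechanism does not carry over to out-of-order designs.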
Speculative execution can also take the form of executing instructions which read the summary overflow flag out of order with respect to overflow (and hence summary overflow) setting instructions. In most cases, instructions which are capable of setting the overflow flag rarely do set the overflow flag. As a result, designs such as the PowerPC 604 choose not to incorporate special checking hardware to handle the rare cases; they simply execute the instruction in a serial fashion. When an overflow setting (e.g., OE=1) instruction is encountered at dispatch, dispatch is halted until all previous instructions complete. This ensures that all prior instructions get the "old" overflow flag value, the value prior to any potential update. Then the overflow setting instruction executes to completion. Then subsequent instructions are allowed to dispatch. Holding dispatch until the overflow setting instruction completes guarantees that the subsequent instructions get the "new" value of the overflow and summary overflow flags. This simple mechanism is often selected partially because the designers choose not to support register rename techniques for flags such as overflow. While this is a simple mechanism to handle overflow setting instructions, the serialization effect on performance is fairly severe in codes which have even a moderate number of such instructions. Since the overflow rarely occurs, one might get a performance advantage from "guessing" that the overflow will not be set and speculatively executing subsequent instructions.
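The serialized dispatch rule described above may be sketched as a single decision function. The function name and the representation of the uncompleted instructions as a list of booleans are assumptions made for illustration and do not describe the actual dispatch logic of any processor.

```python
def can_dispatch(sets_overflow, in_flight):
    """Decide whether the next instruction may dispatch under serialization.

    sets_overflow -- True if the next instruction may set the overflow flag (OE=1)
    in_flight     -- one boolean per dispatched but uncompleted instruction,
                     True where that instruction may set the overflow flag
    """
    if sets_overflow:
        # Halt dispatch until all previous instructions complete, so that they
        # all observe the "old" flag value.
        return len(in_flight) == 0
    # Hold later instructions until any overflow setting instruction completes,
    # so that they observe the "new" flag value.
    return not any(in_flight)


print(can_dispatch(True, [False]))          # overflow setter waits for older work
print(can_dispatch(False, [True]))          # later instruction waits for the setter
print(can_dispatch(False, [False, False]))  # ordinary instructions overlap freely
```

Under this rule every overflow setting instruction drains the pipeline before and after itself, which is the source of the serialization cost noted above.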
In addition to the general purpose registers (GPRs) and the rename registers, most rename register schemes require a mapping table to track which rename register holds the most "recent" copy of a GPR. The mapping table is used by dispatch to determine the source location for an instruction's source registers.
As shown above, rename techniques by themselves do not provide a global solution for recovery from each of the various forms of speculative execution. Therefore, several different mechanisms are often incorporated to handle mispredicted branches, page fault exceptions, load-hit-store collisions, overflow conditions, etc.
Accordingly, it is an object of the present invention to provide a method for handling interrupt and branch recovery which is independent of the type of interrupt that has occurred. Further objects and advantages of the present invention will become apparent in view of the following disclosure.