A principal factor in determining the throughput of a central processing unit is the effectiveness of its pipelined instruction unit in processing conditional instructions. Pipelined instruction units treat instructions as regular, segmented entities to allow the staggered, overlapping processing of multiple instructions. In this manner, burst instruction processing speeds of up to one instruction per processor cycle can be achieved.
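The throughput benefit of overlapping can be illustrated with a simple cycle count. This minimal Python sketch assumes an idealized pipeline of equal, single-cycle stages with no interlocks; the function names and the five-stage depth are hypothetical and chosen only for illustration.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to finish num_instructions in an ideal, interlock-free pipeline.

    The first instruction needs num_stages cycles to fill the pipeline; each
    later instruction completes one cycle after its predecessor.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles if each instruction ran through all stages before the next began."""
    return max(num_instructions, 0) * num_stages


# Hypothetical 5-stage pipeline: instructions per cycle approaches 1 as the burst grows.
STAGES = 5
for n in (1, 10, 1000):
    piped = pipeline_cycles(n, STAGES)
    print(f"{n:5d} instructions: {piped:5d} cycles pipelined, "
          f"{unpipelined_cycles(n, STAGES):5d} unpipelined, "
          f"{n / piped:.3f} instructions/cycle")
```

In this model the rate approaches one instruction per cycle only for long, uninterrupted bursts, so any interlock that stalls the stream eats directly into that peak.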
Conditional instructions that rely on the execution results of closely preceding instructions present a special problem for pipelined instruction units. Conditional branch (BC) instructions, exemplary of conditional instructions in general, may redirect the instruction sequence to a non-sequential next instruction address depending on the particular condition code returned by the execution of a prior condition code setting (CCS) instruction. Where a conditional branch instruction immediately follows and depends on the condition code returned by a condition code setting instruction, the conditional branch instruction is typically held, or interlocked, in the instruction unit pipeline until the condition code returned by the condition code setting instruction has become available. Only then can the conditional branch instruction correctly proceed. The interlock of the conditional branch instruction directly reduces the instruction processing throughput of the instruction unit and, therefore, of the central processing unit as a whole.
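A minimal timing model makes the interlock cost concrete. The Python sketch below assumes one instruction issued per cycle and single-cycle stages; the stage numbers and the function name are hypothetical and are not taken from any particular instruction unit.

```python
def interlock_stall(cc_ready_stage: int, cc_use_stage: int, distance: int = 1) -> int:
    """Stall cycles for a conditional instruction waiting on a condition code.

    Model (an assumption, not a specific machine): one instruction issues per
    cycle; the CCS instruction occupies stage s during cycle s - 1, and its
    condition code is usable from cycle cc_ready_stage onward. The dependent
    conditional instruction issues `distance` cycles later and needs the
    condition code while it is in stage cc_use_stage.
    """
    needed_in_cycle = distance + cc_use_stage - 1
    return max(0, cc_ready_stage - needed_in_cycle)


# Hypothetical 5-stage pipeline where the CC is set only in the last stage and
# an immediately following BC needs it in stage 2.
print(interlock_stall(cc_ready_stage=5, cc_use_stage=2))  # 3
# A conditional instruction issued three cycles after the CCS waits only 1 cycle.
print(interlock_stall(cc_ready_stage=5, cc_use_stage=2, distance=3))  # 1
```

Lowering `cc_ready_stage`, that is, making the condition code available earlier, shrinks the stall directly; that is the lever the techniques that follow try to pull.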
In an attempt to avoid or reduce the throughput burden created by condition code interlocks, a number of techniques have been proposed and utilized with varying degrees of success. One technique proposed is to optimize the instruction unit specifically with regard to the execution of condition code setting instructions. The optimization reduces the instruction length of condition code setting instructions and thereby reduces the amount of time that a closely following conditional instruction is interlocked. To be effective, the shortened instructions require that the segmented pipeline of the instruction unit be similarly shortened. This, however, results in the instruction unit being poorly optimized for many of the more complex instructions with a corresponding degradation in the peak throughput of the central processing unit for a normal mix of processor instructions.
Another technique for optimizing the processing of condition code setting instructions is to implement a mechanism for predicting the condition code result early in the processing of the condition code setting instruction. An immediately following conditional branch instruction need only be interlocked, if at all, until the predicted condition code becomes available.
Unfortunately, a problem inherent in predictive condition code mechanisms is that the predicted condition code will not always match the one returned upon fully processing the condition code setting instruction. Where the predicted condition code is incorrect, the currently executing instructions must be aborted. Typically, where the aborted instructions include a conditional branch instruction, the instruction unit pipeline must be emptied and restarted with the appropriate branch-determined sequence of instructions.
The working assumption for predictive condition code mechanisms is that the processing throughput improvement obtained by correct condition code predictions outweighs the throughput penalty associated with incorrect ones. However, where the central processing unit is executing multiple, substantially different programs, a statistics-based condition code predicting mechanism may be seriously in error when predicting condition codes for each program individually. The effective throughput for an ill-predicted program will be substantially poorer than the throughput obtainable by simply accepting the full conditional instruction interlock penalty.
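The trade-off can be expressed as an expected cost per dependent conditional instruction. With prediction accuracy p, a residual interlock of r cycles until the predicted condition code is available, a penalty of f cycles for emptying and restarting the pipeline on a misprediction, and a full interlock of s cycles without prediction, prediction pays off only when r + (1 - p) * f < s. The Python sketch below evaluates this simplified model; all cycle counts and accuracies are hypothetical.

```python
def expected_cost_predicted(accuracy: float, residual_stall: float, flush_penalty: float) -> float:
    """Expected cycles lost per dependent branch when the CC is predicted."""
    return residual_stall + (1.0 - accuracy) * flush_penalty


def break_even_accuracy(full_interlock: float, residual_stall: float, flush_penalty: float) -> float:
    """Accuracy below which prediction loses to simply taking the full interlock."""
    return 1.0 - (full_interlock - residual_stall) / flush_penalty


# Hypothetical cycle counts: 3-cycle full interlock, 1-cycle residual stall,
# 8-cycle penalty for emptying and restarting the pipeline.
FULL, RESIDUAL, FLUSH = 3.0, 1.0, 8.0
print(break_even_accuracy(FULL, RESIDUAL, FLUSH))  # 0.75
for name, acc in (("well-predicted program", 0.95), ("ill-predicted program", 0.50)):
    cost = expected_cost_predicted(acc, RESIDUAL, FLUSH)
    verdict = "gain" if cost < FULL else "loss"
    print(f"{name}: {cost:.2f} cycles vs {FULL:.2f} interlocked -> {verdict}")
```

Under these assumed numbers, a program predicted correctly only half the time loses 5 cycles per dependent branch instead of 3, which illustrates how a predictor tuned to aggregate statistics can penalize an individual program.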
A better technique for improving the instruction unit processing of condition code setting instructions is to actually determine the eventual condition code result very early in the processing of the condition code setting instruction. Again, a closely following conditional instruction need be interlocked, if at all, only until the early determined condition code becomes available. An advantage is that the early condition code determining mechanism is fully deterministic and, therefore, completely avoids the throughput degradation potential associated with predicting condition codes.
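Once the condition code is known, whether early or at the end of the condition code setting instruction, resolving a conditional branch is a trivial mask test. The sketch below assumes an IBM System/370-style BC encoding, in which a 4-bit mask holds one bit per condition code value (8 selects CC 0, 4 selects CC 1, 2 selects CC 2, 1 selects CC 3); the text does not name an architecture, so this encoding and the function name are assumptions.

```python
def branch_taken(mask: int, cc: int) -> bool:
    """Resolve a BC-style branch from a 4-bit mask and a 2-bit condition code.

    Assumed System/370-style convention: mask bit 8 corresponds to CC 0,
    bit 4 to CC 1, bit 2 to CC 2, and bit 1 to CC 3.
    """
    if not 0 <= mask <= 0xF:
        raise ValueError("mask must be a 4-bit value")
    if not 0 <= cc <= 3:
        raise ValueError("condition code must be 0..3")
    return bool(mask & (0b1000 >> cc))


# Mask 8 branches only on CC 0; mask 7 branches on any nonzero CC;
# mask 15 always branches.
assert branch_taken(8, 0) and not branch_taken(8, 2)
assert not branch_taken(7, 0) and branch_taken(7, 3)
assert all(branch_taken(15, cc) for cc in range(4))
```

Because this final step is cheap, the length of the interlock is governed almost entirely by when the condition code becomes available, which is why determining it early is attractive.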
Implementing a fully deterministic early condition code mechanism is, however, not without its disadvantages. The primary disadvantage is that it typically requires substantial instruction unit hardware support. Because there is a variety of distinctly different condition code setting instructions, the hardware must support each separate type of instruction. Further, the early condition code must be determined very quickly to be useful at all and to fit within the processor cycle constraints of the instruction unit pipeline. To obtain a deterministic result, the early condition code determining mechanism must receive the operand data associated with the condition code setting instruction. The transfer of anywhere from 16 to 64 bytes of operand data, followed by operand analysis to determine the condition code, places severe, if not preclusive, timing limitations on the instruction unit, substantially increases the amount of instruction unit hardware needed, and greatly reduces the number, variety, and complexity of condition code setting instructions that the early condition code determining mechanism can handle. Additionally, condition code setting instructions that are not handled by the early condition code determining mechanism are typically interlocked until the normal condition code is returned and set in the last cycle of the condition code setting instruction. Consequently, the net gain in processor throughput can be quite limited.
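The dependence on full operand data shows up clearly in a condition code computation for a storage-to-storage logical compare. The sketch below is modeled on the System/370 CLC convention (CC 0 equal, CC 1 first operand low, CC 2 first operand high), which is an assumption here, and the function name is hypothetical. Because the result may hinge on the last of 16 or more bytes, in the worst case the condition code cannot be known until every operand byte has been transferred and examined.

```python
def compare_logical_cc(op1: bytes, op2: bytes) -> int:
    """Condition code for an unsigned, left-to-right byte comparison.

    Assumed CLC-style result: 0 = operands equal, 1 = first operand low,
    2 = first operand high.
    """
    if len(op1) != len(op2):
        raise ValueError("operands must be the same length")
    for a, b in zip(op1, op2):  # iterating bytes yields ints 0..255
        if a != b:
            return 1 if a < b else 2
    return 0  # every byte matched


# Two hypothetical 16-byte operands that differ only in the final byte.
first = bytes(15) + b"\x01"
second = bytes(16)
print(compare_logical_cc(first, second))   # 2
print(compare_logical_cc(second, second))  # 0
```

A hardware equivalent would have to complete this scan over as many as 64 bytes within the instruction unit's cycle budget, which is the timing and hardware burden described above.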
Therefore, there is a need for a central processing unit architecture that implements a deterministic, early condition code analysis mechanism to handle a wide variety of condition code setting instructions without resort to extensive hardware support and, further, that improves the processing throughput of non-early condition code setting instructions.