The present invention relates generally to digital processor architecture, and more particularly to a mechanism for reducing branch penalties in a pipelined digital processor.
Performance enhancements for digital processors encompass a variety of techniques and designs. The use of small, fast memories (i.e., caches) to alleviate bottlenecks arising from main memory accesses is one example of such a technique. Another is to design the instruction execution unit using the concepts of parallel processing so that instruction execution tasks overlap. The design of such instruction execution units, typically referred to as “pipeline” designs, usually includes, at a minimum, fetch, decode, and execute stages. Pipeline designs execute instructions in the sequence in which they appear in memory, but allow a new instruction to begin its fetch, decode, and execute phases before an earlier instruction has completed those phases. This form of parallelism can substantially improve performance, but not without certain difficulties.
Optimum performance of digital processors with pipeline designs requires that the pipeline be kept full. If an input stage is idle on a particular cycle, this idleness eventually propagates through the entire pipeline and detracts from pipeline efficiency. One obstacle to optimal performance of a pipelined digital processor arises when an instruction depends upon data or an operand of a prior instruction that occurs close enough to it in the instruction sequence to raise the possibility of an inconsistent result. One approach to overcoming this obstacle, referred to as “data bypass,” “data forwarding,” or “operand forwarding,” involves passing data to its eventual user before it would be available through the normal data paths of the processor.
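By way of illustration only, the following Python sketch models operand forwarding at a very high level. It is a simplified model under stated assumptions, not a description of any particular processor: the register names, the hypothetical `read_operand` helper, and the list of in-flight results (instructions that have executed but not yet written back) are all invented for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each in-flight entry is (destination register, result)
# for an instruction that has computed its result but not yet written it back.
# Entries are ordered oldest to newest.
InFlight = List[Tuple[str, int]]


def read_operand(reg: str, regfile: Dict[str, int], in_flight: InFlight) -> int:
    """Read `reg`, forwarding the newest in-flight result that targets it."""
    for dest, value in reversed(in_flight):
        if dest == reg:
            return value  # bypass path: result delivered before write-back
    return regfile[reg]  # normal path: value already in the register file


def read_operand_without_forwarding(reg: str, regfile: Dict[str, int]) -> int:
    """Read `reg` from the register file only; the value may be stale."""
    return regfile[reg]


# I1: r1 <- r2 + 1 has executed; its result is still held in a pipeline latch.
regfile = {"r1": 0, "r2": 5}
in_flight: InFlight = [("r1", regfile["r2"] + 1)]

# I2: r3 <- r1 * 2 immediately follows I1 and needs r1.
r3_forwarded = read_operand("r1", regfile, in_flight) * 2
r3_stale = read_operand_without_forwarding("r1", regfile) * 2
print(r3_forwarded, r3_stale)  # 12 0
```

Without forwarding, I2 reads the old value of r1 and produces an inconsistent result (0 instead of 12); a real pipeline would otherwise have to stall I2 until I1 writes back.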
Another obstacle to optimal performance has been conditional branches, which have long been a bane of pipeline design because they can temporarily halt a pipeline until the branch target can be determined. Popular approaches to alleviating this problem include delayed branching and branch prediction. Briefly, the concept behind delayed branching is to introduce the branch instruction into the pipeline followed by the next in-line instruction, that is, the final instruction of the block that is to be executed before the branch is taken. This allows the processor to begin the setup needed to resolve the branch while that final instruction of the block is executed.
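The effect of a single branch delay slot can be sketched with a toy interpreter. This is a hypothetical illustration, assuming a made-up instruction set (`add`, `jmp`, `halt`) and exactly one delay slot that never holds another branch; it models only instruction ordering, not timing, and is not the mechanism of the present invention.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA:
#   ("add", reg, imm)  reg <- reg + imm
#   ("jmp", target)    unconditional branch to instruction index `target`
#   ("halt",)          stop
Instr = Tuple


def run(program: List[Instr], delay_slot: bool) -> Tuple[Dict[str, int], List[int]]:
    """Execute `program`; return final registers and the executed PCs."""
    regs = {"r1": 0, "r2": 0}
    trace: List[int] = []
    pc = 0
    redirect: Optional[int] = None  # branch target waiting for its delay slot
    while program[pc][0] != "halt":
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if redirect is not None:
            # This instruction occupied the delay slot; now the branch takes effect.
            next_pc, redirect = redirect, None
        if op == "add":
            reg, imm = args
            regs[reg] += imm
        elif op == "jmp":
            (target,) = args
            if delay_slot:
                redirect = target  # the next in-line instruction still executes
            else:
                next_pc = target  # control transfers immediately
        pc = next_pc
    return regs, trace


# Original order: the block ends with "add r1, 1", then branches past index 2.
original = [("add", "r1", 1), ("jmp", 3), ("add", "r2", 100), ("halt",)]
# Delayed-branch order: the branch is issued first and the block's final
# instruction fills the delay slot, executing while the branch is resolved.
delayed = [("jmp", 3), ("add", "r1", 1), ("add", "r2", 100), ("halt",)]

print(run(original, delay_slot=False) == run(delayed, delay_slot=True))  # True
```

Both orderings yield the same architectural result; the delayed form simply gives the pipeline useful work to do while the branch target is being determined.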
Branch prediction is an estimate of which branch path will be taken. Based upon the estimate, instructions are fetched from the predicted instruction stream. If the estimate proves incorrect, the predicted instruction stream must be removed from the pipeline in favor of the correct instruction stream. When such estimates are mostly correct, branch prediction can be very effective.
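No particular prediction algorithm is specified here. As one commonly cited example, the sketch below assumes a per-branch table of 2-bit saturating counters (a bimodal predictor) and counts how often the predicted stream would have to be removed. The class name, function name, branch address, and outcome sequence are hypothetical.

```python
from typing import Dict, List


class TwoBitPredictor:
    """Bimodal predictor: one 2-bit saturating counter per branch address.

    Counter values 0-1 predict "not taken"; 2-3 predict "taken".
    """

    def __init__(self, initial: int = 1) -> None:
        self.initial = initial  # 1 = weakly not taken
        self.counters: Dict[int, int] = {}

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_mispredictions(outcomes: List[bool], pc: int = 0x40) -> int:
    """Feed one branch's actual outcomes to the predictor; count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            # Here the predicted stream would be flushed and the correct one fetched.
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken 9 times, then falls through; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(outcomes), "of", len(outcomes))  # 3 of 20
```

The three mispredictions come from the initial warm-up and from each loop exit; the remaining 17 predictions are correct, which is the "mostly correct" case in which prediction pays off.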
Generally, branch instructions are of two different types: simple or complex. A simple branch instruction is typically one in which the condition upon which the branch is determined is known just before or during decode. An unconditional branch is of this type, as are branch instructions that are preceded by some form of a compare instruction. Complex branch instructions are ones in which the condition upon which the branch will be taken is not known until the branch instruction is actually executed. The penalties imposed by complex branch instructions can be exacerbated when the depth of the pipeline is increased in order to operate digital processors at higher clock speeds, although the penalty can be alleviated to some extent by branch prediction.
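The effect of pipeline depth on the penalty of a branch can be estimated with simple arithmetic. The sketch below assumes, hypothetically, an ideal pipeline that completes one instruction per cycle, fetch in stage 1, no branch prediction, and one lost fetch cycle for every stage between fetch and the stage in which the branch condition is resolved; pipeline fill cycles are ignored, and the workload numbers are invented.

```python
def branch_penalty(resolve_stage: int) -> int:
    """Stall cycles per branch when fetch (stage 1) waits for resolution.

    Instructions behind the branch would occupy stages 1 .. resolve_stage - 1,
    so that many fetch slots are lost before the target can be fetched.
    """
    if resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    return resolve_stage - 1


def total_cycles(n_instructions: int, n_branches: int, resolve_stage: int) -> int:
    """Cycles for an ideal one-instruction-per-cycle pipeline plus branch stalls."""
    return n_instructions + n_branches * branch_penalty(resolve_stage)


# Hypothetical workload: 100 instructions, 20 of which are branches.
cases = [
    (2, "simple branch, resolved in decode"),
    (3, "complex branch, execute is stage 3"),
    (8, "complex branch, execute is stage 8 (deeper pipeline)"),
]
for stage, label in cases:
    print(f"{label}: {total_cycles(100, 20, stage)} cycles")
```

Under these assumptions the workload takes 120, 140, and 240 cycles respectively: deepening the pipeline so that execution occurs in stage 8 rather than stage 3 raises the per-branch penalty from 2 to 7 cycles.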
Branch prediction can be complex and expensive to design and test. The performance benefit achieved by branch prediction depends upon the effectiveness of the prediction algorithm(s) used, many of which employ relatively large amounts of storage and complex hardware that can be quite expensive.
Accordingly, a technique that reduces branch penalties without complex branch prediction would be of significant advantage to the design and operation of high-speed digital processors that use pipeline design techniques.