This invention relates generally to computer systems and more particularly to apparatus to predict a next set of computer instructions in pipelined processors.
As is known in the art, computer systems have become ubiquitous. In particular, one type of computer system widely employed is the so-called pipelined processor. In a pipelined processor, instructions are decomposed into assembly-line-like stages. Illustratively, a pipeline may include an instruction fetch stage, in which instructions are fetched in one or several cycles from a cache memory, and an instruction decode stage, in which an instruction's op code, that is, the portion of the code which determines the function of the instruction, is examined to ascertain the function of the instruction and thus the resources needed by the instruction. Illustrative resources may include general purpose registers within the CPU, access to internal buses as well as external buses, and functional units such as I/O units, arithmetic logic units and so forth. A third stage is typically the instruction issue stage, in which resource availability is checked for each instruction and resources are reserved for particular instructions. The fourth stage of a typical parallel pipeline is the execution stage, in which instructions are executed in one or several execution stages, writing results into the general purpose registers during the last execution stage.
In an ideal pipelined processor, time is measured in CPU clock periods. In theory, the clock period for a P-stage pipeline would be 1/P the clock period of a non-pipelined equivalent, since the non-pipelined equivalent would have P-1 fewer stages of execution for the instruction. Thus, with the pipelined approach there is the potential for a P times improvement in throughput or performance over a conventional non-pipelined architecture.
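The P times figure above can be checked with a short calculation. The following is a minimal sketch under idealized assumptions (no stalls, equal-length stages, one instruction completing per cycle once the pipeline is full); the function and parameter names are illustrative and do not come from the specification.

```python
def pipeline_speedup(num_instructions, num_stages, unpipelined_period=1.0):
    """Ideal speedup of a P-stage pipeline over a non-pipelined equivalent."""
    # Non-pipelined: every instruction occupies one full (long) clock period.
    unpipelined_time = num_instructions * unpipelined_period
    # Pipelined: the clock period is 1/P as long. The first instruction needs
    # P cycles to pass through; after that one instruction completes per cycle.
    stage_period = unpipelined_period / num_stages
    pipelined_time = (num_stages + num_instructions - 1) * stage_period
    return unpipelined_time / pipelined_time


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(pipeline_speedup(n, num_stages=4), 3))
```

For a single instruction the speedup is 1, and it approaches P only as the number of instructions grows large, which is why anything that empties or stalls the pipeline erodes the theoretical gain.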
There are, however, several practical limitations on pipeline performance which prevent a pipelined processor from achieving the P times throughput improvement. One particular limitation on practical performance is instruction dependencies. Instruction dependencies may be viewed as instructions which depend on the results of previous instructions and may therefore have to wait for the previous instructions to complete before they can proceed through the pipeline. Two types of instruction dependencies may be identified. The first is the so-called data dependency, which occurs when instructions use the same input and/or output operands, as for example when an instruction uses the result of a preceding instruction as an input operand. A data dependency may cause an instruction to wait in the pipeline for a preceding instruction to complete.
A control dependency, on the other hand, occurs when a control decision, such as for example a conditional branch decision, must be made before subsequent instructions can be executed.
When an instruction dependency occurs, all instructions following the instruction being executed are blocked from executing. Typically, the instruction in the pipeline is stalled in a pipeline stage and does not proceed further until the instruction ahead of it proceeds or, at the issue stage, until all resource and data dependencies for the particular instruction are satisfied. When a large number of instruction dependencies occur in a program executed by a pipelined processor, the performance improvement implicit in a pipelined processor is reduced accordingly.
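The data dependency described above can be illustrated with a small check on register operands. This is a minimal sketch, assuming a hypothetical instruction record with one destination register and a tuple of source registers; the `Instr` type and the register names are illustrative only.

```python
from collections import namedtuple

# Hypothetical instruction record: op code, destination register, source registers.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])


def has_data_dependency(earlier, later):
    """Return True if `later` shares an input or output operand with `earlier`
    in a way that constrains their order."""
    # `later` reads the result that `earlier` writes.
    reads_result = earlier.dest in later.srcs
    # Both instructions write the same register.
    same_output = earlier.dest == later.dest
    # `later` overwrites a register that `earlier` still needs to read.
    overwrites_input = later.dest in earlier.srcs
    return reads_result or same_output or overwrites_input


add = Instr("add", "r1", ("r2", "r3"))
sub = Instr("sub", "r4", ("r1", "r5"))  # uses r1, the result of `add`
mul = Instr("mul", "r6", ("r7", "r8"))  # shares no operands with `add`

if __name__ == "__main__":
    print(has_data_dependency(add, sub))
    print(has_data_dependency(add, mul))
```

In this example `sub` would have to wait for `add` to complete, whereas `mul` could proceed through the pipeline independently.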
One technique known in the art to overcome the occurrence of instruction dependencies is so-called instruction scheduling. An important characteristic of pipelined processors is that, by using equivalent but reordered code sequences, the pipelined processor can provide improved performance by eliminating many of the so-called instruction dependencies. This is often done in an instruction scheduler, in which the instructions are reordered by some algorithm to eliminate register conflicts that appear to the hardware as dependencies and to reduce the occurrence of data dependencies by executing instructions in a rescheduled order.
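The benefit of an equivalent but reordered code sequence can be seen in a toy model. The sketch below assumes, purely for illustration, that an instruction stalls for one cycle whenever it reads the register written by the instruction immediately before it; real pipelines have more elaborate rules, and the instruction encoding here is hypothetical.

```python
def count_stalls(program):
    """Count stalls in a toy model: an instruction stalls once if it reads
    the register written by the instruction immediately preceding it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_dest = prev[0]
        cur_srcs = cur[1]
        if prev_dest in cur_srcs:
            stalls += 1
    return stalls


# Each instruction is (destination register, source registers).
original = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 - r5, needs r1 from the previous instruction
    ("r6", ("r7", "r8")),  # r6 = r7 * r8, independent of the first two
    ("r9", ("r6", "r2")),  # r9 = r6 + r2, needs r6 from the previous instruction
]

# Equivalent order: the independent multiply is moved between the add and the
# subtract, so no instruction directly follows the producer of its input.
reordered = [original[0], original[2], original[1], original[3]]

if __name__ == "__main__":
    print(count_stalls(original), count_stalls(reordered))
```

Both orders compute the same register values; only the spacing between producers and consumers changes.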
When a processor is fed a branch type of instruction, the processor waits until a target address is calculated if the instruction is an unconditional branch, such as a GO TO type of instruction, or, if the instruction is a conditional branch such as an IF type of statement, the processor waits until the branch condition is resolved. To improve the performance of instruction scheduling, and pipelined processor performance in general, branch prediction techniques have been developed.
During the normal course of computing, it is relatively straightforward to ascertain which instructions are next to enter the instruction pipeline to continue processing and to prevent the instruction pipeline from becoming empty. During a normal flow of instructions, the instructions enter the pipeline sequentially. However, an exception to this normal sequencing occurs when the instruction pipeline encounters a branch type of instruction, which instructs the processor to fetch an instruction that is outside of the normal sequence of fetching.
One type of branch is a so-called conditional branch, in which a processor will process one of two instructions with different addresses depending upon some condition being satisfied. Thus, the next instruction to enter the pipeline will not be known until the condition is checked. This can occur many instruction pipeline cycles after the instruction is executed, causing a halt in fetching new instructions.
In prior schemes, a branch prediction mechanism is provided early in the instruction pipeline to predict whether a branch will or will not be taken and thus to provide to the instruction store a starting address of a next sequence of instructions. The execution stage includes circuitry to check whether the prediction made was correct. If the prediction was not correct, the instruction pipeline is backed up to the branch instruction and the correct address of the next sequence of instructions is provided, such that the instruction processor will choose the correct path.
One prediction technique is so-called static branch prediction, in which all encountered branches are always assumed to be either taken or not taken. Generally, branches are assumed to be "not taken", and the prediction is correct more often than not.
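Static prediction as described above amounts to applying one fixed guess to every branch. The following is a minimal sketch that scores such a guess against a recorded sequence of branch outcomes; the function name and the sample outcomes are made up for illustration and are not measurements.

```python
def static_prediction_accuracy(outcomes, predict_taken=False):
    """Fraction of branches predicted correctly when every branch receives
    the same fixed prediction (taken or not taken)."""
    if not outcomes:
        return 0.0
    correct = sum(1 for taken in outcomes if taken == predict_taken)
    return correct / len(outcomes)


# Hypothetical trace: True means the branch was actually taken.
sample_outcomes = [False, False, True, False, True, False, False, False]

if __name__ == "__main__":
    print(static_prediction_accuracy(sample_outcomes, predict_taken=False))
    print(static_prediction_accuracy(sample_outcomes, predict_taken=True))
```

Because the guess never adapts, its accuracy is fixed by the program's branch behavior, and every branch that goes the other way is a mispredict.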
However, when there is a branch mispredict as determined by the execution unit, the instruction pipeline of the processor has to be flushed of the instructions currently under execution. Thus, execution of those instructions is terminated and the instruction pipeline must be loaded with a new set of instructions corresponding to the correct branch address. This wastes processor time particularly for very long pipelined processors.
With a large number of branch mispredicts, which is implicit in a static approach, the advantages obtained from a pipelined processor are reduced. The performance degradation caused by branch mispredicts grows with the length of the pipeline. Thus, the advantages of a long pipelined processor are also reduced by a poor branch prediction scheme.
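The relationship between mispredicts, pipeline length and lost performance can be estimated with a simple cost model. The sketch below assumes each mispredict costs a fixed number of flushed cycles that grows with pipeline depth; the parameter names and the numbers in the example are assumptions chosen for illustration.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles,
                  base_cpi=1.0):
    """Average cycles per instruction when each mispredicted branch adds a
    fixed flush-and-refill penalty on top of the base rate."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    # Assumed workload: 20% of instructions are branches, 25% are mispredicted.
    short_pipeline = effective_cpi(0.2, 0.25, flush_penalty_cycles=4)
    long_pipeline = effective_cpi(0.2, 0.25, flush_penalty_cycles=16)
    print(short_pipeline, long_pipeline)
```

With the same mispredict rate, the longer pipeline loses more cycles per instruction, which is the effect the paragraph above describes.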
A further problem is that these schemes provide branch prediction in series with instruction fetching, and thus increase the length of the pipeline and the delay required to back up the pipeline. Moreover, the schemes employed must be relatively simple, as illustrated above, in order to minimize the number of decisions required to determine which starting address of the next instruction sequence to fetch, so that instructions can be issued from the instruction store as rapidly as possible and the execution of the instructions is not slowed down.
Moreover, there is also a need to be able to predict sufficiently far in advance to ensure that a continuous flow of instructions into the instruction pipeline is maintained. That is, the prediction circuit needs to provide sufficient addresses to the instruction store to enable the store to issue instructions quickly to the remainder of the processor.