Program instructions for a microprocessor are typically stored in sequential, addressable locations within a memory. When these instructions are processed, they may be fetched from consecutive memory locations and stored in a cache commonly referred to as an instruction cache. The instructions may later be retrieved from the instruction cache and executed. Each time an instruction is fetched from memory, a next instruction pointer within the microprocessor may be updated so that it contains the address of the next instruction in the sequence. The address of the next instruction in the sequence may commonly be referred to as the next sequential instruction pointer. Sequential instruction fetching, updating of the next instruction pointer, and execution of sequential instructions may continue linearly until an instruction, commonly referred to as a branch instruction, is encountered and taken.
A branch instruction is an instruction which causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; or an address referred to as a “target address” which identifies an instruction stream beginning at an arbitrary location in memory. A branch instruction, referred to as an “unconditional branch instruction”, always branches to the target address, while a branch instruction, referred to as a “conditional branch instruction”, may select either the sequential or the target address based on the outcome of a prior instruction. It is noted that when the term “branch instruction” is used herein, the term “branch instruction” refers to a “conditional branch instruction”.
To efficiently execute instructions, microprocessors may implement what is commonly referred to as a branch prediction mechanism. A branch prediction mechanism determines a predicted direction (taken or not taken) for an encountered branch instruction, allowing subsequent instruction fetching to continue along the predicted instruction stream indicated by the branch prediction. For example, if the branch prediction mechanism predicts that the branch instruction will be taken, then the next instruction fetched is located at the target address. If the branch prediction mechanism predicts that the branch instruction will not be taken, then the next instruction fetched is sequential to the branch instruction.
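The selection between the sequential address and the target address described above can be sketched as follows. This is a minimal illustration only, not the mechanism of any particular processor; the names `Instruction`, `next_fetch_address` and `INSTRUCTION_SIZE`, as well as the fixed four-byte instruction width, are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

INSTRUCTION_SIZE = 4  # assumed fixed instruction width in bytes (hypothetical)


@dataclass
class Instruction:
    address: int
    is_branch: bool = False
    target: Optional[int] = None  # target address; only meaningful for branches


def next_fetch_address(instr: Instruction, predicted_taken: bool) -> int:
    """Return the address from which the next instruction is fetched."""
    if instr.is_branch and predicted_taken:
        if instr.target is None:
            raise ValueError("branch instruction has no target address")
        # Predicted taken: fetching continues at the target address.
        return instr.target
    # Not a branch, or predicted not taken: fetch the next sequential instruction.
    return instr.address + INSTRUCTION_SIZE
```

For a branch at address 0x100 with target 0x200, the sketch returns 0x200 when the branch is predicted taken and 0x104 when it is predicted not taken.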
If the predicted instruction stream is correct, then the number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect, i.e., one or more branch instructions are predicted incorrectly, then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the other instruction stream is fetched. Therefore, the number of instructions executed per clock cycle is decreased.
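The effect of incorrect predictions on the number of instructions executed per clock cycle can be approximated with simple arithmetic. The sketch below assumes a simplified model, not taken from the description above, in which each incorrectly predicted branch adds a fixed number of penalty cycles; the function name `effective_ipc` and all parameter names and values are hypothetical.

```python
def effective_ipc(instructions, base_ipc, mispredictions, penalty_cycles):
    """Approximate instructions per cycle under a fixed misprediction penalty.

    Assumed model: cycles = instructions / base_ipc, plus penalty_cycles
    for every incorrectly predicted branch.
    """
    base_cycles = instructions / base_ipc
    total_cycles = base_cycles + mispredictions * penalty_cycles
    return instructions / total_cycles


# Same workload with and without mispredictions (illustrative numbers only).
ipc_no_mispredictions = effective_ipc(1000, 2.0, 0, 5)
ipc_with_mispredictions = effective_ipc(1000, 2.0, 10, 5)
```

Under these assumed numbers, the ten mispredictions add 50 cycles to the 500 base cycles, so the second value is lower than the first.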
A processor may include a fetch unit configured to fetch a group of instructions, referred to as a “fetch group.” The fetch group may be fetched from an instruction cache and, upon decoding, may be enqueued in an instruction queue for execution. Currently, upon enqueuing in the instruction queue a fetch group containing a branch instruction that is predicted taken, there is a delay, e.g., a two-cycle lag, in enqueuing the subsequent instruction line (i.e., the branched instruction line) in the instruction queue to be executed. This delay results in dead-time in the pipeline, during which no instructions are executed, as illustrated in FIG. 1.
FIG. 1 is a timing diagram illustrating that the instructions at the branch target address (the branched fetch group) are enqueued in the instruction queue two cycles after the enqueuing of the fetch group containing a branch instruction. As illustrated in FIG. 1, a fetch group (a group of instructions) is fetched in two stages, which are indicated as IF1 and IF2. In the first stage, IF1 fetches fetch groups A, A+10, A+20, B, B+10, B+20, B+30, B+40, B+50, C, C+10 and C+20 in the indicated clock cycles. In the second stage, IF2 continues to fetch fetch groups A, A+10, B, B+10, B+20, B+30, B+40, C and C+10 in the indicated clock cycles.
At the decode stage, which is indicated as “DCD”, a branch instruction in the fetch group is determined to be predicted taken or not taken. If the decode logic at the decode stage determines that the branch instruction in the fetch group is predicted taken, then the signal identified as “Br Predict Taken” goes high. Otherwise, the signal “Br Predict Taken” remains low. For example, referring to FIG. 1, the decode logic at the decode stage determined that the branch instructions in fetch groups A and B+30 were predicted taken.
In the stage following the decode stage, the instructions are enqueued in the instruction queue in the order in which they are to be executed. As illustrated in FIG. 1, fetch group A had a branch instruction that was predicted taken. Further, as illustrated in FIG. 1, the branch instruction branched to fetch group B. Hence, fetch group A was enqueued in the instruction queue, followed by fetch group B. However, there was a two-cycle lag between the enqueuing of fetch group A and the enqueuing of fetch group B. As stated above, this two-cycle lag causes dead-time in the pipeline, during which no instructions are executed.
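The timing described for FIG. 1 can be reproduced with a small model. The sketch below is illustrative only and rests on stated assumptions: the pipeline consists of the two fetch stages IF1 and IF2, a decode stage and an enqueue stage, each taking one cycle; the target fetch group enters IF1 in the cycle after the branch is decoded; and the two-cycle lag is read as two cycles in which nothing is enqueued. The function names and cycle numbering are hypothetical.

```python
FETCH_STAGES = 2  # IF1 and IF2, per the description of FIG. 1


def enqueue_cycles(groups, predicted_taken, fetch_stages=FETCH_STAGES):
    """Return {fetch group label: cycle in which the group is enqueued}.

    groups: fetch-group labels along the predicted path, in order.
    predicted_taken: labels whose branch is predicted taken at decode.
    Assumption: the target group enters the first fetch stage in the
    cycle after the branch is decoded.
    """
    cycles = {}
    first_fetch_cycle = 1
    for label in groups:
        decode_cycle = first_fetch_cycle + fetch_stages
        cycles[label] = decode_cycle + 1  # enqueue stage follows decode
        if label in predicted_taken:
            # Younger groups already in the fetch stages are discarded and
            # fetching restarts at the target in the cycle after decode.
            first_fetch_cycle = decode_cycle + 1
        else:
            # Sequential fetch: the next group enters one cycle later.
            first_fetch_cycle += 1
    return cycles


def idle_cycles_between(cycles, earlier, later):
    """Number of cycles with no enqueue between two consecutive groups."""
    return cycles[later] - cycles[earlier] - 1


timeline = enqueue_cycles(
    ["A", "B", "B+10", "B+20", "B+30", "C"], {"A", "B+30"}
)
```

In this model, `idle_cycles_between(timeline, "A", "B")` and `idle_cycles_between(timeline, "B+30", "C")` both evaluate to 2, while sequential groups such as B and B+10 are enqueued in consecutive cycles. Passing a larger `fetch_stages` value lengthens the gap by the same amount.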
The two-cycle lag illustrated in FIG. 1 may be exacerbated as the frequency requirements of processors continue to grow. As the frequency requirements for processors continue to grow, i.e., as the number of cycles per second at which the processor operates increases, there is an increase in the number of clock cycles taken to fetch instructions into the processing pipeline. Hence, there may be an increase in the number of instructions between the top of the fetch pipeline (the point at which the initial instruction was fetched) and the point at which the branch prediction can be accomplished. As a result, there may be cases where all the instructions may be dispatched while waiting for a predicted taken branch to be accessed, i.e., waiting to fetch the instructions at the branch target address, from the cache or other memory device. This may result in more dead-time in the pipeline than is illustrated in FIG. 1.
By reducing dead-time in the pipeline, i.e., reducing the delay in enqueuing in the instruction queue the instructions that follow a branch instruction predicted taken, a greater number of instructions may be processed by a processor in a given period of time.
Therefore, there is a need in the art to reduce the fetch time of target instructions of a predicted taken branch instruction.