Exemplary embodiments relate to instruction fetch unit architecture, and more particularly to instruction fetch, branch prediction, and branch execution architecture in high-performance microprocessors.
Branch prediction is an important aspect of efficient computer microarchitecture. It ensures that a steady supply of instructions is available to a microarchitecture with high instruction issue bandwidth, so that the issue bandwidth can be used efficiently.
FIG. 1 illustrates one example of a microprocessor 100 in which instructions are fetched from the instruction cache (IC) using instruction fetch logic (IF). The instructions are processed by branch prediction logic (BP) and passed to decoding logic D0, D1, D2, D3. If a predicted-taken branch is encountered, branch prediction can effect a change in fetch direction by updating the instruction fetch address maintained in instruction fetch logic IF.
Decoded instructions (both branch and non-branch instructions) are transferred via transfer facility xfer to group dispatch logic (GD). Individual instructions being dispatched are renamed using register map table (MP) and, depending on instruction type, entered into issue queues maintained in issue logic (ISS) for issue to the appropriate execution pipeline: branch execution unit (BR) 105, load/store units (LD/ST) 110, fixed-point execution unit (FX) 115, or floating-point execution units (FP) 120. Instructions are issued from the issue queues in issue logic (ISS) out of order with respect to each other.
Referring now to execution in the compute pipelines LD/ST 110, FX 115, and FP 120, instructions access one or more register files (RF) and then enter an execution phase. For LD/ST 110 instructions, the execution phase includes first an address generation phase (EA), followed by data cache access (DC) and data formatting (Fmt). For FX 115 instructions, execution includes a logic function implemented by (EX). For FP 120 instructions, execution includes one or more logic functions F1 to F6.
With respect to the execution of branch instructions in the BR 105 pipeline, branch instructions optionally perform one or more register file accesses in register file access logic (RF) to retrieve one or more of condition, branch, counter, and branch target operands. Branch execution logic (EX) in the BR 105 pipeline computes the target address and branch condition and compares them with the predicted target and condition. If a misprediction is detected, i.e., either the condition was mispredicted or the wrong target was supplied, a branch redirection action is taken: mispredicted instructions are removed from execution pipelines 105, 110, 115, 120 using a flush or other appropriate mechanism, and the fetch address maintained in instruction fetch logic (IF) is updated.
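The compare-and-redirect step performed by branch execution logic (EX) can be sketched as a small Python model. This is an illustration under stated assumptions, not the hardware of FIG. 1: the names `BranchOutcome`, `resolve_branch`, and `execute_branch` are hypothetical, in-flight instructions are assumed to be identified by program-order sequence numbers, and the flush is modeled simply as discarding every instruction younger than the branch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BranchOutcome:
    taken: bool  # branch condition: taken / not taken
    target: int  # branch target address


def resolve_branch(predicted: BranchOutcome, actual: BranchOutcome,
                   fall_through: int) -> Optional[int]:
    """Compare the predicted and computed outcome of a branch.

    Returns the corrected fetch address on a misprediction, or None
    if the prediction was correct.
    """
    if predicted.taken != actual.taken:
        # Condition mispredicted: redirect to the target or the fall-through path.
        return actual.target if actual.taken else fall_through
    if actual.taken and predicted.target != actual.target:
        # Condition correct, but the wrong target was supplied.
        return actual.target
    return None


def execute_branch(branch_seq: int, predicted: BranchOutcome,
                   actual: BranchOutcome, fall_through: int,
                   in_flight: List[int]) -> Tuple[List[int], Optional[int]]:
    """Resolve a branch; on a misprediction, flush all younger instructions.

    in_flight holds program-order sequence numbers (larger = younger).
    Returns the surviving instructions and the new fetch address (or None).
    """
    redirect = resolve_branch(predicted, actual, fall_through)
    if redirect is None:
        return in_flight, None
    survivors = [seq for seq in in_flight if seq <= branch_seq]
    return survivors, redirect


# Branch 5 was predicted taken to 0x2000 but actually goes to 0x3000.
survivors, fetch_address = execute_branch(
    5, BranchOutcome(True, 0x2000), BranchOutcome(True, 0x3000), 0x1004, [3, 4, 5, 6, 7])
print(survivors, hex(fetch_address))  # prints: [3, 4, 5] 0x3000
```

Checking the target only when the branch is actually taken reflects that a not-taken branch continues on the fall-through path, so its predicted target is irrelevant.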
All execution pipelines 105, 110, 115, 120 complete by performing a writeback (WB) of computed results and a transfer via xfer to a commit stage (CP).
Instructions are committed in order relative to all other instructions, at their in-order commit point in commit stage (CP). Interrupt conditions, exceptions, and other special execution conditions cause the commit stage (CP) to effect a flush and refetch, and the instruction fetch address in instruction fetch logic (IF) is set to a re-execution address or to an interrupt or exception handler address.
High branch prediction accuracy is also important to ensure that instructions fetched and executed predictively correspond to the actual program execution path, so that predictive execution speeds up execution of the overall program.
While great strides have been made in predicting the conditions of conditional branches and the target addresses of function returns, accurately and efficiently predicting the target addresses of register-indirect branches has remained a problem for those skilled in the art.
FIG. 2 illustrates a first register-indirect branch target prediction technique, which uses a lookup table (recent target table RTT 220) indexed by an instruction fetch address register (IFAR) 210 to retrieve a target address prediction 230 corresponding to the most recent target of the branch at the specified address.
Typically, an RTT 220 array supports from 8 to 256 entries, which leads to destructive aliasing between branches whose addresses share the bits used to index into the RTT 220, or that otherwise map to the same entry in the RTT 220.
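A minimal sketch of the RTT lookup of FIG. 2 and of the aliasing just described, assuming a direct-mapped, untagged table of 16 entries indexed by the IFAR bits directly above a 2-bit byte offset (i.e., fixed 4-byte instructions). The class `RecentTargetTable` and its methods are hypothetical; real designs may hash address bits rather than take them directly.

```python
from typing import List, Optional


class RecentTargetTable:
    """Direct-mapped, untagged recent target table indexed by IFAR bits."""

    def __init__(self, entries: int = 16, offset_bits: int = 2):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.entries = entries
        self.offset_bits = offset_bits  # assumed 4-byte instructions
        self.targets: List[Optional[int]] = [None] * entries

    def index(self, ifar: int) -> int:
        # Drop the byte-offset bits, then keep log2(entries) address bits.
        return (ifar >> self.offset_bits) & (self.entries - 1)

    def predict(self, ifar: int) -> Optional[int]:
        # The most recent target recorded in this entry, from any branch.
        return self.targets[self.index(ifar)]

    def update(self, ifar: int, actual_target: int) -> None:
        # Overwrite the entry with the resolved target (one element of history).
        self.targets[self.index(ifar)] = actual_target


rtt = RecentTargetTable(entries=16)
rtt.update(0x1000, 0x8000)  # branch A
rtt.update(0x1040, 0x9000)  # branch B maps to the same entry as branch A
print(hex(rtt.predict(0x1000)))  # prints: 0x9000
```

The branches at 0x1000 and 0x1040 lie 16 instructions apart, so with 16 entries they share an index and each overwrites the other's target; because the table is untagged, branch A then silently receives branch B's target.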
Consequently, the accuracy of the branch predictor of FIG. 2 is limited. While this branch predictor can maintain a single element of target history for a limited set of indirect branches, it cannot detect patterns; for example, a branch that alternates between two targets is mispredicted on every execution. Thus, although the RTT 220 efficiently handles inter-module linkage calls and other fixed-target calls, it is not suited to branches whose targets follow a pattern.
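The limitation can be illustrated by counting the mispredictions of a last-target policy, which is what a single RTT entry provides for one branch. The helper `last_target_misses` is hypothetical, and the sketch ignores aliasing between branches.

```python
from typing import Iterable, Optional


def last_target_misses(targets: Iterable[int]) -> int:
    """Count mispredictions when a branch is always predicted to go to the
    target it took most recently (a single element of target history)."""
    last: Optional[int] = None
    misses = 0
    for actual in targets:
        if last != actual:
            misses += 1  # includes the cold first encounter
        last = actual
    return misses


fixed = [0x100] * 8               # e.g., an inter-module linkage call
alternating = [0x100, 0x200] * 4  # data-dependent pattern A, B, A, B, ...
print(last_target_misses(fixed), last_target_misses(alternating))  # prints: 1 8
```

The fixed-target branch misses only on its first execution, while the alternating branch misses every time, even though its next target is fully determined by its previous one.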
FIG. 3 illustrates an indirect branch predictor (IBP) 300 that chooses targets 310 based on a global control flow history 320, much as a global branch predictor chooses the direction of conditional branches using global control flow history. As seen in FIG. 3, the IBP 300 is an adjunct to the normal target prediction device. Targets are always allocated in the instruction-pointer-tagged table 335, along with the type of branch. When the target of an indirect branch is mispredicted, the indirect branch predictor 300 allocates a new entry in table 325 corresponding to the global history 320 leading to this instance of the indirect branch. This construction allows monotonic indirect branches to be predicted correctly from the IP-based target array 335 (accessed with instruction pointer 330), and allows data-dependent indirect branches to allocate as many targets as they need for the different global history patterns that correlate with their different targets. Entries in the indirect branch predictor 300 are tagged with the hit and type information from the IP-based target array 335 (accessed with instruction pointer 330) to prevent false positives from the indirect branch predictor 300 from causing new target mispredictions.
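The interplay between the IP-tagged table 335 and the history-indexed table 325 can be sketched in Python. This is an illustrative model under several assumptions not taken from FIG. 3: both tables are unbounded dictionaries, only conditional branch directions are shifted into the global history, table 325 is indexed by the raw history value, and each of its entries is tagged with the branch address so that it is used only by the branch that allocated it. `IndirectBranchPredictor` and `simulate` are hypothetical names.

```python
from typing import Dict, Optional, Tuple


class IndirectBranchPredictor:
    """IP-tagged target array plus a global-history-indexed indirect table."""

    def __init__(self, history_bits: int = 8):
        self.mask = (1 << history_bits) - 1
        self.history = 0  # global direction history; 1 = taken
        # IP-tagged target array: ip -> (branch type, most recent target)
        self.ip_table: Dict[int, Tuple[str, int]] = {}
        # History-indexed table: history -> (ip tag, target)
        self.ibp_table: Dict[int, Tuple[int, int]] = {}

    def record_direction(self, taken: bool) -> None:
        """Shift a conditional branch direction into the global history."""
        self.history = ((self.history << 1) | int(taken)) & self.mask

    def predict(self, ip: int) -> Optional[int]:
        entry = self.ip_table.get(ip)
        if entry is None:
            return None  # no target prediction available
        branch_type, target = entry
        if branch_type == "indirect":
            ibp_entry = self.ibp_table.get(self.history)
            # Use the history-based target only if the tag matches this branch.
            if ibp_entry is not None and ibp_entry[0] == ip:
                return ibp_entry[1]
        return target

    def update(self, ip: int, branch_type: str, actual: int,
               predicted: Optional[int]) -> None:
        # Allocate a history-based entry only when a known indirect branch
        # received a wrong target prediction.
        if branch_type == "indirect" and predicted is not None and predicted != actual:
            self.ibp_table[self.history] = (ip, actual)
        # Targets are always (re)allocated in the IP-tagged array.
        self.ip_table[ip] = (branch_type, actual)


def simulate(iterations: int) -> int:
    """Indirect branch at 0x40 whose target follows the preceding conditional
    branch (taken -> 0x100, not taken -> 0x200); returns the miss count."""
    bp = IndirectBranchPredictor(history_bits=2)
    misses = 0
    for i in range(iterations):
        taken = i % 2 == 0
        bp.record_direction(taken)
        actual = 0x100 if taken else 0x200
        predicted = bp.predict(0x40)
        if predicted != actual:
            misses += 1
        bp.update(0x40, "indirect", actual, predicted)
    return misses


print(simulate(20))  # prints: 3
```

After three warm-up mispredictions, each of the two history values holds its own target, so later executions are predicted correctly; the IP-tagged array alone, acting as a last-target predictor, would mispredict every execution of this branch.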
The indirect branch predictor architecture of FIG. 3 uses two branch prediction tables: a large IP-based target table 335 (e.g., accessed with instruction pointer 330) and a smaller predictor 325 correlated on global history (e.g., using global history 320), which together allow detection of patterns in the directional global history 320. Because the global history 320 consists only of directional history, the architecture of FIG. 3 performs poorly in many applications: register-indirect branches often correlate only weakly with direction history, and branches with a common global history are forced to share entries.
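The entry-sharing problem can be shown by replaying a small trace through an untagged table indexed either by global direction history or by branch address. The trace, the function `count_misses`, and the 4-bit history length are hypothetical; the sketch isolates a history-indexed table and omits the IP-based table 335 and the tagging that FIG. 3 uses to filter such conflicts.

```python
from typing import Dict, List, Tuple


def count_misses(trace: List[tuple], index_by: str, history_bits: int = 4) -> int:
    """Replay a trace through an untagged target table.

    Events are ("cond", taken) for conditional branches and
    ("ind", ip, target) for register-indirect branches. index_by selects
    "history" (global direction history) or "ip" (branch address).
    """
    mask = (1 << history_bits) - 1
    history = 0
    table: Dict[int, int] = {}
    misses = 0
    for event in trace:
        if event[0] == "cond":
            history = ((history << 1) | int(event[1])) & mask
            continue
        _, ip, target = event
        key = history if index_by == "history" else ip
        if table.get(key) != target:
            misses += 1
            table[key] = target  # replace the entry with the resolved target
    return misses


# Two fixed-target indirect branches, each preceded by a taken conditional branch.
trace: List[tuple] = [("cond", True), ("ind", 0x40, 0x100),
                      ("cond", True), ("ind", 0x80, 0x200)] * 4
print(count_misses(trace, "history"), count_misses(trace, "ip"))  # prints: 8 2
```

Each indirect branch always goes to the same target, but because every preceding conditional branch is taken, the history saturates at all ones and both branches compete for a single history-indexed entry; indexing by address separates them after one cold miss each.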
Thus, what is needed is a method for an efficient, low-complexity implementation of a register-indirect branch predictor based on correlation with the outcomes of a sequence of register-indirect branches, allowing efficient prediction of speculatively executed indirect branches. What is also needed is a means to support a variety of workloads, including branches correlated with target address history, branches correlated with directional history, and branches showing little or no correlation with history but instead having static behavior that depends on the instruction address of the register-indirect branch. What is further needed is a way to provide efficient and fast recovery in the event of a branch misprediction.