In superscalar microprocessor architectures, branch instructions are handled by a separate branch processing unit, which predicts whether a particular branch instruction will be taken or not. Typically, such predictions are based on the past behavior of that particular branch instruction. This history is kept in a branch history table, which records the past outcomes of branch instructions.
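As a rough illustration of how a branch history table can record past outcomes, the sketch below uses 2-bit saturating counters indexed by low-order branch-address bits. The counter scheme, the 1024-entry size, the 4-byte instruction assumption, and the class name `BranchHistoryTable` are all hypothetical choices for illustration; the text above does not specify how past outcomes are encoded.

```python
class BranchHistoryTable:
    """Hypothetical direct-indexed table of 2-bit saturating counters.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self, num_entries: int = 1024) -> None:
        # Power-of-two size so the index is just the low-order address bits.
        assert num_entries > 0 and num_entries & (num_entries - 1) == 0
        self.num_entries = num_entries
        self.counters = [1] * num_entries  # every entry starts weakly "not taken"

    def _index(self, branch_addr: int) -> int:
        # Assume 4-byte instructions: drop the two byte-offset bits.
        return (branch_addr >> 2) & (self.num_entries - 1)

    def predict(self, branch_addr: int) -> bool:
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        """Record the resolved outcome, saturating at 0 and 3."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
print(bht.predict(0x1000))  # False: counter starts at 1
bht.update(0x1000, taken=True)
print(bht.predict(0x1000))  # True: counter is now 2
```

The two-bit hysteresis means a single anomalous outcome does not immediately flip a strongly biased prediction.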
Branch prediction improves the efficiency and speed with which instructions are executed within a microprocessor by fetching, ahead of time, the instructions predicted to be executed. Without branch prediction, the microprocessor would have to wait until the branch instruction was actually executed and the resultant target address calculated within one of the execution units before being able to fetch the instructions lying along the taken branch. The processor would then stall while waiting for those instructions to be fetched from memory.
One type of branch prediction circuitry used within the branch processing unit is a branch target address cache ("BTAC"), which is a history table that associates a branch's target address with the branch instruction's address. Without a BTAC, each taken branch must be processed after it is fetched and before the subsequent instructions can be fetched. This normally causes at least a one-cycle delay in the instruction fetch pipeline. In a standard BTAC, the BTAC is read in parallel with the instruction cache, and if the BTAC contains an entry for the fetch address (a BTAC hit), the target address from the BTAC is used to address the instruction cache and the BTAC in the next cycle. If no entry is found in the BTAC (a BTAC miss), the sequential address is selected instead. This standard BTAC approach limits the cycle time because, within a single cycle, it must first be determined whether there is a hit in the BTAC, and that hit indication must then be used to select the instruction cache address for the next cycle.
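A behavioral sketch of the standard BTAC's next-fetch selection follows. It is a simplification under stated assumptions: a hypothetical 16-byte aligned fetch block, entries keyed by the fetch-block address containing the taken branch, and a Python `dict` standing in for a fully associative, tag-compared table. The names `StandardBTAC` and `FETCH_BLOCK_BYTES` are illustrative only.

```python
from typing import Dict, Optional

FETCH_BLOCK_BYTES = 16  # hypothetical fetch width: four 4-byte instructions


class StandardBTAC:
    """Hypothetical model of a BTAC consulted in parallel with the I-cache."""

    def __init__(self) -> None:
        # Fetch-block address -> predicted taken-branch target address.
        self.entries: Dict[int, int] = {}

    @staticmethod
    def _block(addr: int) -> int:
        # Align an address down to the start of its fetch block.
        return addr & ~(FETCH_BLOCK_BYTES - 1)

    def record_taken_branch(self, branch_addr: int, target: int) -> None:
        # Only taken branches are allocated in a standard BTAC.
        self.entries[self._block(branch_addr)] = target

    def next_fetch_address(self, fetch_addr: int) -> int:
        block = self._block(fetch_addr)
        target: Optional[int] = self.entries.get(block)
        # Step 1: hit detection (the tag compare in hardware).
        hit = target is not None
        # Step 2: the hit signal drives the next-address select. These two
        # steps in series are the critical path described in the text.
        return target if hit else block + FETCH_BLOCK_BYTES


btac = StandardBTAC()
print(hex(btac.next_fetch_address(0x100)))  # 0x110: miss, sequential
btac.record_taken_branch(0x108, 0x400)
print(hex(btac.next_fetch_address(0x100)))  # 0x400: hit, predicted target
```

Because a miss falls back to the sequential address, code without taken branches needs no BTAC entries and pays no fetch penalty; the cost is the serial hit-then-select path.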
If the BTAC is made direct mapped (or set associative with way prediction) and the address from the BTAC is always used, then many of the cycle-time problems of a BTAC are solved. Such a solution requires that "not taken" branches be added to the BTAC, and it has the following problems:
(1) "Not taken" branches must be included in the BTAC, which reduces the effective size of the BTAC;
(2) A group of instructions that does not contain any "taken" branches will incur a fetch penalty the first time it is fetched; and
(3) The data read out of the BTAC must feed back to both the BTAC and the instruction cache, which causes some cycle-time problems in very high frequency designs.
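The following sketch models the always-used, direct-mapped BTAC to make problems (1) and (2) concrete. It assumes, hypothetically, that each entry stores the next fetch address outright (a taken target or the sequential address), that cold entries hold an arbitrary value (zero here), and that there is no tag check, so the indexed entry is used unconditionally. The entry count, block size, and all names are illustrative, not taken from the text.

```python
from typing import List, Tuple

FETCH_BLOCK_BYTES = 16  # hypothetical fetch block size
NUM_ENTRIES = 64        # hypothetical direct-mapped table size


class AlwaysUsedBTAC:
    """Hypothetical direct-mapped BTAC whose output always feeds the I-cache."""

    def __init__(self) -> None:
        # Cold entries hold an arbitrary next-fetch address (zero here).
        self.next_addr: List[int] = [0] * NUM_ENTRIES

    @staticmethod
    def _index(fetch_addr: int) -> int:
        # Index bits only; there is no tag, so distinct blocks can alias.
        return (fetch_addr // FETCH_BLOCK_BYTES) % NUM_ENTRIES

    def predict(self, fetch_addr: int) -> int:
        # No hit comparison and no sequential fallback.
        return self.next_addr[self._index(fetch_addr)]

    def train(self, fetch_addr: int, actual_next: int) -> None:
        # Written for taken AND sequential outcomes, so blocks without
        # taken branches also occupy entries (problem (1)).
        self.next_addr[self._index(fetch_addr)] = actual_next


def count_fetch_redirects(btac: AlwaysUsedBTAC,
                          trace: List[Tuple[int, int]]) -> int:
    """Count fetches where the BTAC supplied the wrong next address."""
    redirects = 0
    for fetch_addr, actual_next in trace:
        if btac.predict(fetch_addr) != actual_next:
            redirects += 1  # the wrong address went to the I-cache
        btac.train(fetch_addr, actual_next)
    return redirects


# Straight-line code with no taken branches: 0x100 -> 0x110 -> 0x120 -> 0x130.
trace = [(0x100, 0x110), (0x110, 0x120), (0x120, 0x130)]
btac = AlwaysUsedBTAC()
print(count_fetch_redirects(btac, trace))  # 3: first pass pays a penalty each block
print(count_fetch_redirects(btac, trace))  # 0: entries now hold sequential addresses
```

A standard BTAC, as described earlier, would have selected the sequential address on a miss for this trace and incurred no redirects, which is the first-fetch penalty of problem (2).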
As a result of the foregoing, there is a need in the art for an improved BTAC design.