1. Technical Field
The present invention relates in general to data processing and, in particular, to branch prediction. Still more particularly, the present invention relates to a data processing system, processor and method of data processing having an improved branch target address cache (BTAC).
2. Description of the Related Art
A state-of-the-art microprocessor can comprise, for example, a cache for storing instructions and data; an instruction sequencing unit for fetching instructions from the cache, ordering the fetched instructions, and dispatching the fetched instructions for execution; one or more sequential instruction execution units for processing sequential instructions; and a branch processing unit (BPU) for processing branch instructions.
Branch instructions executed by the BPU can be classified as either conditional or unconditional branch instructions. Unconditional branch instructions are branch instructions that change the flow of program execution from a sequential execution path to a specified target execution path and that do not depend upon a condition supplied by the occurrence of an event. Thus, the branch specified by an unconditional branch instruction is always taken. In contrast, conditional branch instructions are branch instructions for which the indicated branch in program flow may be taken or not taken depending upon a condition within the processor, for example, the state of specified condition register bits or the value of a counter.
Conditional branch instructions can be further classified as either resolved or unresolved, based upon whether or not the condition upon which the branch depends is available when the conditional branch instruction is evaluated by the BPU. Because the condition upon which a resolved conditional branch instruction depends is known prior to execution, resolved conditional branch instructions can typically be executed and instructions within the target execution path fetched with little or no delay in the execution of sequential instructions. Unresolved conditional branches, on the other hand, can create significant performance penalties if fetching of sequential instructions is delayed until the condition upon which the branch depends becomes available and the branch is resolved.
Therefore, in order to minimize execution stalls, some processors speculatively predict the outcomes of unresolved branch instructions as taken or not taken. Utilizing the result of the prediction, the fetcher is then able to fetch instructions within the speculative execution path prior to the resolution of the branch, thereby avoiding a stall in the execution pipeline in cases in which the branch is subsequently resolved as correctly predicted. Conventionally, prediction of unresolved conditional branch instructions has been accomplished utilizing static branch prediction, which predicts resolutions of branch instructions based upon criteria determined prior to program execution, or utilizing dynamic branch prediction, which predicts resolutions of branch instructions by reference to branch history accumulated on a per-address basis within a branch history table (BHT) and/or branch target address cache (BTAC).
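As a purely illustrative sketch of dynamic prediction from per-address branch history, a BHT can be modeled as a table of saturating counters indexed by branch address. The two-bit counter scheme, the table size, the 4-byte instruction size, and all names below are assumptions chosen for the example and are not taken from the foregoing description.

```python
class BranchHistoryTable:
    """Direct-mapped table of 2-bit saturating counters (assumed scheme)."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        # Counter values 0-1 predict not taken; values 2-3 predict taken.
        # Starting every counter at 1 ("weakly not taken") is an arbitrary choice.
        self.counters = [1] * num_entries

    def _index(self, branch_address):
        # Assumes 4-byte instructions, so the low two address bits are dropped.
        return (branch_address >> 2) % self.num_entries

    def predict(self, branch_address):
        """Return True if the branch at this address is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Record the resolved outcome, saturating the counter at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In this sketch, update() would be called when a branch is resolved and predict() when the same branch address is next encountered, so that the prediction follows the recent behavior of that branch.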
Modern microprocessors require several cycles to fetch instructions from the instruction cache, scan the fetched instructions for branches, and predict the outcome of unresolved conditional branch instructions. If any branch is predicted as taken, instruction fetch is redirected to the new, predicted address. This process of changing which instructions are being fetched is called a “taken branch redirect”. During the several cycles required for the instruction fetch, branch scan, and taken branch redirect, instructions continue to be fetched along the not-taken path; in the case of a predicted-taken branch, the instructions fetched along the not-taken path are discarded, resulting in decreased performance and wasted power.
Several existing approaches are utilized to reduce or to eliminate the branch redirect penalty. One commonly used method to reduce branch redirect penalty is to fetch instructions ahead and place them into an instruction buffer; however, if the buffer is empty, for example, due to a branch misprediction, an instruction cache miss, or too many taken branches in quick succession, then part or all of the instruction pipeline may go idle, decreasing performance.
A less common method to reduce the performance loss due to taken branches is the implementation of a BTAC that caches the branch target addresses of taken branches in association with the branch instruction's fetch address. In operation, the BTAC is accessed in parallel with the instruction fetch and is searched for an entry whose instruction fetch address matches the fetch address transmitted to the instruction cache. If such a BTAC entry exists, instruction fetch is redirected to the branch target address provided in the matching BTAC entry. Because the BTAC access typically takes fewer cycles than the instruction fetch, branch scan, and taken branch redirect sequence, a correct BTAC prediction can improve performance by causing instruction fetch to begin at a new address sooner than if there were no BTAC present.
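The BTAC lookup described above can be sketched as follows. This is a minimal model only: the direct-mapped organization, the entry count, the 16-byte sequential fetch increment, and the names BranchTargetAddressCache, install, lookup, and next_fetch_address are hypothetical choices made for illustration.

```python
class BranchTargetAddressCache:
    """Direct-mapped BTAC model: fetch address -> branch target address."""

    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        # Each slot holds (fetch_address, branch_target_address) or None.
        self.entries = [None] * num_entries

    def _index(self, fetch_address):
        # Assumes 4-byte instructions, so the low two address bits are dropped.
        return (fetch_address >> 2) % self.num_entries

    def lookup(self, fetch_address):
        """Return the cached target on a hit, or None on a miss."""
        entry = self.entries[self._index(fetch_address)]
        if entry is not None and entry[0] == fetch_address:
            return entry[1]
        return None

    def install(self, fetch_address, target_address):
        """Cache the target of a taken branch under its fetch address."""
        self.entries[self._index(fetch_address)] = (fetch_address, target_address)


def next_fetch_address(btac, fetch_address, fetch_bytes=16):
    """Redirect fetch on a BTAC hit; otherwise continue sequentially."""
    target = btac.lookup(fetch_address)
    return target if target is not None else fetch_address + fetch_bytes
```

In this model, a hit redirects the very next fetch to the cached target, whereas a miss leaves fetch on the sequential path until the slower instruction fetch, branch scan, and taken branch redirect sequence completes.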