1. Technical Field:
The present invention relates in general to data processing and, in particular, to processing branch instructions in a processor.
2. Description of the Related Art:
A superscalar processor for a computer system can comprise, for example, an instruction cache for storing instructions, one or more execution units for executing sequential instructions, a branch execution unit for executing branch instructions, instruction sequencing logic for fetching instructions from the instruction cache and routing fetched instructions to the various execution units for execution, and register files for storing operands and result data.
Branch instructions are utilized in a program to control the flow of instruction execution. Depending upon the type of a branch instruction and conditions present in the processor when the branch instruction is executed, branch execution may redirect execution from a sequential execution path (i.e., execution of instructions according to address order) to a non-sequential branch target path.
Branch instructions can generally be classified as either conditional or unconditional. Unconditional branch instructions change the flow of instruction execution from a sequential execution path to a specified branch target path and do not depend upon any condition. Thus, the branch in program flow specified by an unconditional branch instruction is always taken. In contrast, a conditional branch instruction indicates a branch in program flow that may or may not be taken depending upon a condition within the processor, for example, the state of a specified condition register bit or the value of a counter.
Conditional branch instructions can be further classified as either resolved or unresolved, based upon whether or not the condition upon which the branch depends is available when the conditional branch instruction is evaluated by the branch execution unit. Because the condition upon which a resolved conditional branch instruction depends is known prior to execution, resolved conditional branch instructions can typically be executed and instructions within the target execution path fetched with little or no delay in the execution of sequential instructions. Unresolved conditional branches, on the other hand, can create significant performance penalties if fetching of sequential instructions is delayed until the condition upon which the branch depends becomes available and the branch is resolved.
Therefore, in order to minimize execution stalls, some processors predict whether or not the branch specified by a conditional branch instruction will be taken. Utilizing the result of the prediction, the instruction sequencing logic is then able to speculatively fetch instructions within the branch target path prior to the resolution of the branch, thereby avoiding a stall in sequential execution units in cases in which the branch is subsequently resolved as correctly predicted. Conventionally, prediction of unresolved conditional branch instructions has been accomplished utilizing static branch prediction, which predicts resolutions of branch instructions based upon criteria determined by a compiler prior to program execution, or dynamic branch prediction, which predicts resolutions of branch instructions by reference to branch history accumulated on a per-address basis within a branch history table. More recently, even more elaborate two-level branch prediction methodologies have been proposed that utilize a first level of branch history that specifies the resolutions of the last K branch instructions to index into a second level of branch prediction storage that associates a resolution prediction with each (or selected ones) of the 2K-1 possible branch history patterns.
In order to further accelerate instruction fetching, some processors predict the branch target path address as well as the branch direction. The address of the branch target path can be predicted with a branch target address cache (BTAC), which temporarily stores, in association with each of a plurality of branch instructions, the branch target address to which control was transferred when each branch was last taken. In lieu of a BTAC, some processors alternatively employ a branch target instruction cache (BTIC), which caches a few instructions in the predicted branch target path so the instruction pipeline can be primed without accessing the instruction cache.
The present invention recognizes that although the use of conventional branch prediction and path prediction circuitry (e.g., a BHT and BTAC) in a processor generally improves processor performance given typical branch prediction accuracies of over 95%, conventional path prediction circuitry has difficulty in processing tight instruction loops. In particular, in aggressive processor architectures, a traditional BTAC cannot meet the processor""s instruction fetch cycle time if the branch target address falls within the same cache line as the associated branch instruction.
In view of the foregoing, the present invention provides a processor that efficiently obtains target path instructions even in the presence of tight program loops. The processor includes at least one execution unit for executing instructions and instruction sequencing logic that supplies instructions to the at least one execution unit for execution. The instruction sequencing logic includes an instruction fetch buffer and a branch prediction unit including a branch target cache.
When a conditional branch instruction is detected, the branch instruction is predicted as taken or not taken. In response to the prediction of the branch instruction as taken, the branch target cache causes multiple copies of a target instruction group to be loaded into the instruction fetch buffer under the assumption that the branch instruction is a member of the target instruction group. Thereafter, the branch target cache causes all but one of the multiple copies to be canceled from the instruction fetch buffer prior to dispatch if the branch instruction does not belong to the target instruction group. Thus, the branch target cache can meet the instruction fetch cycle time of the processor even for the worst case condition in which the branch instruction is within the target instruction group.
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.