1. Technical Field
The present invention relates in general to a method and system for data processing and, in particular, to a processor and method for processing branch instructions. Still more particularly, the present invention relates to a processor and method that accelerate the resolution of conditional branch instructions.
2. Description of the Related Art
Branch instructions within a processor's instruction set architecture (ISA) can generally be classified as either conditional or unconditional branch instructions. When executed, unconditional branch instructions always change the flow of program execution from a sequential execution path to a specified target execution path. Thus, the execution of unconditional branch instructions does not depend upon a condition supplied by the occurrence of an event. In contrast, conditional branch instructions may or may not change program flow depending upon a specified condition within the processor, for example, the state of specified condition register bits (e.g., greater than (GT), less than (LT), and equal to (EQ)) or the value of a counter.
Conventional processors generally adopt one of two strategies for processing conditional branch instructions. First, a conventional processor can operate without branch prediction, meaning that the processor does not execute instructions following a conditional branch instruction until the condition upon which the branch depends is known. In cases in which the condition is not available (i.e., the branch is not resolved) prior to processing of the conditional branch instruction, execution of instructions following the conditional branch instruction is stalled until the condition becomes available. Because this delay is undesirable from a performance standpoint, other processors predict the outcome of conditional branch instructions. Techniques utilized to predict conditional branch instructions vary in complexity from relatively simple static (i.e., compiler-driven) prediction mechanisms to complex dynamic prediction mechanisms that utilize a branch history table (BHT) containing one or more levels of branch history to provide branch direction predictions (e.g., taken or not taken) and a branch target address cache (BTAC) to provide the target address of the predicted path. The predicted path is then executed speculatively until the condition upon which the branch depends is known, and the branch prediction can be resolved as correctly predicted or mispredicted. In the event that the conditional branch instruction was correctly predicted, the speculation results in a performance benefit. However, in the event of a misprediction, the speculatively executed instructions and the associated results must be flushed, and instructions within the correct path must be fetched and executed. The cycles consumed by redirecting execution from the mispredicted path to the correct path are known as the branch misprediction penalty.
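By way of illustration only, dynamic direction prediction with a branch history table can be modeled in software as follows. The sketch assumes a table of 2-bit saturating counters indexed by low-order branch address bits, which is one common textbook organization and is not asserted to be the organization of any particular prior art processor. The class name BranchHistoryTable, the table size, and the indexing scheme are hypothetical.

```python
class BranchHistoryTable:
    """Illustrative BHT model: one 2-bit saturating counter per entry.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    The table size and index function are assumptions for this sketch.
    """

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        # Start every counter at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, address: int) -> int:
        # Assumes 4-byte instructions, so the two low address bits are dropped.
        return (address >> 2) % self.entries

    def predict(self, address: int) -> bool:
        """Return True if the branch at this address is predicted taken."""
        return self.counters[self._index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        """Record the resolved outcome, saturating the counter at 0 and 3."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In such a model, predict would be consulted when the conditional branch is fetched, and update would be called once the branch is resolved; a disagreement between the two corresponds to a misprediction that incurs the penalty described above.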
Regardless of which of these (or other) approaches to processing conditional branch instructions is adopted, it is advantageous to resolve conditional branch instructions as quickly as possible to minimize execution stalls in processors that do not utilize branch prediction or to minimize branch misprediction penalties in processors that employ branch prediction. To resolve a conditional branch instruction, conventional processors first determine the result of a condition-setting instruction (e.g., a compare instruction or a recording form of an arithmetic instruction) and utilize the result of the instruction to produce the condition register bits in subsequent cycle(s). Bits from the branch instruction that specify the condition upon which the branch depends are thereafter evaluated with respect to the condition register bits to determine if the branch should be taken or not taken. Thus, in conventional processors, conditional branch instructions are resolved by performing a three-step serial process.
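The three-step serial process described above can be sketched in software as follows. This is a simplified behavioral model only; the function names, the use of a signed difference as the compare result, and the (bit, branch_if_set) encoding of the branch condition are assumptions made for illustration and do not reflect any particular instruction set.

```python
from typing import Dict


def execute_compare(a: int, b: int) -> int:
    """Step 1: determine the result of the condition-setting instruction.

    A signed difference stands in for the compare result in this sketch.
    """
    return a - b


def set_condition_bits(result: int) -> Dict[str, bool]:
    """Step 2: produce the condition register bits from the result."""
    return {"LT": result < 0, "GT": result > 0, "EQ": result == 0}


def evaluate_branch(cr: Dict[str, bool], bit: str, branch_if_set: bool) -> bool:
    """Step 3: test the condition register bit named by the branch."""
    return cr[bit] == branch_if_set


def resolve_conventional(a: int, b: int, bit: str, branch_if_set: bool) -> bool:
    """Resolve a branch as taken (True) or not taken (False), serially."""
    result = execute_compare(a, b)            # step 1
    cr = set_condition_bits(result)           # step 2 (intermediate state)
    return evaluate_branch(cr, bit, branch_if_set)  # step 3
```

Each step consumes the output of the preceding step, so in this model the branch cannot be resolved until the condition register bits have been produced as an intermediate result.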
Recently, various techniques have been developed to determine the value of one or more condition register bits in parallel with the execution of certain types of instructions. Examples of such approaches are disclosed by Burns et al. in the applications incorporated by reference above. Although the early determination of the value of one or more condition register bits provides some performance advantages over the conventional approach, there remains a need in the art for a method and apparatus to accelerate the resolution of a conditional branch instruction as taken or not taken.
In order to accelerate the resolution of conditional branch instructions, the present invention provides a processor including an instruction sequencer that fetches a plurality of instructions and a detector that detects, among the plurality of fetched instructions, a condition-setting instruction and a conditional branch instruction that depends upon the condition-setting instruction. The processor further includes a decoder that decodes the conditional branch instruction to produce a decoded condition type and an execution unit. In response to the detection of the condition-setting instruction and the conditional branch instruction, the execution unit resolves the conditional branch instruction by evaluating the condition-setting instruction and the decoded condition type in a single operation. Because the condition code bits are not computed or stored as an intermediate result as in prior art processors, branch resolution is accelerated.
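As a rough software analogy of the single-operation resolution summarized above, the following sketch decodes the branch condition into a predicate ahead of time and then applies that predicate directly to the operands of the condition-setting instruction, without computing or storing condition code bits. It is not a description of the hardware of the present invention; the function names, the lookup table, and the (bit, branch_if_set) encoding are hypothetical and are chosen only for illustration.

```python
import operator
from typing import Callable, Dict, Tuple

Predicate = Callable[[int, int], bool]

# Hypothetical decode table: (condition bit, branch-if-set) -> direct test
# on the compare operands. No condition register bits are materialized.
_DECODED_CONDITION_TYPES: Dict[Tuple[str, bool], Predicate] = {
    ("LT", True): operator.lt,
    ("LT", False): operator.ge,
    ("GT", True): operator.gt,
    ("GT", False): operator.le,
    ("EQ", True): operator.eq,
    ("EQ", False): operator.ne,
}


def decode_condition_type(bit: str, branch_if_set: bool) -> Predicate:
    """Decode the branch's condition field into a decoded condition type."""
    return _DECODED_CONDITION_TYPES[(bit, branch_if_set)]


def resolve_fused(a: int, b: int, condition_type: Predicate) -> bool:
    """Resolve the branch in one evaluation of operands and condition type."""
    return condition_type(a, b)


# Usage: the decode can be performed before the operands are available.
condition_type = decode_condition_type("LT", True)   # "branch if less than"
taken = resolve_fused(3, 5, condition_type)
```

Because the decode does not depend on the operand values, it can be performed as soon as the dependent pair of instructions is detected, leaving a single evaluation on the path to the taken or not-taken outcome.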
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.