1. Field of the Invention
The present invention relates to improvements in compiler technology. In particular, the present invention relates to improvements in a source code compiler for a pipelined data processing system that predicts branch instruction results and uses that prediction to increase system performance. Still more particularly, the present invention relates to a system for using execution profile data from a test compilation to provide feedback to the compiler to optimize the final executable code based on that profile data.
2. Background and Related Art
Compilation is the process of transforming program source code written in a human intelligible language into a form executable by a data processing system. Compilation transforms each language statement into one or more machine language statements. Typically, the compiler will perform the transformation in phases or passes. The first pass of the compiler typically transforms the source code into an intermediate form. The second pass typically performs code optimization and then generates the final executable machine language for the target platform.
Code optimization is a well-developed area of compiler technology. Code optimization attempts to improve the performance of program execution by rearranging the code so that it executes faster but with the same functional operation. One example is bringing the instructions from a subroutine into the main body of code to avoid the overhead of subroutine call and return.
Pipelined data processing systems have been developed to increase system throughput. A pipelined system breaks the interpretation and execution of an instruction into sequences that can be executed in parallel. The sequence of Instruction Fetch, Instruction Decode and Instruction Execute is performed in parallel so that an instruction is executed, ideally, each machine cycle. The Instruction Decode step places the decoded instruction on an instruction queue. The Instruction Execution unit then takes the next instruction from the instruction queue for execution.
Pipelined processing offers performance improvements only if the instruction stream is not interrupted, i.e. the instruction pipeline remains full and no pipeline stalls or "bubbles" are introduced. The instruction stream can be interrupted when the program calls for the execution of an instruction out of sequence. This occurs, for example, when a conditional branch instruction is encountered. Program code may have the form:
If x > 1 then y = 1;
Else y = 0
If x is less than or equal to 1 the next statement "y=1" is skipped and execution branches to the statement "y=0." If the machine instructions for assigning 1 to y have been fetched and decoded, these instructions must be purged from the instruction queue and the execution unit must wait until the instructions for setting y=0 reach the top of the queue. This waiting results in several lost machine cycles and a corresponding reduction in system throughput. Conditional branches can induce pipeline stalls due to the latency in determining the outcome of the branch condition. The processor typically employs some level of branch prediction in an attempt to keep the pipeline full by selecting what is hopefully the correct path.
The impact of conditional branches is significant because most program code contains a large number of branches. Very little useful code is executed sequentially from top to bottom. System performance can therefore be increased by improving branch prediction.
Branch prediction attempts to predict which set of instructions will be executed after a branch: the "branch taken" set; or the "branch not taken" set. If the prediction is correct the system loses no time due to instruction stall waiting for the correct instruction. If the prediction is incorrect the queue must be flushed and the new instructions loaded with a resulting performance degradation.
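The cost described above can be illustrated with a minimal sketch. The function name `cycles_lost` and the `flush_penalty` parameter are hypothetical and introduced only for illustration; the actual penalty per misprediction depends on the pipeline and is not given in this description.

```python
def cycles_lost(outcomes, predictions, flush_penalty):
    """Estimate machine cycles lost to mispredicted branches.

    outcomes      -- actual results per branch (True = taken)
    predictions   -- predicted results per branch (True = taken)
    flush_penalty -- assumed cycles lost per misprediction (hypothetical)
    """
    if len(outcomes) != len(predictions):
        raise ValueError("outcomes and predictions must have the same length")
    # A correct prediction costs nothing; an incorrect one flushes the queue.
    mispredicted = sum(
        1 for actual, guess in zip(outcomes, predictions) if actual != guess
    )
    return mispredicted * flush_penalty
```

Under this simplified model, a predictor that is right more often loses proportionally fewer cycles. Effects such as overlapping stalls are ignored.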
Both hardware-based and software-based branch prediction solutions have been proposed. U.S. Pat. No. 5,367,703 entitled "Method and System for Enhanced Branch History Prediction Accuracy in a Superscalar Processor System" to Levitan maintains a branch history table for each fetch position within a multi-instruction access. The branch history table is used to predict whether a branch will be taken or not taken. Each entry of the branch history table preferably consists of a two-bit binary counter that is incremented or decremented depending on whether or not the branch is taken.
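A two-bit counter of the kind described above may be sketched as follows. This is an illustration only, not the implementation of the cited patent. The saturation at 0 and 3, the initial state, and the rule that the upper half of the range predicts "taken" are assumptions, and the names `TwoBitCounter` and `simulate` are hypothetical.

```python
class TwoBitCounter:
    """Two-bit branch history counter holding a value from 0 to 3."""

    def __init__(self, state=1):
        # Assumed initial value: 1 (leaning toward "not taken").
        self.state = state

    def predict_taken(self):
        # Assumed threshold: states 2 and 3 predict "taken".
        return self.state >= 2

    def update(self, taken):
        # Increment when the branch is taken, decrement when it is not.
        # Clamping to the range 0..3 (saturation) is an assumption.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def simulate(outcomes):
    """Return how many outcomes a single counter predicts correctly."""
    counter = TwoBitCounter()
    correct = 0
    for taken in outcomes:
        if counter.predict_taken() == taken:
            correct += 1
        counter.update(taken)
    return correct
```

Because two consecutive outcomes in the same direction are needed to cross the threshold from either extreme, a single atypical outcome does not immediately reverse the prediction.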
An article entitled "Adaptive Branch Prediction" in the IBM Technical Disclosure Bulletin, Vol. 36, No. 8, August 1993 by D. S. Levitan and D. E. Waldecker suggests a system for predicting branches based on run-time branch statistics or on historical branch statistics or on both during a single program execution. An indicator is used to indicate when the processor should switch between historical and run time prediction.
U.S. Pat. No. 4,430,706 entitled "Branch Prediction Apparatus and Method for a Data Processing System" collects branch taken statistics in memory hashed by instruction address. This allows the system to access the history whenever that instruction is encountered.
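The idea of keeping branch-taken statistics keyed by instruction address may be sketched as follows. The cited patent's actual table layout is not described here, so this sketch uses a Python dictionary as the hash structure. The class name `BranchStatistics`, the majority-vote rule, and the default of "not taken" for unseen addresses are assumptions.

```python
class BranchStatistics:
    """Per-branch taken/not-taken counts, keyed by instruction address."""

    def __init__(self):
        # address -> [times taken, times not taken]
        self._counts = {}

    def record(self, address, taken):
        """Record one executed outcome of the branch at this address."""
        entry = self._counts.setdefault(address, [0, 0])
        entry[0 if taken else 1] += 1

    def predict_taken(self, address):
        """Predict by majority of recorded outcomes (assumed rule).

        Unseen addresses and ties default to "not taken" (assumption).
        """
        taken, not_taken = self._counts.get(address, (0, 0))
        return taken > not_taken
```

For example, after recording two taken outcomes and one not-taken outcome for the same address, `predict_taken` for that address returns `True`.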
IBM RISC System/6000 processors always choose to predict that a conditional branch is not taken. IBM PowerPC processors (PowerPC is a trademark of IBM) introduce a more sophisticated test based on three variables: 1) the branch condition type; 2) the branch displacement sign bit; and 3) a branch predict bit ("Y bit"). If the branch condition type is "branch always" or if the branch displacement sign bit is set (i.e. a negative branch displacement) then the branch is predicted taken if the Y bit is zero and predicted not taken if the Y bit is one. Otherwise, the branch is predicted not taken if the Y bit is zero and taken if the Y bit is one. The Y bit may be set or cleared as desired to aid in branch prediction.
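The three-variable PowerPC rule stated above can be expressed as a short sketch. The function names and parameters are hypothetical; only the decision logic is taken from the description. The second function shows how a compiler might choose the Y bit to obtain a desired prediction, which is an illustrative assumption rather than a described procedure.

```python
def predict_taken(branch_always, displacement_negative, y_bit):
    """Return True if the branch is predicted taken.

    branch_always         -- True if the condition type is "branch always"
    displacement_negative -- True if the displacement sign bit is set
    y_bit                 -- 0 or 1, the branch predict bit
    """
    # With Y = 0, the prediction is "taken" for "branch always" or a
    # negative displacement, and "not taken" otherwise.
    default_taken = branch_always or displacement_negative
    # Y = 1 reverses that default prediction.
    return default_taken != bool(y_bit)


def y_bit_for(branch_always, displacement_negative, want_taken):
    """Choose the Y bit that yields the desired prediction (hypothetical helper)."""
    default_taken = branch_always or displacement_negative
    return 0 if default_taken == want_taken else 1
```

For a forward conditional branch (positive displacement) that is usually taken, `y_bit_for(False, False, True)` gives 1, reversing the default "not taken" prediction.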
The PowerPC processor instruction set specifies the Y bit to be in bit 10 of the instruction, within the branch condition operand (BO) field. The PowerPC 601 Processor User's Manual, Rev. 1, June 1993, page 3-68 states: "The y bit provides a hint about whether a conditional branch is likely to be taken and is used by the MPC601 to improve performance." Other processors may implement a prediction bit in other ways. The precise format of the prediction bit is not within the scope of the invention. The use of a prediction bit, in whatever form, is within the scope of the invention.
Prior art compilers have attempted to perform static branch prediction analysis in an effort to use the hardware features available. For example, RISC System/6000 compilers attempt to generate conditional branches with code that always falls through, since the system always predicts the fall-through path. Compilers for the PowerPC processor could use static analysis to set the branch prediction bit (Y bit). Static branch analysis, however, is typically insufficient to accurately predict actual program behavior and can actually reduce the branch prediction accuracy below the rate that would occur if no prediction were used. Whether a branch is taken or not taken depends on the data processed by the system and upon the assumptions and style of the programmer.
Thus, a technical problem exists to develop a method for analyzing program code to accurately predict the conditional branch selection for a program during actual use and for using that information to optimize program execution by modifying the executable code to optimize branch prediction.