1. Field of the Invention
The present invention relates to compilers. More specifically, the present invention relates to the use of data dependence graphs in optimizing regions of basic blocks in a compiler.
2. The Background
A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. A compiler may be broken up into several phases. FIG. 1 is a block diagram illustrating the phases of a typical compiler. A programming language 2 is fed into the compiler. The programming language may be any programming language, such as C, C++, F77, F90, or Java. A front end 4 receives the programming language, and performs analysis (such as lexical analysis, syntax analysis, and semantic analysis) and intermediate code generation. The output of the front end 4 is an Intermediate Representation (IR) 6 of the program.
Ideally, the front end 4 would produce target code that is as efficient as code that could be written by hand. However, in the real world, this is usually not the case. Therefore, if a user wishes to have improved performance, the code must be optimized. Generally, this is performed by two components. An Intermediate Optimizer (iropt) 8 performs high level optimization, which would normally include the simpler, generic optimizations such as eliminating repetitive lines of code, resulting in IR 10. A Code Generator (CG) 12 may then be used to perform the more complex, target code-specific optimizations, such as utilizing registers and performing transformations. One of ordinary skill in the art will recognize, however, that the optimization could be combined into a single phase or could be performed by more than two phases.
In order to optimize the code, most compilers begin by breaking up the IR into basic blocks. Technically, a basic block is defined as a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. The basic block may also be comprised of operations and/or machine instructions. Throughout this application, however, the term "statements" will be used with the knowledge that they could easily be operations and/or machine instructions as well. FIG. 2 is a graph illustrating an example of an IR represented as basic blocks. The control flow proceeds to block 50, where the statements in block 50 are executed from beginning to end without branching. Then the control flow proceeds to block 52, where the statements in block 52 are executed from beginning to end without branching. After block 52, the control flow may branch either to block 54 or block 56. Block 54 may branch to block 58 or may loop back up to itself. The rest of the control flow proceeds in a similar fashion.
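By way of illustration only, the definition of a basic block given above can be sketched with the conventional "leader" rule: the first statement, every branch target, and every statement following a branch each begin a new block. The statement encoding (a text string paired with a tuple of target indices) and the function name are assumptions made for this sketch and do not appear in the figures.

```python
def find_basic_blocks(stmts):
    """Partition statements into basic blocks.

    stmts is a list of (text, targets) pairs, where targets is a tuple of
    statement indices the statement may branch to (empty for non-branches).
    This encoding is hypothetical and chosen only for the sketch.
    """
    leaders = {0} if stmts else set()
    for i, (_, targets) in enumerate(stmts):
        if targets:
            # Every branch target begins a block.
            leaders.update(targets)
            # The statement after a branch also begins a block.
            if i + 1 < len(stmts):
                leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(stmts)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Hypothetical five-statement program with one conditional and one backward branch.
PROGRAM = [
    ("a := b + c", ()),
    ("if a goto 3", (3,)),
    ("d := a", ()),
    ("e := d", ()),
    ("goto 1", (1,)),
]
BLOCKS = find_basic_blocks(PROGRAM)
```

Each returned block is a list of consecutive statement indices that control enters only at the first index and leaves only at the last.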
A standard method of optimization within each basic block is to use knowledge regarding the substance of each statement within the basic block to construct a dependence directed acyclic graph (dependence DAG) for each basic block. A dependence DAG is constructed by assigning a node to each statement in the basic block and connecting the nodes with edges based upon which statements must be performed before other statements in the block. FIG. 3 is a graph illustrating an example of a dependence DAG. Assume a basic block containing the following statements in order (the numbers in parentheses indicate the reference number of the node assigned to each statement):
a := b + c (100)
d := a + e (104)
g := a + b (106)
f := a - b (108)
b := d + e (110)
Node 100 is then assigned the statement computing a value a from b and c, the values for b and c initially coming from outside the basic block 102. Nodes 104, 106, and 108 all require as input the value a, but do not require as input any value computed by each other, and therefore the order in which these nodes are executed relative to one another is irrelevant. Therefore, a dependence edge runs from node 100 to each of nodes 104, 106, and 108. Node 110 requires as input the value d, which was computed in node 104, and therefore must be executed after node 104 and depends on that node.
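By way of illustration only, the construction just described can be sketched as follows for the five statements of the example. The tuple encoding (node number, defined variable, used variables) and the names BLOCK and build_flow_dag are assumptions made for this sketch. Only definition-to-use edges are recorded here, matching the edges discussed for FIG. 3.

```python
# Hypothetical encoding of the example block:
# (node number, variable defined, variables used).
BLOCK = [
    (100, "a", ("b", "c")),
    (104, "d", ("a", "e")),
    (106, "g", ("a", "b")),
    (108, "f", ("a", "b")),
    (110, "b", ("d", "e")),
]


def build_flow_dag(block):
    """Return (successors, external_inputs) for one basic block."""
    last_def = {}                                # variable -> node that last defined it
    successors = {node: [] for node, _, _ in block}
    external_inputs = []                         # (node, variable) defined outside the block
    for node, dest, sources in block:
        for var in sources:
            if var in last_def:
                producer = last_def[var]
                if node not in successors[producer]:
                    successors[producer].append(node)
            else:
                # No earlier definition in this block: the value comes from outside.
                external_inputs.append((node, var))
        last_def[dest] = node
    return successors, external_inputs


SUCCESSORS, EXTERNAL_INPUTS = build_flow_dag(BLOCK)
```

In this sketch SUCCESSORS maps node 100 to nodes 104, 106, and 108, and node 104 to node 110, while EXTERNAL_INPUTS lists the values, such as b and c at node 100, that originate outside the block.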
There are actually three types of dependence edges. The first is a flow edge, which is an edge from a definition to a use. Another is an anti-dependence edge, which is an edge from a use to a definition. The last is an output edge, which is an edge from a definition to a redefinition of the same variable. Control edges can also be considered a type of edge. Dependence edges are labeled with the register or variable carrying the dependence. Dangling edges are attached to a node on only one end. This disclosure concentrates on dangling flow dependence edges, but one of ordinary skill in the art can easily extend this to include dangling anti-dependence, output dependence, or control dependence edges as may be needed for other applications of these ideas, such as scheduling across blocks.
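By way of illustration only, the three edge types and the dangling flow edges can be computed in one pass over a block, as sketched below. The statement encoding and all names are assumptions made for this sketch. As a further simplifying assumption, the last definition of every variable is treated as an outgoing dangling edge, without regard to whether the value is actually used later.

```python
def dependence_edges(block):
    """Classify dependence edges for one block.

    block is a list of (node, defined_variable, used_variables) tuples,
    a hypothetical encoding. Returns (edges, dangling_in, dangling_out),
    where each edge is (kind, from_node, to_node, variable).
    """
    last_def = {}         # variable -> node of its most recent definition
    uses_since_def = {}   # variable -> nodes that used it since that definition
    edges, dangling_in = [], []
    for node, dest, sources in block:
        for var in sources:
            if var in last_def:
                edges.append(("flow", last_def[var], node, var))    # definition -> use
            else:
                dangling_in.append((node, var))                     # use with no definition in block
            uses_since_def.setdefault(var, []).append(node)
        for user in uses_since_def.get(dest, []):
            if user != node:
                edges.append(("anti", user, node, dest))            # use -> later definition
        if dest in last_def:
            edges.append(("output", last_def[dest], node, dest))    # definition -> redefinition
        last_def[dest] = node
        uses_since_def[dest] = []
    dangling_out = [(node, var) for var, node in last_def.items()]
    return edges, dangling_in, dangling_out


# The example block, in the same hypothetical encoding.
EXAMPLE = [
    (100, "a", ("b", "c")),
    (104, "d", ("a", "e")),
    (106, "g", ("a", "b")),
    (108, "f", ("a", "b")),
    (110, "b", ("d", "e")),
]
EDGES, DANGLING_IN, DANGLING_OUT = dependence_edges(EXAMPLE)
```

For the example block this sketch yields, in addition to the flow edges, anti-dependence edges labeled b from nodes 100, 106, and 108 to node 110, because node 110 redefines b after those nodes have used it.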
Dependence DAGs are often used to perform scheduling tasks. In modern computers, it is often advantageous to schedule upcoming statements, which allows better performance by utilizing multiple functional units and avoiding pipeline stalls. Thus, a code generator will generally create a dependence DAG for a basic block, and then schedule statements within that block based upon the resulting dependence DAG.
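By way of illustration only, a simple list scheduler driven by a dependence DAG is sketched below. It assumes that every statement takes one cycle and that a fixed number of statements (the hypothetical width parameter) may issue per cycle; real schedulers also account for latencies and functional unit types. The edge list combines the flow and anti-dependence edges of the example block, and all names are assumptions made for this sketch.

```python
def list_schedule(nodes, edges, width=2):
    """Greedy cycle-by-cycle scheduling under unit latency (an assumption).

    nodes is a list of node numbers in original program order.
    edges is a list of (before, after) dependence pairs.
    Returns a list of cycles, each a list of nodes issued together.
    """
    preds = {n: set() for n in nodes}
    for before, after in edges:
        preds[after].add(before)
    done, cycles = set(), []
    remaining = list(nodes)
    while remaining:
        # A node is ready once all of its predecessors finished in earlier cycles.
        ready = [n for n in remaining if preds[n] <= done][:width]
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        cycles.append(ready)
        done.update(ready)
        remaining = [n for n in remaining if n not in done]
    return cycles


NODES = [100, 104, 106, 108, 110]
DEPENDENCES = [
    (100, 104), (100, 106), (100, 108), (104, 110),   # flow edges
    (106, 110), (108, 110),                           # anti-dependence edges on b
]
SCHEDULE = list_schedule(NODES, DEPENDENCES, width=2)
```

With two issue slots, nodes 104 and 106 share a cycle because neither depends on the other, whereas node 110 must wait until nodes 104, 106, and 108 have all issued.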
While dependence DAGs are effective for optimizations within each basic block, they have not been used to optimize between the basic blocks. There may be variables which depend on calculations made in other blocks. Additionally, as their name suggests, directed acyclic graphs do not contain cycles, or loops. Thus, when a loop exists in the control flow graph, as in the example in FIG. 2, the dependence DAG for that basic block will not take that into consideration. Knowing this information might be helpful in reordering the flow of the basic blocks.
What is needed is an efficient method that allows for global optimization across a region of several basic blocks, or even a single basic block where a loop exists.
Region-based optimization may be accomplished by creating dependence graphs for each block and then incrementally computing a single dependence graph for the region. First, dependence DAGs are created for each block in the region. This includes defining incoming and outgoing dangling edges for each block. The dependence DAGs are then linked as a control flow graph. Each incoming dangling edge within each block of the region is then examined, with the process traversing each path along the control flow graph in reverse, attempting to match each incoming dangling edge with a corresponding outgoing dangling edge, stopping only if a match is found, the same block is examined twice, or the top of the region is reached. A similar process takes place for each outgoing dangling edge, traversing each path along the control flow graph forward, attempting to match each outgoing dangling edge with a corresponding incoming or outgoing dangling edge, stopping only if an outgoing match is found or the bottom of the region is reached. The path is terminated if the same block is reached twice. The region may then be reduced to a single block with incoming dangling edges being any unmatched incoming dangling edges at the top of the region and outgoing dangling edges being any unmatched outgoing dangling edges at the bottom of the region. Optimization may occur during or after this reduction step to improve performance in the program. Nested loops may be handled by building the dependence graph for the innermost loop first and treating it like a dependence DAG for a block when processing the outer loops.
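By way of illustration only, the reverse traversal for incoming dangling edges can be sketched as follows. This is a simplified sketch and not the claimed implementation: each block is summarized by the sets of variables carried on its incoming ("in") and outgoing ("out") dangling flow edges, the forward pass for outgoing edges is omitted, and the block names, variable sets, and function name are all assumptions made for this sketch.

```python
def match_incoming_edges(blocks, preds, entry):
    """Match incoming dangling edges against outgoing ones, walking the CFG in reverse.

    blocks: block name -> {"in": set of variables, "out": set of variables}
    preds:  block name -> list of predecessor blocks inside the region
    entry:  name of the block at the top of the region
    Returns (matched, unmatched): matched holds (defining_block, using_block, variable),
    unmatched holds (using_block, variable) pairs that reach the top of the region.
    """
    matched, unmatched = set(), set()
    for use_block, info in blocks.items():
        for var in info["in"]:
            if use_block == entry:
                unmatched.add((use_block, var))   # already at the top of the region
            visited = set()
            worklist = list(preds.get(use_block, ()))
            while worklist:
                b = worklist.pop()
                if b in visited:
                    continue                      # same block examined twice: end this path
                visited.add(b)
                if var in blocks[b]["out"]:
                    matched.add((b, use_block, var))
                    continue                      # match found: end this path
                if b == entry:
                    unmatched.add((use_block, var))   # top reached without a match
                worklist.extend(preds.get(b, ()))
    return matched, unmatched


# Hypothetical three-block region in which block B54 loops back to itself.
REGION = {
    "B50": {"in": {"b", "c"}, "out": {"a"}},
    "B52": {"in": {"a"}, "out": {"d"}},
    "B54": {"in": {"d", "i"}, "out": {"i"}},
}
PREDS = {"B50": [], "B52": ["B50"], "B54": ["B52", "B54"]}
MATCHED, UNMATCHED = match_incoming_edges(REGION, PREDS, "B50")
```

In this sketch the variable i in block B54 is matched to B54's own outgoing edge through the loop and also remains unmatched along the path that reaches the top of the region, so it would appear as an incoming dangling edge of the reduced region.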