1. Technical Field
The present invention is directed to an improvement in computing systems and in particular to computer systems which provide for optimized loop code generation in the compilation of computer programs.
2. Prior Art
Optimizing compilers permit efficient object code to be emitted given a particular piece of source code to be compiled. Source code which includes loops is typically the subject of optimization in compilers. For a given segment of source code containing loops and for a given target machine microarchitecture, cache geometry and parallel processing capability, the loop allocation phase of an optimizing compiler attempts to determine a collection of object code loop nests which will give efficient execution at an acceptable compilation-time cost.
Loop allocation optimization found in known compilers typically relies upon a set of ordered loop allocation transformations, as well as optimizations for data locality and parallelism. For example, loop source code may be optimized by emitting object code which minimizes off-chip access when the loop object code is executed. Another optimization for loop source code is to emit object code which may be executed in parallel by a multi-processor machine.
Typically, prior art optimizing compilers which carry out loop allocation include loop distribution early in the set of transformations, followed by parallelism and data locality transformations and finish with loop fusion and array contraction as a cleanup phase.
In prior art optimizing compilers, nested loops are optimized on a loop-by-loop basis. A prior art approach to optimizing sibling loops is to merge such nests. This approach is described by Sarkar, V. and Gao, G. R., "Optimization of Array Accesses by Collective Loop Transformations," 5th International Conference on Supercomputing, Cologne, Germany, June 1991, pp. 194-205.
This prior art approach involves a profitability and correctness test for the merger of the sibling loops. The optimization determines first if fusion of the sibling loops is desirable (a profitability analysis). Another prior art approach to loop optimization is to first distribute the loop code, to then optimize the distributed code and then to fuse the code after optimization.
Each of the above approaches to optimization involves optimizations of the loop code independent of, or following, loop distribution steps. Where nested loops are optimized on a loop-by-loop basis, optimizations which may be possible due to relationships between code in different nested loops may be missed. Similarly, where the loop code is distributed, optimized and then fused, the optimization is carried out on distributed portions of the code and interrelationships between those sections of code may not be considered in the optimization.
It is therefore desirable to have a computer system which carries out the loop allocation in an optimizing compiler without performing the loop distribution step at an early point in the sequence of loop transformations.
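The relationship between distributed and fused loop code can be illustrated with a small sketch. The function names and the arrays below are hypothetical and do not appear in the source; Python is used only for readability. In the distributed form the second loop rereads every element written by the first, whereas in the fused form the intermediate array can be contracted to a scalar temporary.

```python
def distributed(b):
    """Two sibling loops: the second rereads the array written by the first."""
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):      # loop 1 writes a[i]
        a[i] = b[i] + 1
    for i in range(n):      # loop 2 reads a[i] only after loop 1 has finished
        c[i] = a[i] * 2
    return c


def fused(b):
    """One loop: array a is contracted to the scalar temporary t."""
    n = len(b)
    c = [0] * n
    for i in range(n):
        t = b[i] + 1        # value is consumed immediately in the same iteration
        c[i] = t * 2
    return c
```

Both functions compute the same result; an optimizer that only ever sees the two loops separately has no opportunity to notice that the intermediate array is unnecessary.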
According to one aspect of the present invention, there is provided an improved system for the optimization of loop code compilation.
According to another aspect of the present invention, there is provided a computer program product for compilation of a source code segment. The computer program product has instruction means to generate a program dependence graph for the source code segment. The program dependence graph includes a control dependence graph and a data dependence graph. Each of the control dependence graph and the data dependence graph has nodes, and each node in the data dependence graph is associated with one or more statements in the source code segment. There is also instruction means to generate an interference graph from the data dependence graph, with instruction means for deriving nodes for the interference graph from the nodes in the data dependence graph. The nodes in the interference graph are thereby each associated with one or more statements in the source code segment. There is instruction means for generating a node weight for each node in the interference graph, each node having a node weight reflecting the resource usage for the one or more statements associated with the node.
There is also instruction means for generating edges for the interference graph, each edge connecting a pair of nodes in the interference graph, with instruction means for generating an associated edge weight for each edge reflecting the desirability of maintaining the one or more statements associated with each of the pair of nodes connected by the edge within the same loop. There is also provided instruction means for partitioning the interference graph into subgraphs based on the edge weights and the node weights of the interference graph, and instruction means for emitting code conforming to the partitioned interference graph.
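One possible in-memory shape for such an interference graph is sketched below. The class name, field names, statement strings and weight values are all assumptions made for illustration; the source does not prescribe a representation or say how weights are measured.

```python
from dataclasses import dataclass, field


@dataclass
class InterferenceGraph:
    # node id -> source statements grouped in that node (hypothetical layout)
    statements: dict = field(default_factory=dict)
    # node id -> estimated resource usage of those statements
    node_weight: dict = field(default_factory=dict)
    # unordered node pair -> desirability of keeping the pair in the same loop
    edge_weight: dict = field(default_factory=dict)

    def add_node(self, name, stmts, weight):
        self.statements[name] = list(stmts)
        self.node_weight[name] = weight

    def add_edge(self, a, b, weight):
        # frozenset makes the edge undirected: (a, b) and (b, a) share one key
        self.edge_weight[frozenset((a, b))] = weight


def build_example():
    g = InterferenceGraph()
    g.add_node("S1", ["a[i] = b[i] + 1"], weight=2)
    g.add_node("S2", ["c[i] = a[i] * 2"], weight=2)
    g.add_node("S3", ["d[i] = e[i] - 1"], weight=3)
    g.add_edge("S1", "S2", weight=10)  # S2 reuses a[i]: strong reason to share a loop
    g.add_edge("S2", "S3", weight=1)   # no shared data: weak reason
    return g
```

Keeping the edges undirected reflects that an edge weight expresses affinity between two nodes; ordering constraints are assumed to remain in the data dependence graph.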
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for partitioning the interference graph comprises instruction means for first conducting a profitability test to select a pair of nodes in the interference graph, instruction means for then conducting a correctness test on the selected pair, and instruction means for merging the selected pair of nodes into a coalesced node where the correctness test is satisfied for the selected pair.
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for conducting a profitability test comprises instruction means for selecting the pair of nodes in the interference graph having the highest associated edge weight, the selected pair of nodes having a sum of node weights lower than a pre-defined resource limit for a target machine for the compiler.
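A minimal sketch of this selection rule follows. The function name `select_pair`, the dictionary layout and the sample weights are assumptions for illustration only.

```python
def select_pair(node_weight, edge_weight, resource_limit):
    """Return the endpoints of the heaviest edge whose combined node weight
    is lower than resource_limit, or None if no edge qualifies."""
    best = None
    best_w = None
    for (a, b), w in edge_weight.items():
        if node_weight[a] + node_weight[b] >= resource_limit:
            continue  # the merged node would not fit on the target machine
        if best_w is None or w > best_w:
            best, best_w = (a, b), w
    return best


# Hypothetical weights: A and B have the strongest affinity but are too
# large to share a loop under a limit of 8, so the next-best pair is chosen.
node_weight = {"A": 4, "B": 5, "C": 2}
edge_weight = {("A", "B"): 9, ("A", "C"): 6, ("B", "C"): 3}
chosen = select_pair(node_weight, edge_weight, resource_limit=8)
```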
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for conducting a correctness test comprises instruction means for comparing the selected pair of nodes in the interference graph with nodes in the interference graph corresponding to nodes in the data dependence graph defined to be reachable by the data dependence graph from those nodes in the data dependence graph reachable from the nodes in the data dependence graph corresponding to the selected pair of nodes in the interference graph.
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for comparing nodes comprises instruction means for defining a test set by generating a merged reachability set by taking the union of the nodes reachable from the selected pair of nodes, and removing the selected pair of nodes, and taking the union of the nodes reachable from the merged reachability set, and comparing the intersection of the test set with the union of the pair of selected nodes with the null set.
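The set operations described above can be sketched as follows. The sketch assumes that "reachable" means transitively reachable along data dependence edges and that dependences are stored as a successor map; both are assumptions, as are the function names.

```python
def reachable(succ, start):
    """All nodes reachable from start by following one or more edges."""
    seen = set()
    stack = list(succ.get(start, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(succ.get(n, ()))
    return seen


def can_merge(succ, a, b):
    pair = {a, b}
    # Merged reachability set: everything reachable from either node,
    # with the selected pair itself removed.
    merged = (reachable(succ, a) | reachable(succ, b)) - pair
    # Test set: everything reachable from the merged reachability set.
    test_set = set()
    for n in merged:
        test_set |= reachable(succ, n)
    # The merge is accepted only if the test set does not lead back to the pair.
    return not (test_set & pair)


# Hypothetical dependences: A -> X -> B, with C independent.
succ = {"A": ["X"], "X": ["B"], "B": [], "C": []}
```

With these dependences, merging A and B is rejected because X would have to run both after and before the coalesced node, while merging A with X or A with C is accepted.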
According to another aspect of the present invention, there is provided the above computer program product in which each node in the interference graph has an associated reachability vector representing which nodes in the interference graph are reachable from the node and in which set operations to determine reachability of nodes are carried out using the reachability vectors.
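As an illustration of how reachability vectors might be used, the sketch below stores each vector as a Python integer bitmask, so that union, difference and intersection become bitwise OR, AND-NOT and AND. The node numbering, the function names and the fixed-point closure routine are assumptions and are not taken from the source.

```python
def closure(direct):
    """direct[i] is a bitmask of the immediate successors of node i.
    Returns reach, where reach[i] is the bitmask of all nodes reachable from i."""
    reach = list(direct)
    n = len(reach)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            acc = reach[i]
            for j in range(n):
                if (reach[i] >> j) & 1:
                    acc |= reach[j]  # anything j reaches, i reaches too
            if acc != reach[i]:
                reach[i] = acc
                changed = True
    return reach


def can_merge(reach, a, b):
    pair = (1 << a) | (1 << b)
    merged = (reach[a] | reach[b]) & ~pair   # union minus the selected pair
    test = 0
    for j in range(len(reach)):
        if (merged >> j) & 1:
            test |= reach[j]                 # union of vectors of the merged set
    return (test & pair) == 0                # empty intersection means legal


# Nodes numbered 0=A, 1=X, 2=B, 3=C with dependences A -> X -> B.
reach = closure([0b0010, 0b0100, 0b0000, 0b0000])
```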
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for generating a program dependence graph comprises instruction means for ordering the generation of the program dependence graph from an innermost level of nested loops in the source code segment to an outermost level of nested loops in the source code segment.
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for generating a program dependence graph comprises instruction means for generating the control dependence graph for a level of nested loops in the source code segment and for then generating corresponding nodes for the data dependence graph for the said level of nested loops, the corresponding nodes in the data dependence graph being defined by constraints determined from the control dependence graph.
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for generating a program dependence graph comprises instruction means for determining pi blocks in the segment of source code and in which pi blocks are maintained in the generation of the program dependence graph from an inner level of nested loops to a parent level of nested loops in the source code segment.
According to another aspect of the present invention, there is provided the above computer program product in which the means for emitting code comprises instruction means for generating optimized object code including one or more techniques selected from the set of techniques comprising scalar expansion, usage of temporary storage, array contraction, and strip mining.
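Of the techniques listed, strip mining is perhaps the simplest to show in isolation. The sketch below is illustrative only; the function names and the strip size of 4 are assumptions. A single loop is rewritten as an outer loop over strips and an inner loop within each strip, without changing the order in which elements are visited.

```python
def sum_squares(values):
    total = 0
    for i in range(len(values)):
        total += values[i] * values[i]
    return total


def sum_squares_strip_mined(values, strip=4):
    total = 0
    n = len(values)
    for start in range(0, n, strip):                   # outer loop over strips
        for i in range(start, min(start + strip, n)):  # inner loop within one strip
            total += values[i] * values[i]
    return total
```

The `min` bound handles a final strip that is shorter than the strip size.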
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for emitting code comprises means for generating optimized object code by optimizing for parallelism and for data locality.
According to another aspect of the present invention, there is provided a method for compiling a source code segment, the method comprising the following steps:
1. generating a program dependence graph for the source code segment, the program dependence graph comprising a control dependence graph and a data dependence graph, each of the control dependence graph and the data dependence graph comprising nodes, each node in the data dependence graph being associated with one or more statements in the source code segment,
2. causing a computer to generate an interference graph from the data dependence graph, by deriving nodes for the interference graph from the nodes in the data dependence graph, the nodes in the interference graph thereby each being associated with one or more statements in the source code segment, and generating a node weight for each node in the interference graph, each node having a node weight reflecting the resource usage for the one or more statements associated with the node, generating edges for the interference graph, each edge connecting a pair of nodes in the interference graph, generating an associated edge weight for each edge reflecting the desirability of maintaining the one or more statements associated with each of the pair of nodes connected by the edge within the same loop,
3. partitioning the interference graph into subgraphs based on the edge weights and the node weights of the interference graph, and
4. emitting code conforming to the partitioned interference graph.
According to another aspect of the present invention, there is provided the above method in which the step of partitioning the interference graph comprises the following steps:
1. conducting a profitability test to select a pair of nodes in the interference graph,
2. conducting a correctness test on the selected pair, and
3. merging the selected pair of nodes into a coalesced node where the correctness test is satisfied for the selected pair.
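The three partitioning steps above can be combined into a greedy loop, sketched here under stated assumptions: the names are hypothetical, a pair that fails the correctness test is simply dropped from further consideration, and parallel edge weights are summed when nodes coalesce. None of these policies is specified by the source.

```python
def reachable(succ, start):
    seen, stack = set(), list(succ[start])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(succ[n])
    return seen


def can_merge(succ, a, b):
    pair = {a, b}
    merged = (reachable(succ, a) | reachable(succ, b)) - pair
    test_set = set().union(*(reachable(succ, n) for n in merged))
    return not (test_set & pair)


def partition(node_weight, edge_weight, succ, resource_limit):
    """Greedy coalescing; returns {representative node: set of original nodes}."""
    node_weight = dict(node_weight)
    edge_weight = {frozenset(k): w for k, w in edge_weight.items()}
    succ = {n: set(s) for n, s in succ.items()}
    groups = {n: {n} for n in node_weight}
    while True:
        # Step 1, profitability: heaviest edge whose nodes fit under the limit.
        fits = [e for e in edge_weight
                if sum(node_weight[n] for n in e) < resource_limit]
        if not fits:
            return groups
        edge = max(fits, key=lambda e: (edge_weight[e], sorted(e)))
        a, b = sorted(edge)
        del edge_weight[edge]
        # Step 2, correctness: reject pairs whose merge would create a cycle.
        if not can_merge(succ, a, b):
            continue
        # Step 3, merge b into a.
        node_weight[a] += node_weight.pop(b)
        groups[a] |= groups.pop(b)
        succ[a] = (succ[a] | succ.pop(b)) - {a, b}
        for n in succ:
            if b in succ[n]:
                succ[n] = (succ[n] - {b}) | {a}
        for e in [e for e in edge_weight if b in e]:
            w = edge_weight.pop(e)
            key = frozenset((a, next(iter(e - {b}))))
            edge_weight[key] = edge_weight.get(key, 0) + w


# Hypothetical input: dependences S1 -> S2 and S1 -> S3 -> S4.
groups = partition(
    {"S1": 2, "S2": 2, "S3": 3, "S4": 2},
    {("S1", "S2"): 10, ("S2", "S3"): 1, ("S1", "S4"): 8},
    {"S1": ["S2", "S3"], "S2": [], "S3": ["S4"], "S4": []},
    resource_limit=7,
)
```

In this run S1 and S2 coalesce first; the S1 and S4 pair is then rejected by the correctness test because S3 lies between them, and the remaining edge exceeds the resource limit, so three subgraphs result.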
According to another aspect of the present invention, there is provided the above method in which the step of conducting the profitability test comprises selecting the pair of nodes in the interference graph having the highest associated edge weight, the selected pair of nodes having a sum of node weights lower than a pre-defined resource limit for a target machine for the compiler.
According to another aspect of the present invention, there is provided the above method in which conducting a correctness test comprises the steps of comparing the selected pair of nodes in the interference graph with nodes in the interference graph corresponding to nodes in the data dependence graph defined to be reachable by the data dependence graph from those nodes in the data dependence graph reachable from the nodes in the data dependence graph corresponding to the selected pair of nodes in the interference graph.
According to another aspect of the present invention, there is provided a computer program product tangibly embodying a program of instructions executable by a computer to perform the above method steps.
According to another aspect of the present invention, there is provided a system for compilation of a source code segment. The system has means to generate a program dependence graph for the source code segment. The program dependence graph includes a control dependence graph and a data dependence graph. Each of the control dependence graph and the data dependence graph has nodes, and each node in the data dependence graph is associated with one or more statements in the source code segment. There is also means to generate an interference graph from the data dependence graph, with means for deriving nodes for the interference graph from the nodes in the data dependence graph. The nodes in the interference graph are thereby each associated with one or more statements in the source code segment. There is means for generating a node weight for each node in the interference graph, each node having a node weight reflecting the resource usage for the one or more statements associated with the node. There is also means for generating edges for the interference graph, each edge connecting a pair of nodes in the interference graph, with means for generating an associated edge weight for each edge reflecting the desirability of maintaining the one or more statements associated with each of the pair of nodes connected by the edge within the same loop. There is also provided means for partitioning the interference graph into subgraphs based on the edge weights and the node weights of the interference graph, and means for emitting code conforming to the partitioned interference graph.
According to another aspect of the present invention, there is provided the above system in which the means for partitioning the interference graph comprises means for first conducting a profitability test to select a pair of nodes in the interference graph, means for then conducting a correctness test on the selected pair, and means for merging the selected pair of nodes into a coalesced node where the correctness test is satisfied for the selected pair.
According to another aspect of the present invention, there is provided the above system in which the means for conducting a profitability test comprises means for selecting the pair of nodes in the interference graph having the highest associated edge weight, the selected pair of nodes having a sum of node weights lower than a pre-defined resource limit for a target machine for the compiler.
According to another aspect of the present invention, there is provided the above system in which the means for conducting a correctness test comprises means for comparing the selected pair of nodes in the interference graph with nodes in the interference graph corresponding to nodes in the data dependence graph defined to be reachable by the data dependence graph from those nodes in the data dependence graph reachable from the nodes in the data dependence graph corresponding to the selected pair of nodes in the interference graph.
According to another aspect of the present invention, there is provided the above system in which the means for comparing nodes comprises means for defining a test set by generating a merged reachability set by taking the union of the nodes reachable from the selected pair of nodes, and removing the selected pair of nodes, and taking the union of the nodes reachable from the merged reachability set, and comparing the intersection of the test set with the union of the pair of selected nodes with the null set.
According to another aspect of the present invention, there is provided the above system in which each node in the interference graph has an associated reachability vector representing which nodes in the interference graph are reachable from the node and in which set operations to determine reachability of nodes are carried out using the reachability vectors.
Advantages of the present invention include improvements in optimization across loops and nests of loops. In addition, the manipulation of graph representations of the code permits application of known heuristic-based graph partitioning techniques to the loop allocation compilation.