The present invention relates generally to a computer method and system for generating a computer program and, more specifically, to a computer method and system for optimizing a computer program.
Computer programs are generally written in a high-level programming language (e.g., Pascal and C). A compiler is then used to translate the instructions of the high-level programming language into machine instructions, which can be executed by a computer. The use of high-level programming languages results in dramatically increased program development productivity as compared to programming in either an assembly language or machine language. The increase in program development productivity results from allowing a programmer to use high-level computational constructs rather than low-level computational constructs. Since certain high-level computational constructs (e.g., Fast Fourier Transform) may translate into hundreds of low-level computational constructs, some of the increase in productivity arises directly from the ratio of high-level to low-level computational constructs. A programmer can more quickly specify one high-level computational construct rather than hundreds of low-level computational constructs. Also, a programmer can typically debug a computer program expressed with high-level computational constructs more quickly than the equivalent computer program expressed with low-level computational constructs. The productivity also increases because the intent of the programmer is generally more readily apparent when the computer program is expressed with high-level computational constructs. The identification of the programmer's intent is especially useful to another programmer who later modifies the computer program.
Unfortunately, the code compiled from a computer program developed in a high-level programming language can be much less efficient than the code generated from an equivalent computer program expressed in a low-level programming language. The efficiency of code written manually with low-level computational constructs may arise, for example, because some or many of the low-level operations compiled from different high-level operations can be shared, so that the compiled low-level operations are rewoven into different sequences of low-level operations that implement the different high-level operations. Such code reweaving and sharing is, however, very difficult to automate once the high-level operations are compiled into segments of low-level operations. Some of the analyses required to accomplish such reweaving can require deep inferences about the low-level operations that, for large segments of compiled code in the context of large programs, could take years to complete using an automated reasoning system. On the other hand, a human programmer, given enough time, can always develop code using low-level computational constructs that is at least as efficient as code compiled from a computer program developed in a high-level programming language. Moreover, the programmer may be able to optimize a computer program developed using low-level computational constructs in ways that are not currently available to a compiler or a post-compilation optimizer. For example, current compilation techniques often generate highly inefficient code to access high-level composite data structures, such as a matrix. Operations on a matrix typically require iterative code (i.e., looping code) that selects and processes each one of the elements of the matrix in each step of the iteration.
If several high-level operations on a matrix are combined into a single statement and each operation requires selecting and processing each element, the compiled code is likely to include multiple “for” loops that each include a nested “for” loop. For example, to double the value of each element of a matrix and then add the resulting values to the values of the elements of another matrix, the programmer may specify the following statement using high-level computational constructs:
A=B+(2*C)
where A, B, and C are matrices. When a compiler generates code for this high-level statement, the compiler may generate nested “for” loops to select the element at each row and each column of the C matrix, to multiply the value of the selected element by 2, and to store the doubled values in a temporary matrix. The compiler also generates nested “for” loops to select the element at each row and column of the temporary matrix, to add that value to the value of the corresponding element of the B matrix and to store the sum in another temporary matrix. The compiler generates a third set of nested “for” loops to select the element at each row and column of the temporary matrix and to store the value of that element into the A matrix. The resulting code may be:
for i=1,m
    for j=1,n
        temp[i,j]=2*C[i,j]
for i=1,m
    for j=1,n
        temp[i,j]=B[i,j]+temp[i,j]
for i=1,m
    for j=1,n
        A[i,j]=temp[i,j]
A programmer using a low-level language, however, would combine these three sets of nested “for” loops into a single set of nested “for” loops. The resulting programmer-optimized code would be:
for i=1,m
    for j=1,n
        A[i,j]=B[i,j]+(2*C[i,j])
Although program optimizers have been developed that attempt to coalesce loops, their optimization techniques generally do a poor job of identifying those portions of the compiled code that lend themselves to coalescing.
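The difference between the compiler-generated loops and the programmer-optimized loop can be sketched in Python. This is only an illustration of the example above, not code from any particular compiler; the function names `unfused` and `fused` are hypothetical, and matrices are assumed to be non-empty lists of equal-length rows.

```python
def unfused(B, C):
    """Three separate passes, mirroring the compiler-generated loops."""
    m, n = len(B), len(B[0])
    temp = [[0] * n for _ in range(m)]
    A = [[0] * n for _ in range(m)]
    for i in range(m):              # pass 1: temp = 2 * C
        for j in range(n):
            temp[i][j] = 2 * C[i][j]
    for i in range(m):              # pass 2: temp = B + temp
        for j in range(n):
            temp[i][j] = B[i][j] + temp[i][j]
    for i in range(m):              # pass 3: A = temp
        for j in range(n):
            A[i][j] = temp[i][j]
    return A


def fused(B, C):
    """One pass and no temporary matrix, mirroring the hand-optimized loop."""
    m, n = len(B), len(B[0])
    A = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            A[i][j] = B[i][j] + (2 * C[i][j])
    return A


B = [[1, 2], [3, 4]]
C = [[10, 20], [30, 40]]
print(unfused(B, C) == fused(B, C))  # prints True
```

Both functions compute the same matrix; the fused version visits each element once instead of three times and allocates no temporary.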
If a programming language has high-level computational constructs built into the language, then various optimization techniques can also be built in. For example, since the computational construct “matrix,” which is a composite of low-level computational constructs such as integers and integer operations, is built directly into the programming language APL, the loop optimizations described above are easily accomplished by APL. APL compilers can and do use the built-in knowledge of the semantics of matrices and their operations to anticipate the looping inefficiencies and compile directly to the optimized code for known patterns and combinations of high-level operators and high-level composite operands. If, however, such high-level computational constructs are not built into an existing programming language, then they need to be added as extensions to the existing programming language or defined by the programmer. In both cases, the extensions and their optimizations cannot be added to the existing language framework by simply defining the new constructs and their optimizations in terms of existing constructs. These extensions and their optimizations cannot be added primarily because conventional programming languages provide no constructs for expressing how to reweave definitions of high-level operators and data structures for optimizable combinations of those operators and data structures. Of course, the programmer can define to the compiler high-level composite data structures like images, for example, in terms of matrix-like data structures composed with other data structures (e.g., C-like structs) that are native to the target compiler (e.g., C). Further, the programmer can define how each individual composite data structure and its operators are to be compiled individually (e.g., the “*” operator used in the context of a “(constant*matrix)” type expression should compile into two nested loops over the matrix).
However, the programmer has no way to tell the compiler how to reweave the code of several such individual definitions composed together in an expression in order to share the low-level operations for the particular composition.
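A minimal sketch of this limitation, using Python operator overloading as a stand-in for programmer-defined operators: each operator is defined on its own and performs its own full traversal, and nothing in the definitions lets the programmer say how the two traversals should be merged when the operators are composed. The class `Matrix`, its `_map` helper, and the `passes` counter are hypothetical names introduced only for this illustration.

```python
class Matrix:
    passes = 0  # hypothetical counter of full traversals performed by operators

    def __init__(self, rows):
        self.rows = rows

    def _map(self, f):
        # Every operator funnels through here, so every operator costs one pass
        # and allocates one new matrix.
        Matrix.passes += 1
        return Matrix([[f(i, j) for j in range(len(row))]
                       for i, row in enumerate(self.rows)])

    def __rmul__(self, k):
        # "(constant*matrix)": defined individually as a loop over the matrix.
        return self._map(lambda i, j: k * self.rows[i][j])

    def __add__(self, other):
        # "(matrix+matrix)": also defined individually as its own loop.
        return self._map(lambda i, j: self.rows[i][j] + other.rows[i][j])


B = Matrix([[1, 2], [3, 4]])
C = Matrix([[10, 20], [30, 40]])
A = B + (2 * C)
print(A.rows, Matrix.passes)  # prints [[21, 42], [63, 84]] 2
```

The composed expression runs two separate passes with an intermediate matrix, even though a single pass would suffice.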
It would be desirable to have a general method for a programmer to define such optimization rules for reweaving the definitions of high-level (i.e., composite) data structures and their operators when the multiple high-level operators and their high-level data structures are used together in optimizable patterns or combinations. It would further be desirable that the optimization method behave like a compiler definition generated on-demand for some specific combination of operators and operands that directly compiles the rewoven, optimized low-level code without ever producing the separate segments of uncombined low-level code for each definition of the individual high-level operators and operands.
The present invention provides an Anticipatory Optimization method and system for programmer-defined optimization rules that can be applied when generating a low-level implementation of a computer program from a high-level implementation of a computer program. The system provides a mechanism for user-defined transforms for annotating the high-level implementation of the computer program with optimization tags; for redefining those high-level operators, operands, and tags; and for combining those high-level operators, operands, and tags such that the individual low-level operations eventually compiled for each of the high-level operators and operands are shared so as to improve performance. The optimization tags indicate optimizations that may be applicable when generating the low-level implementation of the computer program. For each high-level computational construct of the high-level implementation of the computer program, the system applies those transforms that are applicable to the high-level computational construct so that annotations indicating various optimizations can be added to the high-level computational construct. The system then generates the low-level implementation in accordance with the indicated optimizations.
In one embodiment, the high-level computational constructs are user-defined, domain-specific computational constructs and the low-level computational constructs are programming-language-defined computational constructs, the computational constructs including high-level operands that are domain-specific composites of low-level computational constructs. An abstract syntax tree (AST) representation of the program is generated and, with respect to the AST, loop merging is performed on the program followed by a kind of program reorganization called “composite folding.”
The AST includes nodes that represent computational constructs within the program and abstract optimization tags that are used for various optimization purposes, such as storing optimization state information or indicating deferred program transformations that are used to generate an optimized version of the AST. For each AST pattern of user-defined, domain-specific computational constructs, the system determines whether a user-defined, domain-specific transform has been defined for the pattern. A transform transforms a portion of the AST relating to the pattern of user-defined, domain-specific computational constructs into one or more programming-language-defined computational constructs. When a domain-specific transform has been defined for the pattern of computational constructs, the system transforms the AST in accordance with the domain-specific transform. The transformed AST is in a form that reflects an optimization of the programming-language-defined computational constructs based on the domain-specific computational constructs.
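The idea of pattern-directed transforms that rewrite tagged AST nodes can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's actual representation: the `Node` class, the `TRANSFORMS` registry, the `transform` decorator, and the `"loop"` tag are all hypothetical. One transform replaces a whole-matrix reference with an element reference and tags it with the loop it still needs; a second transform hoists identical loop tags from operands to their operator, so that a single loop ends up covering the whole expression.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                                   # "+", "*", "matrix", "const", "index"
    args: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)  # hypothetical optimization tags


TRANSFORMS = []  # registry of (pattern predicate, rewrite function) pairs


def transform(matches):
    def register(rewrite):
        TRANSFORMS.append((matches, rewrite))
        return rewrite
    return register


@transform(lambda n: n.op == "matrix")
def matrix_to_element(n):
    # Rewrite a matrix reference into an element reference and record,
    # as a tag, the loop that must eventually be generated around it.
    return Node("index", [n.args[0], "i", "j"], {"loop": ("i", "j")})


@transform(lambda n: n.op in ("+", "*") and any("loop" in a.tags for a in n.args))
def hoist_loop(n):
    # Move the operands' loop tags up to the operator so the loops are shared.
    loops = set()
    for a in n.args:
        if "loop" in a.tags:
            loops.add(a.tags.pop("loop"))
    if len(loops) != 1:
        raise ValueError("this sketch only merges identical loops")
    n.tags["loop"] = loops.pop()
    return n


def apply_transforms(n):
    # Bottom-up traversal: rewrite children first, then try each transform here.
    n.args = [apply_transforms(a) if isinstance(a, Node) else a for a in n.args]
    for matches, rewrite in TRANSFORMS:
        if matches(n):
            n = rewrite(n)
    return n


def emit(n):
    if n.op == "const":
        return str(n.args[0])
    if n.op == "index":
        return f"{n.args[0]}[{n.args[1]}][{n.args[2]}]"
    return f"({emit(n.args[0])} {n.op} {emit(n.args[1])})"


# AST for B + (2 * C)
tree = Node("+", [Node("matrix", ["B"]),
                  Node("*", [Node("const", [2]), Node("matrix", ["C"])])])
result = apply_transforms(tree)
print(result.tags["loop"], emit(result))  # prints ('i', 'j') (B[i][j] + (2 * C[i][j]))
```

After the transforms run, the only loop tag sits on the root, which corresponds to generating one nested loop around the element-level expression rather than one loop per operator.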
In the phase of optimization following loop merging, a composite folding process is applied to the AST according to the optimization tags to generate optimized code for the program. The composite folding process includes identifying an optimization event; identifying each abstract optimization tag that is applied to the programming-language-defined or remaining domain-specific computational constructs and that has a transformation condition identifying the optimization event as a condition for attempting an anticipated optimization in the translation; and attempting execution of each of the anticipated optimization transformations associated with the optimization event.
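The event-driven character of this process can be sketched as below. This is a loose illustration only: the `Node` and `Tag` classes, the `fire` function, the event name `"after_loop_merge"`, and the example rewrite `drop_unit_factor` are all hypothetical and are not taken from the described system. Each tag names the event that triggers it and carries a deferred transformation; when an event is identified, every matching tag's transformation is attempted, and it may or may not apply.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    expr: object                              # tuple like ("*", 1, "C[i][j]") or a string
    tags: list = field(default_factory=list)


@dataclass
class Tag:
    event: str                                # event that triggers the attempt
    attempt: Callable[[Node], bool]           # returns True if the rewrite applied


def drop_unit_factor(node):
    # Deferred rewrite (assumed for illustration): 1 * x  ->  x
    e = node.expr
    if isinstance(e, tuple) and e[0] == "*" and e[1] == 1:
        node.expr = e[2]
        return True
    return False


def fire(event, nodes):
    """Attempt every tagged transformation whose condition names `event`."""
    applied = []
    for node in nodes:
        for tag in list(node.tags):           # copy: tags may be removed while looping
            if tag.event == event and tag.attempt(node):
                node.tags.remove(tag)          # the anticipated optimization is done
                applied.append(node)
    return applied


nodes = [
    Node(("*", 1, "C[i][j]"), [Tag("after_loop_merge", drop_unit_factor)]),
    Node(("*", 2, "C[i][j]"), [Tag("after_loop_merge", drop_unit_factor)]),
]
unrelated = fire("some_other_event", nodes)   # no tag listens for this event
changed = fire("after_loop_merge", nodes)     # both attempted, only the first applies
print(len(unrelated), len(changed), nodes[0].expr)  # prints 0 1 C[i][j]
```

In this sketch a tag whose transformation does not apply stays attached to its node, so it can be attempted again if the same event recurs; whether the described system retains such tags is not stated in the text above.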