Compilers convert computer programs from a human readable form (high level language) into a machine readable form (machine language) which can be directly used by a computer of a target type. Compilers enable a human operator (programmer) to more easily produce computer program code in the restrictive syntax required by the target computer. The programmer may concentrate on algorithm development at a greater level of abstraction. The compiler performs the task of converting the resulting program in the high level language into the exacting syntax required by the machine language of the target computer type.
Compilers are typically implemented as programs that control a general purpose computer. A compiler typically operates by recalling the high level language program from non-volatile memory, parsing the high level language program, producing a corresponding machine language program and storing the resulting machine language program in non-volatile memory. The resulting machine language program is transferred to an example of the target computer for loading and use.
The step of producing the machine language program typically involves some level of optimization. Even the earliest examples of compilers included optimization techniques. The earliest compilers competed with human programmers who were highly skilled in producing machine language for the target computer. Early compiler users were generally less skilled programmers, or at least less specialized in programming for the target computer. Thus compilers employ optimization to produce better machine language programs from the source high level language program. The resultant better machine language program may operate faster, employ computer resources more efficiently or the like.
Compiler optimization focuses on a variety of improvements such as tracking and using values at compile time, finding better instruction sequences, and moving computation to less expensive places in the code. All of these code transformations require data-flow analysis, defined as compile-time reasoning about the runtime flow of values. There are many kinds of data-flow analysis, each aimed at particular optimizations.
Compiler optimizations typically rely upon the nature of the compiling computer. Programmed computers are much more adept at repetitive and voluminous tasks than a human operator. Compiler optimization employs this adeptness in searching for improved machine language implementations of the source high level language.
Typically a compiler will generate a corresponding machine language program for any input high level program having legal syntax. A high level program presented in proper syntax for conversion into machine language does not usually use machine resources wisely. Data-flow analysis describes how the high level language program uses data. The compiler uses this analysis to drive transformations that perform the same function more efficiently on the target machine. For example, the high level language program may declare some piece of data as a variable even though that data always has the same constant value in actual use. The machine language program could better use resources by treating this quantity as a constant rather than a variable.
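The constant example above can be illustrated with a minimal sketch. It assumes a hypothetical straight-line program held as tuples of the form (target, operand_a, operator, operand_b); the tuple layout and the name `propagate_constants` are illustrative assumptions, not part of any particular compiler. A variable assigned exactly once from a literal is treated as a constant and substituted at each use.

```python
def propagate_constants(program):
    """Inline variables whose only assignment is a literal constant.

    Assumes straight-line code (no branches) in which every use of a
    variable follows its assignment. Each instruction is a hypothetical
    tuple (target, operand_a, operator, operand_b); a plain copy or
    literal assignment has operator None.
    """
    # Count how many times each variable is assigned.
    assign_count = {}
    for target, _a, _op, _b in program:
        assign_count[target] = assign_count.get(target, 0) + 1

    # A variable is a known constant if it is assigned exactly once,
    # and that assignment is a literal integer.
    constants = {}
    for target, a, op, _b in program:
        if op is None and isinstance(a, int) and assign_count[target] == 1:
            constants[target] = a

    # Rewrite the program: drop the constant assignments and substitute
    # the literal value wherever the variable was used.
    rewritten = []
    for target, a, op, b in program:
        if target in constants:
            continue
        rewritten.append((target, constants.get(a, a), op, constants.get(b, b)))
    return constants, rewritten


program = [
    ("scale", 4, None, None),        # scale = 4
    ("x", "input", "*", "scale"),    # x = input * scale
    ("y", "x", "+", "scale"),        # y = x + scale
]
constants, rewritten = propagate_constants(program)
print(constants)
print(rewritten)
```

In this sketch `scale` no longer needs a register or memory location of its own, since each use carries the literal value directly.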
Data-flow analysis determines whether the source high level language program efficiently uses data. The compiler uses this analysis to drive transformations that perform the same function more effectively on the target machine. The compiler must allocate memory and registers to variables. This allocation must take data use into account: variables that are never in use at the same time can share the same memory locations or registers. As another example, the high level language program may declare a data variable when the actual data use is that of a constant. The machine language program could better use resources by treating this quantity as a constant rather than a variable.
A compiler implementing data-flow analysis typically relies on the iterative algorithm for data-flow analysis. The literature describes the kinds of equations for which this analysis converges, that is, reaches a solution that no longer changes as further information is considered. Certain compiler optimizations use equations that do not fit this model. The typically employed iterative algorithm is still applicable, but the answers (usually in the form of sets) may not converge. When the analysis fails to converge, the compiler is conservative and throws away all of the computation, including all data on all variables tracked. The upper bound on the number of iterations required before the sets converge, or may be expected not to converge, is defined by a theoretical characteristic of the control-flow graph of the program called loop connectedness, denoted d(G). According to the prior art, calculating this loop connectedness was believed to require time exponential in the number of nodes of the control-flow graph. Computations of this order are impractical and thus were not attempted in the prior art.
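As an illustration of the iterative algorithm, the following minimal sketch computes live variables over a small control-flow graph by repeating passes until no set changes. Live-variable analysis is chosen here only as a familiar example of equations that do converge; the graph, the node names and the function `live_variables` are hypothetical.

```python
def live_variables(successors, use, define):
    """Iterate live-variable equations until the sets stop changing.

    successors: node -> list of successor nodes in the control-flow graph
    use:        node -> set of variables read before any write in the node
    define:     node -> set of variables written in the node
    Returns (live_in, live_out, passes), where passes counts every sweep
    over the nodes, including the final sweep that detects no change.
    """
    live_in = {node: set() for node in successors}
    live_out = {node: set() for node in successors}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for node in successors:
            # live_out is the union of live_in over all successors.
            new_out = set()
            for succ in successors[node]:
                new_out |= live_in[succ]
            # live_in is what the node reads, plus what flows through it.
            new_in = use[node] | (new_out - define[node])
            if new_in != live_in[node] or new_out != live_out[node]:
                live_in[node], live_out[node] = new_in, new_out
                changed = True
    return live_in, live_out, passes


# Hypothetical graph for:  i = 0; while i < n: i = i + 1; return i
successors = {"entry": ["loop"], "loop": ["body", "exit"],
              "body": ["loop"], "exit": []}
use = {"entry": set(), "loop": {"i", "n"}, "body": {"i"}, "exit": {"i"}}
define = {"entry": {"i"}, "loop": set(), "body": {"i"}, "exit": set()}

live_in, live_out, passes = live_variables(successors, use, define)
print(live_in)
print(passes)
```

The number of passes needed depends on the shape of the graph and the order in which nodes are visited, which is why a bound tied to the control-flow graph, such as d(G), is of interest.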
The prior art employed a compromise to deal with this lack of knowledge of the number of iterations required for convergence, if convergence is possible. The prior art selects a maximum number of data-flow analysis iterations arbitrarily. Data-flow analysis proceeds until either the arbitrarily chosen maximum number of iterations is reached or convergence is detected. Convergence is detected when no tracked value changed during the most recent iteration. If the maximum number of iterations is reached without detecting convergence of all data values, the prior art compiler typically assumes all data of that data-flow analysis is invalid and discards it. Thus the resources employed in reaching the maximum number of iterations without detecting convergence are wasted.
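The capped iteration described above might be sketched as follows. The function `analyze_with_cap` and the two toy transfer functions are hypothetical stand-ins, not taken from any actual compiler; they only show the control structure in which a result is kept when an iteration produces no change and discarded entirely when the cap is reached first.

```python
def analyze_with_cap(transfer, initial, max_iterations):
    """Apply transfer repeatedly, stopping at a fixed point or at the cap.

    Returns (state, iterations_used). The state is None when the cap is
    reached without an unchanged iteration, modeling a compiler that
    discards all results of the analysis.
    """
    state = initial
    for iteration in range(1, max_iterations + 1):
        new_state = transfer(state)
        if new_state == state:
            return state, iteration   # no tracked value changed: converged
        state = new_state
    return None, max_iterations       # cap reached: everything is discarded


def grow(state):
    # Toy transfer function that converges: the set grows until it
    # holds 0, 1, 2 and 3, then stops changing.
    return state | {min(len(state), 3)}


def flip(state):
    # Toy transfer function that never converges: the result alternates
    # between the empty set and {1}.
    return frozenset({1}) - state


print(analyze_with_cap(grow, frozenset(), 10))
print(analyze_with_cap(grow, frozenset(), 4))
print(analyze_with_cap(flip, frozenset(), 10))
```

With these toy functions, `grow` needs five iterations to confirm convergence, so a cap of 4 discards a result that was already complete, while `flip` consumes the full cap of 10 iterations before its results are thrown away.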
The arbitrarily chosen number of iterations of the prior art has three weaknesses. First, if the actual number of iterations necessary to show non-convergence is much smaller than the arbitrary limit, the compiler wastes resources on extra, useless computation. Second, if the number of iterations necessary to show non-convergence is only slightly higher than the arbitrarily chosen limit, the compiler discards information that could have been useful if only a little more work had been done. Third, if the compiler could determine the exact number of iterations necessary to prove convergence, the compiler might be able to keep information that had converged, while discarding only the information for variables that had not yet converged. Because the prior art cannot tell when a computation would have converged, all information is discarded.