Optimizing compilers are important tools that allow programmers to make more effective and efficient use of the target CPU. The goal of an optimizing compiler is to generate the smallest and fastest object code possible that exactly duplicates the function of the program as it was written. In order to generate compact and efficient object code for computer programs written in high level languages, compilers for such languages must utilize sophisticated global optimizers, which generally use various specified procedures for reducing the run time or the memory requirements of a program. For example, a compiler may perform any or all of: common sub-expression elimination, code motion, strength reduction (replacing a slow operation by an equivalent fast operation), store motion, and removal of useless code sequences. Descriptions of some of these optimizations can be found in:
J. T. Schwartz, On Programming--An Interim Report on the SETL Language. Installment II: The SETL Language and Examples of its Use, Courant Institute of Math Sciences, NYU, 1973, pp. 293-310.
E. Morel and C. Renvoise, Global Optimization by Suppression of Partial Redundancies, CACM, Vol. 22, No. 2, pp. 96-103, 1979.
A. Aho and J. Ullman, Principles of Compiler Design, Addison-Wesley, 1977.
Each of these optimizing specified procedures transforms an intermediate language (IL) program into a semantically equivalent but more efficient IL program. An intermediate level language, as its name implies, lies between a high level source program and machine code in complexity and sophistication. An intermediate level language can be especially useful in preparing compilers that are to be capable of translating any of several high level languages into machine code targeted to any of several machines; it markedly reduces the number of products that must be developed to cover a wide range of both machine types and programming languages, because all may translate through a common intermediate level language. It is at the intermediate language level that most optimizations are commonly performed.
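As an illustration of an IL-to-IL transformation, the following is a minimal sketch of common sub-expression elimination restricted to a single straight-line sequence. The tuple-based IL, the "copy" opcode and the function name are assumptions made for this example only; they are not taken from the present specification, which concerns global rather than local optimization.

```python
def eliminate_common_subexpressions(block):
    """Local CSE over a straight-line list of (dest, op, arg1, arg2) tuples.

    The IL format here is hypothetical: each tuple assigns `op(arg1, arg2)`
    to `dest`. A repeated computation is replaced by a "copy" from the
    name that already holds the value.
    """
    available = {}  # (op, arg1, arg2) -> name currently holding that value
    result = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            # The value was already computed and is still valid: reuse it.
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, a, b))
        # Assigning to dest invalidates every expression that reads dest
        # and every expression whose value was held in dest.
        available = {
            k: holder
            for k, holder in available.items()
            if dest not in (k[1], k[2]) and holder != dest
        }
        # Record the new expression unless it is a copy or overwrites
        # one of its own operands.
        if op != "copy" and dest not in (a, b):
            available.setdefault(key, dest)
    return result
```

The output program computes the same values as the input, which is the sense in which the two IL programs are semantically equivalent.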
The most important optimizations in an optimizing compiler are carried out globally, that is, on a program-wide level, rather than on a localized or basic block level. In performing each of these optimizations, a series of data flow equations must be solved. In doing so, the compiler gathers information about the expressions in the program being compiled; such information is dependent upon the flow of control in the program. For its own unique code transformation, each optimization must have a method of tracking when and how any given expression is available throughout the program as compiled.
This information is derived from the control flowgraph, which is a directed graph depicting the possible execution paths of the program. In a low order flowgraph, the nodes represent basic blocks of a program, and these are connected by directed edges representing paths along which control in the program flows. In a high order flowgraph, the nodes comprise basic blocks and/or strongly connected regions.
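The data flow equations mentioned above can be illustrated with the classical "available expressions" problem, in which AVIN(b) is the intersection of AVOUT(p) over all predecessors p of block b, and AVOUT(b) = GEN(b) ∪ (AVIN(b) − KILL(b)). The following is a minimal sketch of an iterative solver over a low order flowgraph; the function name, the dictionary-based graph representation and the GEN/KILL inputs are assumptions for illustration, not the procedure of the present specification.

```python
def available_expressions(preds, gen, kill, entry):
    """Iteratively solve the available-expressions equations.

    preds: dict mapping each block to a list of its predecessor blocks
    gen:   dict mapping each block to the set of expressions it computes
    kill:  dict mapping each block to the set of expressions it invalidates
    entry: the block at which execution starts (nothing is available there)
    """
    universe = set().union(*gen.values(), *kill.values())
    avin = {b: set() for b in preds}
    # Optimistic start: assume everything is available except at the entry.
    avout = {b: (set(gen[b]) if b == entry else set(universe)) for b in preds}

    changed = True
    while changed:
        changed = False
        for b in preds:
            if b == entry:
                new_in = set()
            else:
                new_in = set(universe)
                for p in preds[b]:
                    new_in &= avout[p]  # must be available on every path
            new_out = gen[b] | (new_in - kill[b])
            if new_in != avin[b] or new_out != avout[b]:
                avin[b], avout[b] = new_in, new_out
                changed = True
    return avin, avout
```

Each block carries one set per equation, so the storage needed grows with both the number of blocks and the number of expressions tracked.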
In the present specification, the term "basic block" means any set of instructions in a computer program, whether object or source code, which is a straight-line sequence of code into which branches reach only its first instruction, and from which control leaves the basic block only after the last instruction.
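The definition of a basic block can be made concrete with the usual "leader" technique: an instruction starts a new block if it is the first instruction, the target of a branch, or the instruction immediately following a branch. The sketch below partitions a list of instructions into basic blocks and derives the edges of a low order flowgraph. The instruction format and the opcode names "jump", "branch" and "return" are hypothetical and chosen only for this example.

```python
def split_basic_blocks(instrs):
    """Partition instructions into basic blocks and build flowgraph edges.

    instrs: list of (label_or_None, opcode, target_label_or_None).
    Assumed opcodes: "jump" (unconditional), "branch" (conditional),
    "return"; any other opcode falls through to the next instruction.
    Returns (blocks, edges): blocks are lists of instruction indices and
    edges are (from_block, to_block) pairs.
    """
    label_index = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab is not None}

    # Leaders: first instruction, branch targets, and branch successors.
    leaders = {0} if instrs else set()
    for i, (_, op, target) in enumerate(instrs):
        if op in ("jump", "branch"):
            leaders.add(label_index[target])
            if i + 1 < len(instrs):
                leaders.add(i + 1)

    starts = sorted(leaders)
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(instrs)])]
    block_of = {s: n for n, s in enumerate(starts)}

    edges = set()
    for n, blk in enumerate(blocks):
        _, op, target = instrs[blk[-1]]
        if op in ("jump", "branch"):
            edges.add((n, block_of[label_index[target]]))
        # Control falls through unless the block ends in a jump or return.
        if op not in ("jump", "return") and n + 1 < len(blocks):
            edges.add((n, n + 1))
    return blocks, sorted(edges)
```

For a simple counting loop, the loop test, the loop body and the exit each end up in their own block, with a back edge from the body to the test.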
The term "strongly connected region" means a set of nodes among which there is a path that can be repeatedly followed by the program control without passing through a node outside the region. Strongly connected regions are well known in the art of compiler design. A "single entry strongly connected region" is a strongly connected region that has only one node that is reached from outside the single entry strongly connected region. Hereafter in this disclosure, the term "region" means a single entry strongly connected region.
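One way to locate such regions in a flowgraph is sketched below, using Tarjan's well-known algorithm for strongly connected components and then keeping only those components that contain a cycle and have exactly one node reached from outside. This is an illustrative simplification under stated assumptions: it finds only maximal components (so a loop nested inside another loop is not reported separately), it does not report a component that has no predecessor outside itself, and the function names and graph representation are hypothetical rather than taken from the present specification.

```python
def strongly_connected_components(succ):
    """Tarjan's algorithm. succ maps each node to a list of successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return components


def single_entry_regions(succ):
    """Return (entry_node, nodes) for each cyclic component with one entry."""
    regions = []
    for comp in strongly_connected_components(succ):
        has_cycle = len(comp) > 1 or any(v in succ[v] for v in comp)
        if not has_cycle:
            continue
        # Entry nodes are members with a predecessor outside the component.
        entries = {w for v in succ if v not in comp for w in succ[v] if w in comp}
        if len(entries) == 1:
            regions.append((entries.pop(), comp))
    return regions
```

A simple loop entered only at its header yields one region, whereas a cycle that can be entered at two different nodes yields none.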
The term "subgraph" means any combination of nodes within the flowgraph. All strongly connected regions are also subgraphs, but not all subgraphs are strongly connected regions.
The term "entities" refers to the components of an intermediate representation which are used to describe a program as it is being compiled. These include variable entries, dictionary entries, results, expressions, instructions, and basic blocks of a program.
The larger and more complex a program is, the larger and more convoluted is its flowgraph, the greater the number of calculations involved, and the greater the number of expressions for which dataflow equations must be solved. For global optimization, memory requirements and processing time for the compilation tend to increase quadratically as a function of source program size. In the past, when the compiler could not optimize an entire program because of a space restriction, the optimization had to be abandoned. Attempts have been made in the past to improve the quality of optimizations, and a small number of patents have been granted on inventions in this area.
U.S. Pat. No. 4,506,325 discloses a method of decreasing the storage requirements of a compiler by encoding operators and operands using an information theoretic encoding technique, applied to segments of a program. The disclosure does not deal with how a program is segmented.
U.S. Pat. No. 4,571,678 discloses a method of utilizing the limited number of registers in a target computer by improving register allocation procedures. It does not disclose any way of handling large programs that exceed the general memory availability of the target computer.
There remains a need for a program compilation technique that does not simply give up the struggle if an optimization cannot be performed within the constraints of the hardware or computer on which the program is being compiled.
It has now been discovered that the scope on which an optimization is applied can be limited, and yet many of the benefits of optimization can be realized. The program unit can be partitioned on the basis of its control flow structure into sections sufficiently small to be manipulated by the compiler.