1. Field of the Invention
The present invention relates to an optimization apparatus equipped in a compiler apparatus, and to a computer-readable storage medium storing an optimization program.
2. Description of the Background Art
In recent years, electronics engineers have found it very difficult to develop embedded microcomputer systems that realize high-level and complex control. In general, an embedded microcomputer system refers to a computer system in which a mask ROM that stores all control programs from the firmware to application programs is integrated with a microprocessor. Such embedded microcomputer systems have increasingly been used in household electrical appliances, machine tools, information apparatuses, and communication apparatuses.
Nowadays it is common to develop programs embedded in such microcomputer systems using a high-level programming language such as C. In view of the recent rapid increase in the scale of embedded application software, it is no longer possible to realize the high-level processing required of these embedded programs in the old software development environment based on assembly language. Developing embedded programs that realize high-level processing in assembly language also places a considerable burden on engineers.
However, compared with application software developed in assembly language, machine-language software developed in a high-level language has the problem of high redundancy. Accordingly, manufacturers who intend to suppress the cost of their products are reluctant to use high-level programming languages to develop embedded programs.
For embedded processors, programs have to be stored in ROM, so that increases in program size can greatly affect manufacturing cost. Also, when specific performance (execution speed) is required for products, more expensive microprocessors have to be used or microprocessors have to operate at higher clock speeds, due to an increase in the execution time of embedded programs.
Thus, there are notable disadvantages in using a high-level programming language to develop embedded programs. To allow greater use of high-level programming languages when developing embedded software, it is necessary to establish a high-level optimization algorithm that can eliminate redundancy in the resulting software.
While there are many definitions of program redundancy, in the present specification program redundancy refers to all factors present in a program written in a high-level language or an intermediate language that increase the code size and execution time of the machine language program obtained by compiling.
Before explaining conventional optimization apparatuses, the construction of conventional compiler apparatuses is explained below, with reference to the following publications.
(1) A. V. Aho, R. Sethi, J. D. Ullman (1986): Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Company Inc. (translated into Japanese by Kenichi Harada (1990): Compilers I, II, Science company Inc.)
(2) Hans Zima (1991): Supercompilers for Parallel and Vector Computers, Addison-Wesley Publishing Company Inc. (translated into Japanese by Yoichi Muraoka (1995): Supercompilers, Oum Company Inc.)
(3) Masataka Sasa (1989): Programming Language Processing System, Iwanami
The set definitions referred to later in this specification as Equations 2 to 4 are as follows.
{Equation 2}
$$\mathrm{GEN}[B] = \{\, s \mid s(x=) \in B,\ \text{where } s'(x=) \text{ does not exist between } s \text{ and the exit point of } B \,\}$$
{Equation 3}
$$\mathrm{KILL}[B] = \{\, s \mid s(x=) \in (\text{basic blocks other than } B),\ \text{where } s'(x=) \in B \text{ exists} \,\}$$
{Equation 4}
$$\mathrm{IN}[B] = \{\, s \mid \text{where } s'(x=) \text{ does not exist between intermediate instruction } s(x=) \text{ and the entry point of } B \,\}$$
$$\mathrm{OUT}[B] = \{\, s \mid \text{where } s'(x=) \text{ does not exist between intermediate instruction } s(x=) \text{ and the exit point of } B \,\}$$
FIG. 1 shows the construction of a conventional compiler apparatus. In the figure, the compiler apparatus includes a syntax analysis apparatus 41, an optimization apparatus 42, and a code generation apparatus 49.
The syntax analysis apparatus 41 performs lexical analysis, syntax analysis, and semantic analysis on a source program which is stored as a file in a storage device (not illustrated) and converts the source program to an intermediate program to simplify the processing by the compiler apparatus. Here, each step (construct) in the intermediate program is called an intermediate instruction. Types of intermediate instructions include a quadruple, a triple, and an abstract syntax tree (reference (1): p.464). Such intermediate instructions are further converted to object code by the code generation apparatus 49. In this specification, the syntax analysis apparatus 41 is not explained in detail, since it is not especially related to optimization processing which is the main focus of the present invention.
The optimization apparatus 42 optimizes the intermediate program to reduce the program size and the execution time of the resulting machine language program. The optimization apparatus 42 includes an optimization control unit 43, a control flow information analysis unit 44, a data flow information analysis unit 45, an intermediate code optimization unit 46, a control flow information storage unit 47, and a data flow information storage unit 48.
The control flow information analysis unit 44 divides the intermediate program into basic blocks where the control flow is unidirectional, and obtains control flow information that shows the control flow between basic blocks (reference (1): p.528). The basic blocks are explained in detail later.
The data flow information analysis unit 45 analyzes the intermediate code using the control flow information and obtains data flow information showing reaching definitions, available expressions, and live variables (for more details, see reference (1): pp.608-722). This data flow information is obtained by finding information that is to be used as data flow information, setting data flow equations for the entry and exit points of the basic blocks, and calculating the data flow equations according to an iterative algorithm.
The intermediate code optimization unit 46 uses the control flow information and the data flow information to optimize the intermediate code. Examples of such optimization processing are deletion of basic blocks that control never reaches, identified using the control flow information, optimization of common subexpressions using the information on available expressions (reference (1): p.592), copy propagation using the information on reaching definitions (reference (1): p.594), and deletion of unnecessary code using the information on live variables (reference (3): p.482). The optimization processing is explained in greater detail later.
The control flow information storage unit 47 stores the control flow information generated by the control flow information analysis unit 44.
The data flow information storage unit 48 stores the data flow information generated by the data flow information analysis unit 45.
The code generation apparatus 49 allocates registers or memory to variables written in the intermediate program and converts each intermediate instruction to a machine language instruction. This code generation apparatus 49 is not the main focus of the present invention and so is not explained in detail here.
The following is a more detailed explanation of the optimization apparatus 42, focusing on features related to the present invention, namely, the copy propagation using the information on reaching definitions and the optimization of common subexpressions using the information on available expressions.
First, the terms used in the explanation are introduced.
<Program Point>
Any point located between two adjacent intermediate instructions (reference (3): p.461).
<Basic Block> (reference (1): p.528)
When optimizing the program, there is a danger that the algorithm will be destroyed by rewriting instructions which include jump instructions or jump destinations. Accordingly, in the optimization processing, the execution order has to be unidirectional from the start to the end. Hence, in the intermediate program, each part (block) that includes neither a jump nor a jump destination is called a basic block, which is the minimum unit of the optimization processing. A point immediately before a first intermediate instruction in a basic block is called the entry point of the basic block, while a point immediately after a last intermediate instruction in the basic block is called the exit point of the basic block. A basic block is not divided when it includes a subroutine (function) call intermediate instruction, since optimization information can be analyzed more extensively if the program is divided into larger blocks.
In the following explanation, each basic block is a set of intermediate instructions, whereby "$s \in B$" expresses that intermediate instruction s belongs to basic block B. Basic blocks executed immediately before basic block B are called "preceding blocks", a set of preceding blocks of basic block B being expressed as "pred(B)" (reference (1): p.532). Also, basic blocks executed immediately after basic block B are called "succeeding blocks", a set of succeeding blocks of basic block B being expressed as "succ(B)". An example of basic blocks is shown in FIG. 2. Preceding blocks of basic block BLK2 are basic blocks BLK1 and BLK5, while succeeding blocks of basic block BLK2 are basic blocks BLK3 and BLK4. A basic block which is executed first and which does not have preceding blocks is called an "initial block".
<Definitions and Uses of Variables>
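The relation between succ(B) and pred(B) can be illustrated by the following minimal Python sketch. Only the edges into and out of BLK2 are taken from the description of FIG. 2 above; the remaining edges, and the function name `predecessors`, are assumptions made for illustration.

```python
from collections import defaultdict

# Successor lists for a hypothetical control flow graph. Only the edges
# touching BLK2 come from the text; the other edges are assumed.
succ = {
    "BLK1": ["BLK2"],
    "BLK2": ["BLK3", "BLK4"],
    "BLK3": ["BLK5"],
    "BLK4": [],
    "BLK5": ["BLK2"],
}

def predecessors(succ):
    """Invert succ(B) to obtain pred(B) for every basic block."""
    pred = defaultdict(list)
    for block, targets in succ.items():
        pred[block]  # ensure blocks without predecessors still get an entry
        for target in targets:
            pred[target].append(block)
    return dict(pred)

pred = predecessors(succ)
# An initial block is a block that has no preceding blocks.
initial_blocks = [block for block, before in pred.items() if not before]
```

In this assumed graph, BLK1 is the only block without preceding blocks and is therefore the initial block.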
"to define" means to set a certain value in a variable, and "to use" or "to refer" means to use the set value (reference (3): p.419). In an intermediate instruction "s1: a=b op c" (a, b, c=variables, op=operator), intermediate instruction s1 is both a definition of variable a and a use (reference) of variables b and c. For simplicity's sake, hereinafter "s1(a=)" expresses that intermediate instruction s1 is a definition of variable a, and "s1(=b)" expresses that intermediate instruction s1 is a use (reference) of variable b.
<Reaching Definition>
A reaching definition is one of the fundamental elements in data flow information. If variable x is used in intermediate instruction s, definition d that sets the value of variable x is called a reaching definition that reaches, which is to say, is unbroken until, intermediate instruction s. While the word "reaching" is used in reference (3), the present specification also uses the word "unbroken" to express this concept.
If, in a route from definition d that defines variable x to intermediate instruction s, a different definition of variable x exists, definition d does not reach intermediate instruction s. If, on the other hand, no other definitions exist, definition d reaches intermediate instruction s. Such definitions d that reach intermediate instruction s are called reaching definitions.
Note here that variable x is not necessarily used in intermediate instruction s. Also, in the above explanation, "intermediate instruction s" can be replaced with "point p immediately before intermediate instruction s".
When a basic block has a plurality of preceding blocks, a plurality of definitions included in the plurality of preceding blocks may reach the same intermediate instruction in the basic block. An example of this case is shown in FIG. 5. In a route from intermediate instruction s17 that defines variable b2 in basic block B1 to intermediate instruction s33 that uses variable b2, intermediate instruction s17 is a definition that reaches intermediate instruction s33. In the same way, intermediate instruction s29 that defines variable b2 in basic block B2 is also a definition that reaches intermediate instruction s33. Thus, variable b2 in intermediate instruction s33 has the two reaching definitions located in the two preceding blocks. Also, though intermediate instruction s33 does not use variable t21, intermediate instruction s13 which defines variable t21 is a definition that is unbroken until intermediate instruction s33, when no other definitions that define variable t21 exist between intermediate instruction s13 and intermediate instruction s33.
<Data Flow Equations>
In order to obtain data flow information showing which definitions reach each intermediate instruction, it is necessary to set up data flow equations and solve them using an iterative method (reference (3): pp.471-472). These data flow equations are explained below.
Here, a group of definitions that are generated in basic block B and that are unbroken until the exit point of basic block B is expressed as "GEN[B]". This group is expressed by Equation 2.
{Equation 2}
Next, when there are a group of definitions that define variable x in basic blocks other than basic block B, and when intermediate instruction "s'(x=)" exists in basic block B, the group of definitions of variable x is expressed as "KILL[B]". This group is expressed by Equation 3.
{Equation 3}
Next, a group of definitions that potentially reach the entry point of basic block B is expressed as "IN[B]", while a group of definitions that potentially reach the exit point of basic block B is expressed as "OUT[B]". These groups are expressed by Equation 4.
{Equation 4}
From the above equations, data flow equations are expressed by Equation 5.
{Equation 5}
$$\mathrm{IN}[B] = \bigcup_{B' \in \mathrm{pred}(B)} \mathrm{OUT}[B'] \qquad (1)$$
$$\mathrm{OUT}[B] = \mathrm{GEN}[B] \cup (\mathrm{IN}[B] - \mathrm{KILL}[B]) \qquad (2)$$
Here, "$\cup$" indicates a set union, while "$-$" indicates a set difference. Equation 5(1) shows that the definitions which are unbroken until the entry point of basic block B are the union of the definitions which are unbroken until the exit points of the preceding blocks of basic block B. Equation 5(2) shows that the definitions which are unbroken until the exit point of basic block B are the union of the definitions GEN[B] generated in basic block B and the definitions which are unbroken until the entry point of basic block B and which define variables that are not defined in basic block B.
Equation 5 is a system of simultaneous equations in which IN[B] and OUT[B] are the unknowns. By solving this system, the set IN[B] of definitions that reach the entry point of basic block B is obtained. As a result, the reaching definitions for each intermediate instruction in basic block B can be easily obtained in execution order by updating the content of IN[B] as each intermediate instruction is passed (see reference (3): p.475).
<Iterative Algorithm for the Data Flow Equations>
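The step of deriving per-instruction reaching definitions from IN[B] can be sketched in Python as follows. This is an illustrative sketch only: the function name `reaching_per_instruction`, the data layout, and the labels s1 to s4 are assumptions and do not appear in the figures.

```python
def reaching_per_instruction(insts, in_b, var_of):
    """Walk one basic block in execution order.

    insts:  list of (label, defined_variable) for the block
    in_b:   IN[B], the definitions reaching the entry point of the block
    var_of: maps every definition label to the variable it defines
    """
    current = set(in_b)
    result = {}
    for label, var in insts:
        # Definitions reaching the point immediately before this instruction.
        result[label] = set(current)
        # This instruction breaks every earlier definition of the same variable...
        current = {d for d in current if var_of[d] != var}
        # ...and becomes a definition that is unbroken from here on.
        current.add(label)
    return result

# Hypothetical data: s1 and s3 define x, s2 and s4 define y.
var_of = {"s1": "x", "s2": "y", "s3": "x", "s4": "y"}
result = reaching_per_instruction([("s3", "x"), ("s4", "y")], {"s1", "s2"}, var_of)
```

Here s1 reaches s3 but not s4, because s3 redefines x before s4 is executed.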
An iterative algorithm is commonly used to solve the data flow equations as in the case of reaching definitions (reference (3): pp.473-474). FIG. 3 shows such an iterative algorithm.
Here, each statement, such as "repeat", "for", and "if", and each operator, such as "=", "!=", and "==", is written in C, while "false" and "true" are respectively given values 0 and 1. Other algorithms described in this specification are written in the same way.
In the above algorithm, the calculations of IN[B] and OUT[B] are repeated until the value of OUT[B] of each basic block B no longer changes, that is, until the values converge. The IN[B] and OUT[B] obtained in this way are the solutions of the data flow equations. For the iterative algorithm to converge, it is usually necessary for the sets of data flow information to form a semi-lattice under the confluence operation (reference (2): pp.79-88). For instance, in the case of reaching definitions, it is necessary to show that the groups of reaching definitions form a semi-lattice under the confluence operation "$\cup$" applied over preceding blocks.
Also, it is necessary to show that function f which shows the effect of each basic block is a monotonic function. For example, in the case of reaching definitions, it is necessary to show that the function "$f(X) = \mathrm{GEN}[B] \cup (X - \mathrm{KILL}[B])$" is a monotonic function, where X is IN[B] in the equation for OUT[B].
<Use-Definition Chain Information, Definition-Use Chain Information> (reference (3): p.476)
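The iterative solution of the data flow equations for reaching definitions can be sketched in Python as follows. This is not a reproduction of FIG. 3. The block contents are a hypothetical arrangement loosely modeled on the description of FIG. 5, block B0 and instruction s1 are invented, and the function names are assumptions.

```python
# Each instruction is (label, defined_variable). Hypothetical program.
blocks = {
    "B0": [("s1", "a2")],
    "B1": [("s13", "t21"), ("s17", "b2")],
    "B2": [("s25", "t22"), ("s29", "b2")],
    "B3": [("s33", "a2")],
}
pred = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}

def gen_kill(blocks):
    """Compute GEN[B] and KILL[B] for every basic block."""
    defs_of = {}  # variable -> all definitions of that variable in the program
    for insts in blocks.values():
        for label, var in insts:
            defs_of.setdefault(var, set()).add(label)
    gen, kill = {}, {}
    for name, insts in blocks.items():
        last_def = {var: label for label, var in insts}  # later definitions win
        own = {label for label, _ in insts}
        gen[name] = set(last_def.values())
        kill[name] = set().union(*(defs_of[v] for v in last_def)) - own
    return gen, kill

def reaching_definitions(blocks, pred):
    """Repeat the calculation of IN[B] and OUT[B] until no OUT[B] changes."""
    gen, kill = gen_kill(blocks)
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets

in_sets, out_sets = reaching_definitions(blocks, pred)
```

In this assumed program both s17 and s29 appear in IN[B3], which corresponds to the situation where two definitions of the same variable in different preceding blocks reach the same intermediate instruction.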
Use-definition chain information is information, produced from the reaching definition information, that shows a list of definitions which are unbroken until each use of a variable. In FIG. 4, when intermediate instructions s12 and s24 that define variable b1 are reaching definitions for intermediate instruction s32 that uses variable b1, the use-definition chain information for variable b1 in intermediate instruction s32 is given as a list (s12,s24).
On the other hand, definition-use chain information shows a list of uses of a variable until which each definition is unbroken. In FIG. 4, when intermediate instruction s12 which defines variable b1 reaches intermediate instruction s32, definition-use chain information for variable b1 in intermediate instruction s12 is given as a list (s32). The same can be applied to intermediate instruction s24. Also, in FIG. 7, when intermediate instruction s5 which defines variable x4 is unbroken until intermediate instructions s16, s27, and s35 that each use variable x4, definition-use chain information for variable x4 in intermediate instruction s5 shows a list (s16,s27,s35).
In FIGS. 4-8, use-definition chain information and definition-use chain information are illustrated by dashed arrows. For example, in FIG. 4, the dashed arrow from s32 to s12 shows use-definition chain information for variable b1, while the dashed arrow from s12 to s32 shows definition-use chain information for variable b1.
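The construction of both kinds of chain information from reaching definitions can be sketched in Python as follows. The data reproduce only what the text states about FIG. 4 (s12 and s24 define b1 and reach s32, which uses b1). The function names and the dictionary layout are assumptions.

```python
def ud_chains(uses, reaching, var_of):
    """Use-definition chains.

    uses:     use label -> variables used by that instruction
    reaching: use label -> definitions reaching that instruction
    var_of:   definition label -> variable it defines
    """
    return {
        (label, var): sorted(d for d in reaching[label] if var_of[d] == var)
        for label, used in uses.items()
        for var in used
    }

def du_chains(ud):
    """Definition-use chains, obtained by inverting the use-definition chains."""
    du = {}
    for (use_label, var), defs in ud.items():
        for d in defs:
            du.setdefault((d, var), []).append(use_label)
    return du

var_of = {"s12": "b1", "s24": "b1"}
uses = {"s32": ["b1"]}
reaching = {"s32": {"s12", "s24"}}

ud = ud_chains(uses, reaching, var_of)
du = du_chains(ud)
```

The use-definition chain for b1 in s32 is the list (s12,s24), and the definition-use chains for b1 in s12 and in s24 are each the list (s32), matching the lists given above.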
<Available Expressions>
An available expression is also a fundamental element in data flow information. If, in every route from the start point of the program to intermediate instruction s, an expression "E: x op y" (op=operator) is evaluated, and no definition of variable x or variable y exists between the last evaluation of expression E and intermediate instruction s, expression E is available in intermediate instruction s. In the above explanation, "intermediate instruction s" can be replaced with "point p immediately before intermediate instruction s".
An example is given in FIG. 7. In the route from intermediate instruction s16 to intermediate instruction s35, when intermediate instruction s16 is the last evaluation of expression x4+y4 (that is to say, there is no intermediate instruction, other than s16, that executes expression x4+y4 between s16 and s35), when there are no definitions of variables x4 and y4, and when the same applies to s27, expression x4+y4 is an available expression in intermediate instruction s35.
The information on available expressions is obtained by defining data flow equations relating to the available expressions and by calculating the equations according to an iterative algorithm, in the same way as the information on reaching definitions (reference (3): pp.476-479).
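The following Python sketch uses the common textbook formulation for available expressions, in which the confluence operation is set intersection over the preceding blocks, rather than equations quoted from the references. The block layout, the sets `e_gen` and `e_kill`, and the function name are assumptions loosely modeled on the description of FIG. 7.

```python
def available_expressions(blocks, pred, e_gen, e_kill, universe):
    """Return IN[B], the expressions available at the entry point of each block."""
    out_sets = {
        # Non-initial blocks start from the full set; intersection shrinks it.
        b: set(universe) if pred[b] else set(e_gen[b])
        for b in blocks
    }
    in_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if pred[b]:
                in_sets[b] = set.intersection(*(out_sets[p] for p in pred[b]))
            new_out = e_gen[b] | (in_sets[b] - e_kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets

blocks = ["B0", "B1", "B2", "B3"]
pred = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
universe = {"x4+y4"}
e_kill = {"B0": {"x4+y4"}, "B1": set(), "B2": set(), "B3": set()}

# x4+y4 is evaluated in both preceding blocks of B3.
both = {"B0": set(), "B1": {"x4+y4"}, "B2": {"x4+y4"}, "B3": set()}
avail_both = available_expressions(blocks, pred, both, e_kill, universe)

# x4+y4 is evaluated in only one preceding block of B3.
one = {"B0": set(), "B1": {"x4+y4"}, "B2": set(), "B3": set()}
avail_one = available_expressions(blocks, pred, one, e_kill, universe)
```

The expression is available at the entry of B3 only when it is evaluated on every route, which is why the second case yields an empty set.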
The following is an explanation of the copy propagation and the common subexpression optimization using the data flow information on reaching definitions and available expressions.
<Copy Propagation>
When there is an intermediate instruction "s: x=y" (hereinafter referred to as "copy"), and intermediate instruction s is the only definition that is unbroken until intermediate instruction s' which uses variable x, variable x in intermediate instruction s' is replaced with y (this replacement is called copy propagation). Also, if there is no intermediate instruction, aside from intermediate instruction s', that uses x, intermediate instruction s is deleted. In FIG. 6, intermediate instruction s6 is the only definition of variable t32 that reaches intermediate instructions s26 and s30, while intermediate instructions s26 and s30 are the only uses of variable t32 defined in intermediate instruction s6. Accordingly, variable t32 in each intermediate instruction s26 and s30 is replaced with x3, and intermediate instruction s6 is deleted (reference (3): p.445).
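The rule above can be sketched in Python using the instructions of FIG. 6 as they are described in this specification (s6 copies x3 to t32, while s26 and s30 use t32). The left side of s30 is taken to be b3, the data layout and function name are assumptions, and the sketch omits the additional check, needed in a full implementation, that variable y is not redefined between the copy and the use.

```python
def propagate_copy(program, copy_label, ud, du):
    """Apply copy propagation for one copy instruction "s: x=y".

    program: label -> (destination, [operands]); modified in place
    ud:      (use label, variable) -> definitions reaching that use
    du:      (definition label, variable) -> uses reached by that definition
    """
    x, operands = program[copy_label]
    assert len(operands) == 1, "not a copy"
    y = operands[0]
    remaining = []  # uses of x that could not be rewritten
    for use_label in du.get((copy_label, x), []):
        if ud[(use_label, x)] == [copy_label]:
            # The copy is the only definition reaching this use: replace x with y.
            dest, ops = program[use_label]
            program[use_label] = (dest, [y if v == x else v for v in ops])
        else:
            remaining.append(use_label)
    if not remaining:
        del program[copy_label]  # no use of x is left, so the copy is deleted
    return program

program = {
    "s6": ("t32", ["x3"]),
    "s26": ("a3", ["t32"]),
    "s30": ("b3", ["t32"]),
}
ud = {("s26", "t32"): ["s6"], ("s30", "t32"): ["s6"]}
du = {("s6", "t32"): ["s26", "s30"]}

propagate_copy(program, "s6", ud, du)
```

After the call, s26 and s30 refer to x3 directly and s6 no longer exists in the program.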
<Common Subexpression Optimization>
The common subexpression optimization is performed in order to avoid the evaluation (execution) of an expression that has already been evaluated.
In FIG. 7, for instance, expression x4+y4 is an available expression in intermediate instruction s35. Here, since expression x4+y4 has already been evaluated in intermediate instructions s16 and s27, it is unnecessary to evaluate expression x4+y4 in intermediate instruction s35. Accordingly, the common subexpression optimization is performed by introducing new variable w, rewriting intermediate instructions s16, s27, and s35, and writing new intermediate instructions s50 and s51 as copies, as shown in FIG. 8. This type of optimization is used when, despite the inclusion of new intermediate instructions s50 and s51, an overall reduction of cost (program size and execution time) is possible by canceling the evaluation of expression x4+y4 in intermediate instruction s35 (reference (3): p.446).
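The rewriting described above can be sketched in Python as follows. The labels s16, s27, s35, s50, and s51 and the new variable w follow the text, but the destination variables t1 and t2, the flat list layout, and the function name are assumptions, since the contents of FIGS. 7 and 8 are not reproduced here.

```python
def share_expression(program, evaluations, redundant, temp, new_labels):
    """Rewrite evaluations of one expression so that a later one reuses the result.

    program:     list of (label, destination, expression) in program order
    evaluations: labels that keep evaluating the expression
    redundant:   label whose evaluation is replaced by a copy from temp
    temp:        new variable that holds the shared value
    new_labels:  labels for the copy instructions that are introduced
    """
    fresh = iter(new_labels)
    result = []
    for label, dest, expr in program:
        if label in evaluations:
            result.append((label, temp, expr))        # evaluate into temp
            result.append((next(fresh), dest, temp))  # copy to the old destination
        elif label == redundant:
            result.append((label, dest, temp))        # reuse instead of re-evaluating
        else:
            result.append((label, dest, expr))
    return result

program = [
    ("s16", "t1", "x4+y4"),
    ("s27", "t2", "x4+y4"),
    ("s35", "a4", "x4+y4"),
]
rewritten = share_expression(program, {"s16", "s27"}, "s35", "w", ["s50", "s51"])
```

The rewritten program contains two more instructions than the original, which is why this optimization pays off only when removing the evaluation in s35 outweighs the cost of the added copies.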
However, to perform optimization based on global dependency between a plurality of basic blocks using the data flow information, much analysis time is required to confirm that the copy propagation or the deletion of instructions is possible. Since a plurality of execution orders and a plurality of dependence relations exist when a plurality of basic blocks are activated by conditional branch instructions, the analysis of the global dependency has to be performed thoroughly. Also, thorough analysis has to be performed on the assumption of feedback-type dependency, in which the control flow of a branch origin basic block depends on the control flow of a branch destination basic block. Even when considerable time is spent in analyzing the global dependency which crosses over between basic blocks, it is still uncertain whether it is safe to perform the optimization on the basic blocks. For these reasons, the above types of optimization are problematic.
The following is a description of how analysis to confirm the safety of optimization is executed, with reference to FIGS. 4-7.
In FIG. 4, since variable a1 is equal to variable b1 in intermediate instruction s32, it appears that intermediate instruction s32 can be deleted. However, there are two definitions of variable b1, namely, intermediate instructions s12 and s24, that are unbroken until intermediate instruction s32. Accordingly, copy propagation cannot be performed on intermediate instructions s12 and s24, so that a different type of optimization needs to be executed. To execute the different type of optimization, it is necessary to check which intermediate instruction is a reaching definition for variable b1 in intermediate instruction s32, and how variable b1 is defined in that intermediate instruction. In the example shown in FIG. 4, it is necessary to check that intermediate instructions s12 and s24 are copies that each have variable a1 on the right side. Here, use-definition dependency for variable b1 in intermediate instruction s32 is relatively simple, so that the need for optimization can be determined by referring to use-definition chain information for variable b1.
In FIG. 5, on the other hand, there are more complex dependency relations among the basic blocks and accordingly use-definition chain information needs to be analyzed in more detail. In the figure, it appears that intermediate instruction s33 can be deleted according to the use-definition chain information and definition-use chain information.
However, in order to confirm the safety in deleting intermediate instruction s33, first it is necessary to check intermediate instruction s17 shown in the use-definition chain information for variable b2 in intermediate instruction s33 and detect the right side member (variable t21) in intermediate instruction s17. Then, by referring to the use-definition chain information for variable t21 in intermediate instruction s17, intermediate instruction s13 that defines variable t21 is obtained and its right side member (variable a2) is detected. The same processing has to be performed on intermediate instruction s29 shown in the use-definition chain information for variable b2 in intermediate instruction s33 and intermediate instruction s25 shown in the use-definition chain information for variable t22 in intermediate instruction s29. Thus, it is necessary to thoroughly check that variable b2 is equal to variable a2 in every case by tracing use-definition chain information step by step. As a result, it is confirmed that intermediate instruction s33 can be deleted.
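The step-by-step tracing described for FIG. 5 can be sketched in Python as follows. The instruction contents are inferred from the description above, with s33 assumed to be the copy "a2=b2". The function name and data layout are assumptions, and the sketch does not check whether a2 itself is redefined along the traced routes.

```python
def traces_back_to(label, var, target, program, ud, seen=None):
    """Check that every definition of var reaching label is a chain of copies from target."""
    seen = set() if seen is None else seen
    if (label, var) in seen:  # already visited: give up conservatively
        return False
    seen.add((label, var))
    defs = ud.get((label, var), [])
    if not defs:
        return False
    for d in defs:
        _, operands = program[d]
        if len(operands) != 1:  # the definition is not a copy
            return False
        src = operands[0]
        # Either the copy reads target directly, or its source must be traced further.
        if src != target and not traces_back_to(d, src, target, program, ud, seen):
            return False
    return True

program = {
    "s13": ("t21", ["a2"]),
    "s17": ("b2", ["t21"]),
    "s25": ("t22", ["a2"]),
    "s29": ("b2", ["t22"]),
    "s33": ("a2", ["b2"]),
}
ud = {
    ("s33", "b2"): ["s17", "s29"],
    ("s17", "t21"): ["s13"],
    ("s29", "t22"): ["s25"],
}
deletable = traces_back_to("s33", "b2", "a2", program, ud)

# If s25 copied from some other variable, the check would fail.
altered = dict(program, s25=("t22", ["c2"]))
still_deletable = traces_back_to("s33", "b2", "a2", altered, ud)
```

Each recursive call corresponds to one look-up in the use-definition chain information, so the number of look-ups grows with the length and number of the copy chains that have to be traced.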
In FIG. 6, the dependency relations are even more complex, so that it is necessary to analyze not only use-definition chain information but also definition-use chain information. Though it appears that intermediate instruction s34 can be deleted, both use-definition chain information and definition-use chain information have to be analyzed in order to confirm the safety in deleting intermediate instruction s34. First, as in FIG. 5, intermediate instructions s18 and s15 are traced respectively from the use-definition chain information for variable b3 in intermediate instruction s34 and the use-definition chain information for variable t31 in intermediate instruction s18, in order to confirm that variable b3 is equal to variable a3. Next, intermediate instruction s30 is traced from the use-definition chain information for variable b3 in intermediate instruction s34, and intermediate instruction s6 is traced from the use-definition chain information for variable t32 in intermediate instruction s30. Then, intermediate instruction s26 is traced from the definition-use chain information for variable t32 in intermediate instruction s6. Finally, by checking the left side member (variable a3) in intermediate instruction s26, it is confirmed that variable a3 is equal to variable b3 in intermediate instruction s34. If the left side member in intermediate instruction s26 is not variable a3, it is necessary to trace other intermediate instructions shown in the definition-use chain information for variable t32 in intermediate instruction s6.
Thus, considerable analysis time is spent in judging that variable x is equal to variable y in an intermediate instruction "s: x=y" in order to confirm the safety in optimization, since use-definition chain information and definition-use chain information have to be traced step by step.
In FIG. 7, it appears that intermediate instruction s35 can be deleted, since in the preceding blocks variable a4 has the same value as expression x4+y4 as shown in the use-definition chain information and the definition-use chain information in intermediate instructions s16, s20, s27, and s31. However, in the common subexpression optimization described above, intermediate instruction s35 can be changed to a copy but cannot be deleted. Accordingly, optimization different from the common subexpression optimization is required. To confirm that variable a4 is equal to the value of expression x4+y4 in intermediate instruction s35, not only the use-definition chain information and definition-use chain information (as in FIGS. 4-6) but also expression x4+y4 needs to be analyzed, so that the analysis time is further prolonged.
The above analysis is necessary to eliminate the dangers associated with optimizing the program. However, since it requires considerable analysis time, such optimization processing is far from practical. Besides, there is no guarantee that the analysis can be completed within a specified period. Thus, the conventional optimization methods are not effective in optimizing the program over a plurality of basic blocks, as they cannot sufficiently reduce the redundancy between the plurality of basic blocks.