The present invention relates to the field of computer software engineering. More specifically, it relates to a method and apparatus for compiling source code by exposing parallelism, thereby enabling the generation of more efficient compiled code, allowing more parallel instructions to be created and leading to better utilization of processor resources.
Compiling a computer program involves converting a high level language code (source code) into instructions which are intelligible to the processor of a computer (object code). One of the goals of a compiler is to generate an efficient implementation of the source code. In the case of processors which can execute more than one instruction in parallel, it is an objective of the compiler to compile the code such that as many instructions as possible can be executed in parallel.
It is an object of the invention to provide a method and apparatus for efficiently compiling code by exposing parallelism in that code. The present invention is applicable to any computer architecture having multiple functional units, for example, Very Long Instruction Word ("VLIW") architectures used for real-time implementations of Digital Signal Processing ("DSP") applications.
It is a further and more specific object of the invention to determine and represent dependencies for each of the memory accesses in a source program. This exposes the execution paths in the program so that the compiler can exploit the parallelism to the maximum extent. In the case of VLIW architectures, the failure to detect parallelism accurately can lead to fewer instructions being packed into a VLIW instruction word and hence the processor performing below its capabilities.
Index expressions present special problems for compilers. Consider the following piece of pseudo-code:
void f(
    double A[4]
)
{
    /* First block: touches only A[0] and A[1]. */
    double a=A[0];
    double b=A[1];
    double c=a+b;
    A[0]=c;

    /* Second block: touches only A[2] and A[3]. */
    double d=A[2];
    double e=A[3];
    double f=d-e;
    A[2]=f;
}
The block of code starting with "double a=A[0]" can be executed in parallel with the block starting with "double d=A[2]", since the two blocks do not access the same memory locations in array A.
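The disjointness argument above can be sketched as a small check. This is only an illustration under stated assumptions: the helper name accesses_disjoint and the representation of each block as a list of constant indices are hypothetical and are not taken from the invention. The check is also conservative, because it treats any shared index as a conflict without distinguishing reads from writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when no index in `first` also appears
 * in `second`, i.e. the two statement blocks touch disjoint elements of
 * the same array and therefore carry no memory dependence on each other. */
bool accesses_disjoint(const int *first, size_t n_first,
                       const int *second, size_t n_second)
{
    assert(first != NULL && second != NULL);
    for (size_t i = 0; i < n_first; i++) {
        for (size_t j = 0; j < n_second; j++) {
            if (first[i] == second[j]) {
                return false;   /* shared element: keep the blocks ordered */
            }
        }
    }
    return true;                /* no shared element: blocks may overlap */
}
```

For the example above, the first block accesses indices {0, 1} and the second accesses {2, 3}, so the check reports that the blocks are independent.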
Now consider the following example of more complex parallelism:
double f(
    double in
)
{
    double A[16];
    A[0]=in;
    /* 15 iterations, so that the last write goes to A[15] and stays in bounds. */
    for (int i=0; i<15; i++) {
        A[i+1]=A[i]*2.0;
    }
    return A[15];
}
The indices for array A are now linear index expressions of the loop iterator i (the expressions are i and i+1). The term "linear" as applied to an index expression means that the index expression is formed as a constant times the iterator plus a constant. This information can be used to determine whether multiple iterations of a loop can be run in parallel (referred to as "array privatization"). In this example, the compiler can tell that the array value read in each iteration is produced by the write in the previous iteration. It therefore cannot postpone the write operation until after the read of the next iteration.
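The reasoning about the previous iteration can be made concrete by computing a dependence distance from the two linear index expressions. The following is a minimal sketch, not the method of the invention: the type LinearIndex and the function dependence_distance are hypothetical names, the sketch assumes both accesses use the same non-zero coefficient, and it ignores the loop bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A linear index expression: coeff * i + offset, with i the loop iterator. */
typedef struct {
    int coeff;
    int offset;
} LinearIndex;

/* Hypothetical helper. For a write A[w] and a read A[r] in the same loop
 * body with equal, non-zero coefficients, computes the distance d such that
 * the read in iteration i + d touches the element written in iteration i.
 * Returns true and stores d when such an integer distance exists. */
bool dependence_distance(LinearIndex write, LinearIndex read, int *distance)
{
    assert(distance != NULL);
    assert(write.coeff == read.coeff && write.coeff != 0);

    /* coeff*i + w.offset == coeff*(i + d) + r.offset
     * implies coeff*d == w.offset - r.offset. */
    int diff = write.offset - read.offset;
    if (diff % write.coeff != 0) {
        return false;           /* the accesses never hit the same element */
    }
    *distance = diff / write.coeff;
    return true;
}
```

For the write A[i+1] and the read A[i], the distance is 1: each iteration reads what the previous iteration wrote, so the iterations cannot be reordered freely.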
Now consider the following example involving an induction variable (which is behaviorally equivalent to the previous example):
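double f(
    double in
)
{
    double A[16];
    A[0]=in;
    int idx1=1;   /* write index, advanced once per iteration */
    int idx2=0;   /* read index, advanced once per iteration */
    /* 15 iterations, so that the last write goes to A[15] and stays in bounds. */
    for (int i=0; i<15; i++) {
        A[idx1]=A[idx2]*2.0;
        idx1=idx1+1;
        idx2=idx2+1;
    }
    return A[15];
}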
In this example, the variables idx1 and idx2 are initialized before the loop and are then used and updated inside it. Such a variable is called an induction variable. In some cases an induction variable can be converted into a linear index expression, thereby exposing the parallelism in the code. This is the case here, because the code is completely equivalent to that of the previous example.
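The conversion of an induction variable into a linear index expression can be sketched as follows. This is an illustration only: the function names and the constant ARRAY_LEN are hypothetical, and the sketch runs the loop ARRAY_LEN - 1 times so that every access stays inside the 16-element array. Each induction variable is replaced by its closed form, initial value plus step times i.

```c
#include <assert.h>

#define ARRAY_LEN 16

/* Original form: idx1 and idx2 are induction variables. */
double with_induction_variables(double in)
{
    double A[ARRAY_LEN];
    A[0] = in;
    int idx1 = 1;
    int idx2 = 0;
    for (int i = 0; i < ARRAY_LEN - 1; i++) {
        A[idx1] = A[idx2] * 2.0;
        idx1 = idx1 + 1;
        idx2 = idx2 + 1;
    }
    return A[ARRAY_LEN - 1];
}

/* Rewritten form: idx1 becomes 1 + 1*i and idx2 becomes 0 + 1*i,
 * which are linear index expressions of the loop iterator. */
double with_linear_indices(double in)
{
    double A[ARRAY_LEN];
    A[0] = in;
    for (int i = 0; i < ARRAY_LEN - 1; i++) {
        assert(1 + 1 * i < ARRAY_LEN);      /* write index stays in bounds */
        A[1 + 1 * i] = A[0 + 1 * i] * 2.0;
    }
    return A[ARRAY_LEN - 1];
}
```

Both functions return the same value for any input, which is what allows an analysis that understands only linear index expressions to be applied to the rewritten loop.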
Known methods of compiler optimization only consider linear index expressions, assuming a worst case of non-parallelism for non-linear expressions. Induction variables are handled by transformation into linear index expressions. Most systems of the prior art use heuristic approaches to obtain memory access information from such expressions. Prior art algorithms formulate the linear index expressions and a specific data flow analysis question as the proof of the existence of a solution of an Integer Linear Programming problem. A known method for simplifying the Integer Linear Programming problem is Fourier-Motzkin variable elimination, which reduces the number of dimensions of the problem.
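To illustrate what it means to pose a data flow question as the existence of an integer solution, the sketch below asks whether two linear index expressions a1*i + b1 and a2*j + b2 can refer to the same array element for iterator values within the loop bounds. It is a simplified stand-in and not one of the prior art algorithms: the function name may_depend is hypothetical, a divisibility (GCD) check serves as a quick necessary condition, and an exhaustive search over the bounded iteration space replaces a real Integer Linear Programming solver. Fourier-Motzkin elimination is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of |a| and |b|; returns 0 only when both are 0. */
static int gcd(int a, int b)
{
    a = abs(a);
    b = abs(b);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: does a1*i + b1 == a2*j + b2 have an integer
 * solution with lo <= i < hi and lo <= j < hi? */
bool may_depend(int a1, int b1, int a2, int b2, int lo, int hi)
{
    assert(lo <= hi);

    /* a1*i - a2*j = b2 - b1 has integer solutions only if
     * gcd(a1, a2) divides b2 - b1. */
    int g = gcd(a1, a2);
    if (g != 0 && (b2 - b1) % g != 0) {
        return false;
    }

    /* Bounded exhaustive search, standing in for an ILP solver. */
    for (int i = lo; i < hi; i++) {
        for (int j = lo; j < hi; j++) {
            if (a1 * i + b1 == a2 * j + b2) {
                return true;
            }
        }
    }
    return false;
}
```

For the expressions i+1 and j over iterations 0 to 14 a solution exists (for example i = 0, j = 1), so a dependence must be assumed. For 2*i and 2*j+1 the divisibility check alone rules out any solution.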
One benefit of the present invention is the ability to perform inter-procedural data flow analysis over multiple function calls. In the prior art, this was done by combining the results of intra- and inter-procedural data flow analysis in a post-processing step. In contrast, the present invention performs operations which flatten the hierarchy of function calls in order to determine data dependencies for memory accesses. Here, the flattening operation is performed prior to the analysis stage.
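The effect of flattening a hierarchy of function calls can be illustrated at source level. The sketch below is only an analogy under stated assumptions: the invention flattens its own representation of the index path, whereas this example simply inlines a hypothetical callee sum_pair by hand. Once the calls are flattened, the constant indices of every memory access are visible in a single scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: reads A[base] and A[base + 1], writes A[base]. */
static void sum_pair(double *A, int base)
{
    A[base] = A[base] + A[base + 1];
}

/* Hierarchical form: the accessed indices are hidden behind the calls. */
void hierarchical(double A[4])
{
    assert(A != NULL);
    sum_pair(A, 0);
    sum_pair(A, 2);
}

/* Flattened form: each call is replaced by its body with the argument
 * substituted, so the accesses {0, 1} and {2, 3} are directly visible. */
void flattened(double A[4])
{
    assert(A != NULL);
    A[0] = A[0] + A[1];
    A[2] = A[2] + A[3];
}
```

In the flattened form an analysis can see without any post-processing step that the two statements touch disjoint elements and could be executed in parallel.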
These and other advantages of the invention will be apparent from the following description and claims.
The present invention relates to a method of optimizing the compilation of a computer program. The method includes the steps of identifying an index path by noting the operations of the program which involve one or more index expressions, creating a non-hierarchical representation of the index path, and examining the memory accesses in the non-hierarchical representation of the index path to identify program steps that could be executed in parallel.
Where the program includes function calls, the step of identifying an index path involves noting operations in the function calls, and the non-hierarchical representation of the index path includes information relating to operations in the function calls.
In a further embodiment of the invention, the method includes the step of merging the information relating to operations in the function calls back into a hierarchical representation of the index path.
In yet a further embodiment, a data structure of the non-hierarchical representation is created and interrogated with questions relating to memory accesses. The results are stored in, or back-annotated to, a question data structure.
The invention also encompasses an apparatus for compiling computer code, including a module which creates a signal flow data structure for index expressions used by the computer code. That module includes means for identifying steps in the index path of the computer code and for identifying parts of memory accessed by index expressions. A further module removes the hierarchy from the computer code, creating a non-hierarchical representation of the computer code.