The present invention relates to the field of computer software engineering. More specifically, it relates to a method and apparatus for compiling source code by exposing parallelism, thereby enabling the generation of more efficient compiled code in which more instructions can be executed in parallel and processor resources are better utilized.
Compiling a computer program involves converting code written in a high-level language (source code) into instructions which are intelligible to the processor of a computer (object code). One of the goals of a compiler is to generate an efficient implementation of the source code. In the case of processors which can execute more than one instruction in parallel, it is an objective of the compiler to compile the code such that as many instructions as possible can be executed in parallel.
It is an object of the invention to provide a method and apparatus for efficiently compiling code by exposing parallelism in that code. The present invention is applicable to any computer architecture having multiple functional units, for example Very Long Instruction Word ("VLIW") architectures used for real-time implementations of Digital Signal Processing ("DSP") applications.
It is a further and more specific object of the invention to determine and represent dependencies for each of the memory accesses in a source program. This exposes the execution paths in the program so that the compiler can exploit the parallelism to the maximum extent. In the case of VLIW architectures, failure to detect parallelism accurately can lead to fewer instructions being packed into each VLIW instruction word, and hence to the processor performing below its capabilities.
Index expressions present special problems for compilers. Consider the following piece of pseudo-code:
The block of code starting with "double a=A[0]" can be executed in parallel with the code starting with "double d=A[2]", since the two do not access the same memory locations in array A.
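The independence test can be sketched by listing the elements of A that each block accesses and checking for a conflict, meaning a shared element with at least one write. Apart from the first reads A[0] and A[2], the accesses of both blocks below are assumptions made for illustration, as are the names Access and conflicts.

```python
from typing import List, NamedTuple

class Access(NamedTuple):
    array: str
    index: int
    is_write: bool

def conflicts(block_a: List[Access], block_b: List[Access]) -> bool:
    """True if the blocks touch a common element and at least one of the
    two accesses to it is a write (read/read sharing is harmless)."""
    return any(
        x.array == y.array and x.index == y.index and (x.is_write or y.is_write)
        for x in block_a
        for y in block_b
    )

# Hypothetical blocks: only the first reads, A[0] and A[2], come from the text.
block_1 = [Access("A", 0, False), Access("A", 1, False), Access("A", 0, True)]
block_2 = [Access("A", 2, False), Access("A", 3, False), Access("A", 2, True)]

print(conflicts(block_1, block_2))  # False: disjoint elements, so the blocks may run in parallel
```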
Now consider the following example of more complex parallelism:
The indices of array A are now linear index expressions of the loop iterator (here i and i+1, where i is the loop iterator). The term "linear", as applied to an index expression, means that the expression is a constant times the iterator plus a constant. This information can be used to determine whether multiple iterations of a loop can be run in parallel, an analysis referred to as "array privatization." In this example, the compiler can tell that the array value read in one iteration is produced in the previous iteration. It therefore cannot postpone the write operation until after the read of the next iteration.
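For two linear index expressions with the same coefficient, the iteration distance between a write and a read of the same element follows directly from the constants. The sketch below assumes, as the text suggests, a loop that writes A[i+1] and reads A[i]; the (coefficient, constant) representation and the function name dependence_distance are illustrative choices, not taken from the source.

```python
from fractions import Fraction
from typing import Optional, Tuple

Linear = Tuple[int, int]  # (a, c) stands for the index expression a*i + c

def dependence_distance(write: Linear, read: Linear) -> Optional[int]:
    """Return d such that the read in iteration i + d accesses the element
    written in iteration i, or None if the two expressions never coincide.

    Only equal, non-zero coefficients are handled, and loop bounds are
    ignored; a full test would also check that both iterations lie inside
    the iteration range.
    """
    (a_w, c_w), (a_r, c_r) = write, read
    if a_w != a_r or a_w == 0:
        raise ValueError("sketch handles only equal, non-zero coefficients")
    # a*i_w + c_w == a*i_r + c_r  =>  i_r - i_w == (c_w - c_r) / a
    d = Fraction(c_w - c_r, a_w)
    return int(d) if d.denominator == 1 else None

# Hypothetical loop body assumed from the text: A[i+1] = f(A[i])
d = dependence_distance(write=(1, 1), read=(1, 0))
print(d)  # 1: the value read in iteration i+1 is written in iteration i
```

A positive distance is a loop-carried flow dependence, which is why the write cannot be moved past the read of the next iteration; a distance of 0 would mean the dependence stays inside one iteration.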
Now consider the following example involving an induction variable (which is behaviorally equivalent to the previous example):
In this example, a variable is initialized before a loop and is then used and updated inside the loop; such a variable is called an induction variable. In some cases the induction variable can be converted into a linear index expression, thereby exposing the parallelism in the code. This is the case here, because the code is completely equivalent to that of the previous example.
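Converting an induction variable into a linear index expression amounts to writing its value in iteration i as step*i + init. The sketch below assumes a hypothetical loop in which j starts at 0 and is incremented by 1 each iteration (as in "j = 0; for i: A[j+1] = f(A[j]); j += 1"), and checks that the closed form reproduces the values the loop actually produces. The function names are illustrative.

```python
def induction_values(init, step, n):
    """Values of the induction variable j seen at the top of each iteration of:
    j = init; for i in range(n): use(j); j += step"""
    values = []
    j = init
    for _ in range(n):
        values.append(j)
        j += step
    return values

def to_linear(init, step):
    """Closed form j(i) = step*i + init, returned as a (coefficient, constant) pair."""
    return (step, init)

a, c = to_linear(init=0, step=1)
# The linear form must agree with direct execution of the loop.
assert induction_values(0, 1, 8) == [a * i + c for i in range(8)]
print((a, c))  # (1, 0): j is the linear expression 1*i + 0, i.e. just i
```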
Known methods of compiler optimization consider only linear index expressions and assume the worst case, that no parallelism exists, for non-linear expressions. Induction variables are handled by transforming them into linear index expressions. Most prior-art systems use heuristic approaches to obtain memory access information from such expressions. Prior-art algorithms formulate the linear index expressions, together with a specific data flow analysis question, as an Integer Linear Programming problem, so that answering the question amounts to proving whether that problem has a solution. Another known method, Fourier-Motzkin variable elimination, simplifies the Integer Linear Programming problem by reducing its number of dimensions.
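For comparison with the prior-art approach, Fourier-Motzkin elimination can be sketched in a few lines. Each inequality is stored as a pair (coeffs, bound) meaning sum(coeffs[j]*x[j]) <= bound; eliminating x[k] keeps the rows in which x[k] does not appear and adds every row with a positive x[k] coefficient, suitably scaled, to every row with a negative one. This is a generic textbook version over the rationals with hypothetical function names, not code from any prior-art system: an infeasible result also rules out integer solutions, but a feasible one does not guarantee them.

```python
from fractions import Fraction

def eliminate(system, k):
    """Fourier-Motzkin elimination of variable x[k].

    system: list of (coeffs, bound) meaning sum(coeffs[j] * x[j]) <= bound.
    The result has coefficient 0 for x[k] in every row and describes the
    projection of the solution set (over the rationals).
    """
    pos = [row for row in system if row[0][k] > 0]
    neg = [row for row in system if row[0][k] < 0]
    result = [row for row in system if row[0][k] == 0]
    for p_coeffs, p_bound in pos:
        for n_coeffs, n_bound in neg:
            # Scale so the x[k] coefficients become +1 and -1, then add the rows.
            sp = Fraction(1) / p_coeffs[k]
            sn = Fraction(1) / -n_coeffs[k]
            coeffs = [sp * a + sn * b for a, b in zip(p_coeffs, n_coeffs)]
            result.append((coeffs, sp * p_bound + sn * n_bound))
    return result

def trivially_infeasible(system):
    """True if some row reads 0 <= bound with bound < 0."""
    return any(all(c == 0 for c in coeffs) and bound < 0 for coeffs, bound in system)

# Variables x0, x1.  Constraints: x0 - x1 <= 0, x0 >= 1, x1 <= 0.
system = [([1, -1], 0), ([-1, 0], -1), ([0, 1], 0)]
step1 = eliminate(system, 0)  # constraints on x1 only: x1 <= 0 and x1 >= 1
step2 = eliminate(step1, 1)   # leaves the single row 0 <= -1
print(trivially_infeasible(step2))  # True: no rational (hence no integer) solution
```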
The present invention does not rely on linear index expressions, nor on Integer Linear Programming. Instead it transforms the relevant pieces of the source program into a data structure suitable for symbolic execution, performs the symbolic execution and extracts, from the data generated during the symbolic execution, the relevant combinations of index values. That is to say, the correct range that a particular index expression can take is computed by the symbolic execution and not defined through Integer Linear Programming.
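The idea of obtaining index ranges by executing the index computations, rather than by solving an Integer Linear Programming problem, can be illustrated with a deliberately simplified sketch. Here "execution" just runs the index path of a hypothetical loop for every iteration with concrete iterator values and collects the indices into sets, which only works for small, known trip counts; the loops, the non-linear index i*i and the function names are assumptions, and the sketch is not claimed to reproduce the symbolic execution described here in detail.

```python
from collections import defaultdict

def collect_index_sets(n, write_index, read_index):
    """Run the index path for i in range(n), recording which iterations
    write and which read each element of the array."""
    writers = defaultdict(set)
    readers = defaultdict(set)
    for i in range(n):
        writers[write_index(i)].add(i)
        readers[read_index(i)].add(i)
    return writers, readers

def iterations_independent(writers, readers):
    """True if no element is touched by two different iterations where at
    least one of the two accesses is a write."""
    for elem, w_iters in writers.items():
        if len(w_iters) > 1:  # two iterations write the same element
            return False
        if readers.get(elem, set()) - w_iters:  # read in another iteration
            return False
    return True

# Hypothetical loop: for i in 0..7: A[i*i + 64] = g(A[i])
w, r = collect_index_sets(8, write_index=lambda i: i * i + 64, read_index=lambda i: i)
print(iterations_independent(w, r))  # True: writes land in 64..113, reads in 0..7

# Hypothetical loop: for i in 0..7: A[i*i] = g(A[i])
w, r = collect_index_sets(8, write_index=lambda i: i * i, read_index=lambda i: i)
print(iterations_independent(w, r))  # False: A[4] is written at i=2 and read at i=4
```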
The present invention thus has, among other benefits, the ability to handle the optimization of code involving non-linear index expressions as well as lookup tables and conditional updates.
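Lookup tables and conditional updates fit the same scheme, provided the table contents and the conditions are known when the index path is executed, as for a constant table. The sketch below assumes a hypothetical loop that increments A[T[i]] only when mask[i] holds. Whether iterations conflict then depends on which updates are actually executed, a case that a test restricted to linear index expressions would have to treat as the worst case. All names are illustrative.

```python
def updated_elements(table, mask):
    """Map each element of A to the iterations that update it in the loop:
    for i in range(len(table)): if mask[i]: A[table[i]] += 1"""
    touched = {}
    for i, (idx, taken) in enumerate(zip(table, mask)):
        if taken:  # only updates that actually execute enter the index set
            touched.setdefault(idx, []).append(i)
    return touched

def parallel_safe(table, mask):
    """True if no element of A is updated by more than one iteration."""
    return all(len(iters) == 1 for iters in updated_elements(table, mask).values())

T = [3, 1, 3, 0]  # hypothetical lookup table with a repeated entry
print(parallel_safe(T, [True, True, False, True]))  # True: i=2 is skipped
print(parallel_safe(T, [True, True, True, True]))   # False: i=0 and i=2 both update A[3]
```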
These and other advantages of the invention will be apparent from the following description and claims.
The present invention relates to a method of compiling a computer program, the program comprising a plurality of operations having a sequence. The method involves extracting from the computer program, information describing the operations and the sequence of the operations and storing the extracted information as a data structure. The operations in the computer program which involve index expressions are identified and those operations are executed, producing information describing memory accesses. The operations which can be executed in parallel are identified, based on the information describing memory accesses.
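The steps of the method can be sketched end to end: the operations are stored in a list (whose order stands for their sequence), those with index expressions are picked out and executed, and the resulting memory-access sets decide which pairs of operations may run in parallel. The Operation record, the function names and the loop body are hypothetical, and the check works at the granularity of the whole loop, which is coarser than a real dependence analysis.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional

@dataclass
class Operation:
    """One operation of a loop body; list order stands for program order."""
    name: str
    array: Optional[str] = None                   # None: no memory access
    index: Optional[Callable[[int], int]] = None  # index expression in the iterator i
    is_write: bool = False

def memory_accesses(program, n):
    """Execute the index expressions of the operations that have one, for
    i = 0..n-1, and record the (array, element) pairs each one touches."""
    return {
        op.name: {(op.array, op.index(i)) for i in range(n)}
        for op in program
        if op.index is not None
    }

def parallel_pairs(program, n):
    """Pairs of memory operations that may be scheduled in parallel: both
    only read, or the elements they touch are disjoint."""
    acc = memory_accesses(program, n)
    mem_ops = [op for op in program if op.index is not None]
    return [
        (a.name, b.name)
        for a, b in combinations(mem_ops, 2)
        if not (a.is_write or b.is_write) or not (acc[a.name] & acc[b.name])
    ]

# Hypothetical loop body: x = A[2i]; y = A[2i+1]; A[2i+1] = x + y
program = [
    Operation("load_even", "A", lambda i: 2 * i),
    Operation("load_odd", "A", lambda i: 2 * i + 1),
    Operation("add"),  # arithmetic only, no index expression
    Operation("store_odd", "A", lambda i: 2 * i + 1, is_write=True),
]
print(parallel_pairs(program, n=4))
# [('load_even', 'load_odd'), ('load_even', 'store_odd')]
```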
In a further embodiment, the method includes a step of generating a symbolic execution data structure comprising a representation of the operations in the computer program which involve the memory accesses and index expressions. The symbolic execution data structure may be a data flow problem graph. In yet a further embodiment, the method of the present invention involves executing the operations by executing the symbolic execution data structure and noting memory locations addressed by the computer program. A question data structure is generated containing questions relating to how the computer program accesses memory. The computer program is analyzed by interrogating the computer program with the questions. The answers to the questions are back annotated into the question data structure. In a further embodiment, index sets relating to memory access by the operations of the program which involve index expressions are generated. The answers to the questions are computed based on information accumulated in index sets during the symbolic execution step. The index sets are thus filled in the symbolic execution step.
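The interplay of questions, index sets and back annotation can be sketched as follows: execution fills one index set per operation, and each question is then answered from those sets and the answer is written back into the question record. The Question and IndexSets records, the function names and the loop body are hypothetical, and for brevity the sketch ignores iteration order, so any shared element counts as a possible dependence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass
class Question:
    """Hypothetical question record: can `reader` access an element written by `writer`?"""
    writer: str
    reader: str
    answer: Optional[bool] = None  # back-annotated after execution

@dataclass
class IndexSets:
    written: Dict[str, Set[int]] = field(default_factory=dict)
    read: Dict[str, Set[int]] = field(default_factory=dict)

def execute(index_ops: Dict[str, Tuple[str, Callable[[int], int]]], n: int, sets: IndexSets) -> None:
    """Fill the index sets by running each operation's index computation for i = 0..n-1."""
    for name, (kind, index) in index_ops.items():
        target = sets.written if kind == "write" else sets.read
        target.setdefault(name, set()).update(index(i) for i in range(n))

def answer_questions(questions: List[Question], sets: IndexSets) -> None:
    """Answer each question from the index sets and store the answer in the question."""
    for q in questions:
        q.answer = bool(sets.written.get(q.writer, set()) & sets.read.get(q.reader, set()))

# Hypothetical loop body: A[i+1] = ...; ... = A[i]; ... = A[i+100]
index_ops = {
    "st": ("write", lambda i: i + 1),
    "ld": ("read", lambda i: i),
    "ld_far": ("read", lambda i: i + 100),
}
questions = [Question("st", "ld"), Question("st", "ld_far")]
sets = IndexSets()
execute(index_ops, 8, sets)
answer_questions(questions, sets)
print([(q.writer, q.reader, q.answer) for q in questions])
# [('st', 'ld', True), ('st', 'ld_far', False)]
```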
The invention further encompasses an apparatus for compiling computer code, having a first signal flow analysis module which creates a signal flow data structure for index expressions used by the computer code. The signal flow analysis module comprises a module which identifies an index path in the computer code. The index path is made up of operations involved in computing indices used in memory accesses by index expressions. A symbolic execution module executes the index path, thereby extracting information relating to index expression memory accesses.
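Identifying the index path can be sketched as a backward walk over a signal flow graph: starting from the index operand of each memory access, collect every operation that contributes to that index. The graph encoding (a dict from each node to the nodes it reads) and the node names are hypothetical.

```python
def index_path(graph, index_inputs):
    """Backward slice: all nodes whose values flow into the given index inputs.

    graph maps each node to the list of nodes it reads from.
    """
    path, stack = set(), list(index_inputs)
    while stack:
        node = stack.pop()
        if node not in path:
            path.add(node)
            stack.extend(graph.get(node, []))
    return path

# Hypothetical signal flow graph for:  k = i * 3;  idx = k + 1;  y = A[idx] * coef
graph = {
    "i": [],
    "k": ["i"],
    "idx": ["k"],
    "coef": [],
    "load_A": ["idx"],  # the memory access; "idx" is its index input
    "y": ["load_A", "coef"],
}
print(sorted(index_path(graph, ["idx"])))  # ['i', 'idx', 'k']
```

Only the nodes in the returned set would have to be executed to obtain the index information; the data path (coef, load_A, y) is left out.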
In a further embodiment, the apparatus includes a question database containing questions relating to memory accesses by index expressions. A module generates index sets made up of responses to questions in the question database. A module generates answers to the questions in the question database based upon the contents of the index sets.