1. Field of the Invention
The present invention generally relates to compiler optimizations, and in particular to a technique for reducing redundant pointer information.
2. Background Description
Compilation is the process of translating high-level language statements in a computer program into machine instructions executable by the target computer. Optimization is a general term for modifications applied to source code, object code or any intermediate code present during compilation, in order to improve the efficiency of the program being compiled. Usually optimization either:
a) aims to overcome pessimistic assumptions implied by language rules that can cause redundancies in the compiled code. This can result in unnecessary memory allocation for the additional (that is, redundant) code; or
b) exploits a particular hardware for different environments.
The present invention is directed to an optimization technique of the first type.
1) "Efficient context-sensitive pointer analysis for C programs", Robert Wilson and Monica Lam, SIGPLAN '95;
2) "Points-to analysis in almost linear time", Bjarne Steensgaard, Technical Report MSR-TR-95-08, Microsoft Corporation;
3) "A safe approximate algorithm for interprocedural pointer aliasing", William Landi and Barbara Ryder, SIGPLAN '92;
4) "Almost linear time points-to analysis", William Landi, in POPL '95;
5) "Context-sensitive interprocedural points-to analysis in the presence of function pointers", Maryam Emami, Rakesh Ghiya and Laurie Hendren, SIGPLAN '94;
6) "Interprocedural may-alias analysis for pointers: Beyond k-limiting", Alain Deutsch, International Conference on Computer Languages, IEEE '92; and
7) "Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects", Jong-Deok Choi, Michael Burke and Paul Carini, SIGPLAN '93.
Compilation may, broadly speaking, be divided into a front end phase in which the source code program is translated, through steps of lexical, syntactic and semantic analysis, into an intermediate representation, and a back end phase in which the intermediate representation is translated for output in object code modules, called compilation units, for linking into executable files. Optimization may take place at various stages during this compilation process, but generally speaking, occurs in the back end.
Usually optimization algorithms work on a representation of the aspect of the program to be optimized. This representation is usually produced by a pre-compilation step or by the first pass of a two-pass compiler. For example, U.S. Pat. No. 5,107,418 for "Method for Representing Scalar Data Dependencies for an Optimizing Compiler" describes a method for creating a local scalar data dependence graph for each basic block of the program. This local analysis is used to form a global data dependence graph, which shows data dependencies in the context of a control flow graph within a single function and can be used for later optimizations in the compiler.
Most modern programming languages offer the capability to access a data object, or function object in the case of object oriented programming languages like C++, indirectly through the use of a pointer variable. A pointer is a reference to the location or address of some region in memory where the data or function is stored. Typically, the value of the pointer variable is the object's address in storage. Because the reference to the object is indirect, through the pointer, calls that use function pointers are referred to as indirect calls.
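The notion of an indirect call can be illustrated with a minimal C sketch. The function names add_one, twice, apply and run are hypothetical and chosen only for illustration; they do not appear in the description above.

```c
#include <assert.h>

/* Two hypothetical functions sharing one signature. */
static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* Calls whichever function fp currently points to. The callee is chosen
   at run time, so this function alone does not reveal which body runs. */
int apply(int (*fp)(int), int value)
{
    assert(fp != 0);
    return fp(value);   /* indirect call through the function pointer */
}

/* Selects a callee dynamically, then invokes it indirectly. */
int run(int use_twice, int value)
{
    int (*fp)(int) = use_twice ? twice : add_one;
    return apply(fp, value);
}
```

In this sketch, run(0, 5) reaches add_one and run(1, 5) reaches twice, although both go through the same call site inside apply.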
Pointer references are particularly useful in complex programs where the exact number of elements in different types of data structures may not be ascertainable at compilation time. The number may vary with the program's actions as it is running. The use of pointer references allows individual pieces of storage to be allocated as needed, so that the required amount of storage is available at any given moment during program execution.
When used, the pointer variable is first initialised to the address of the specific object, and then de-referenced in order to access the object. Some languages permit the user to modify or copy pointer variables for the purpose of traversing an aggregate object, such as an array. Pointers may also be modified or copied for dynamically selecting an object to be operated upon.
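Both uses of pointer modification mentioned above can be sketched in C. The helper names sum_array and store_selected are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Traverses an aggregate object: the pointer p is initialised to the
   address of the first element, dereferenced, then advanced. */
int sum_array(const int *first, size_t n)
{
    const int *p = first;
    const int *end = first + n;
    int total = 0;
    while (p != end) {
        total += *p;   /* dereference to access the current element */
        ++p;           /* modify the pointer to reach the next element */
    }
    return total;
}

/* Dynamically selects the object to be operated upon: after the
   assignment, target refers to either *a or *b depending on pick_a. */
void store_selected(int *a, int *b, int pick_a, int value)
{
    assert(a != NULL && b != NULL);
    int *target = pick_a ? a : b;
    *target = value;
}
```

In store_selected, the object written through target is only known at run time, which is the kind of situation that makes the set of pointed-to objects hard to determine statically.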
These types of manipulations of pointer variables can lead to situations where the compiler cannot precisely determine the set of objects (data or function objects) pointed to by a pointer variable at a specific location in the program. In these situations, the compiler must use safe assumptions to determine the scope of the pointer's object set. These assumptions, called aliasing assumptions, are usually specified by the language being compiled. Aliasing assumptions are often very pessimistic. They drastically reduce the level of optimization that can be performed, because they reduce the number of redundancies that can be eliminated.
This problem can be illustrated using the following simple code example, in which *p and *q indicate de-referencing of the pointers p and q, respectively:
    s1: p = &a;
    s2: q = &b;
        . . .
    s5: . . . = *p + 2;
        . . .
    s7: *q = . . .
        . . .
    s9: . . . = *p + 2;
The compiler can trace the values of p and q, and determine that they are handles to different objects by keeping track of the set of objects associated with a pointer variable. This set of objects is the alias set associated with the pointer variable.
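One simple way to picture an alias set is as a set of object identifiers attached to each pointer variable. The following C sketch represents such a set as a bitmask; the identifiers OBJ_A and OBJ_B and the helpers assign_address, merge and may_alias are hypothetical, and the sketch is not the technique of any of the cited references.

```c
#include <assert.h>

/* Hypothetical object identifiers, one bit per named object. */
enum { OBJ_A = 1u << 0, OBJ_B = 1u << 1 };

/* An alias set: the objects a pointer variable may refer to. */
typedef unsigned alias_set;

/* Effect of "ptr = &obj": the alias set becomes exactly {obj}. */
alias_set assign_address(unsigned obj)
{
    assert(obj != 0);
    return obj;
}

/* Where two control-flow paths join, the pointer may refer to any
   object reachable on either path, so the sets are united. */
alias_set merge(alias_set x, alias_set y) { return x | y; }

/* Two pointers may alias when their sets share at least one object. */
int may_alias(alias_set x, alias_set y) { return (x & y) != 0; }
```

With p assigned &a and q assigned &b, the two sets are disjoint and may_alias reports that the pointers refer to different objects. Once either set grows through merging, that conclusion may no longer hold.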
However, the compiler cannot normally tell if statement s7 invalidates the value of the expression in statement s5, and as a result, it cannot assume that the same expression found in statement s9 is redundant. It must generate code which will recompute the value of the expression *p+2 for statement s9.
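The need for recomputation can be made concrete with a small C sketch that mirrors statements s5, s7 and s9. The function name reread_delta and its parameters are assumptions introduced for this illustration.

```c
#include <assert.h>

/* Mirrors s5, s7 and s9: evaluates *p + 2, stores through q, then
   evaluates *p + 2 again. Returns the second value minus the first. */
int reread_delta(int *p, int *q, int stored)
{
    assert(p != 0 && q != 0);
    int first = *p + 2;    /* s5 */
    *q = stored;           /* s7: modifies *p only if q aliases p */
    int second = *p + 2;   /* s9: reusable only if p and q cannot alias */
    return second - first;
}
```

When p and q refer to different objects, the result is zero and the second evaluation is redundant. When both refer to the same object, the store at s7 changes the value read at s9, so a compiler that cannot rule out aliasing has to keep the recomputation.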
While the example illustrates only a single redundancy, the magnitude of the problem becomes apparent when such redundancies are multiplied across a large program of thousands of lines of code.
One way to reduce the size of the alias sets is to provide the user with alias assertion options giving direct control over the aliasing assumptions. However, correct use of these options requires great skill and time on the part of the user, and increased program complexity makes it much more difficult to generalise appropriate aliasing assumptions.
The preferred approach is to develop an automatic solution to the problem. To that end, a number of techniques have been developed for computing the approximate set of objects that a pointer can point to at any specific point in the program, such as those described in references 1) to 7) listed above.
Many of the existing techniques are computationally expensive and do not provide a solution for programs that contain indirect calls.