1. Field of the Invention
The present invention relates to alias analyzers and more particularly to safe and efficient alias analyzers for programs written in programming languages that use pointers.
2. Description of the Prior Art
The following references describe the prior-art techniques summarized in Table 1:
[Wei80] W. E. Weihl, Interprocedural Data Flow Analysis in the Presence of Pointers, Procedure Variables and Label Variables, Master's thesis, M.I.T., June 1980.
[Cou86] D. S. Coutant, Retargetable High-level Alias Analysis, in Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages 110-118, January 1986.
[Deu92] A. Deutsch, A Storeless Model of Aliasing and Its Abstractions Using Finite Representations of Right-regular Equivalence Relations, in Proceedings of the IEEE 1992 Conference on Computer Languages, pages 2-13, April 1992.
[LR92] W. Landi and B. G. Ryder, A Safe Approximation Algorithm for Interprocedural Pointer Aliasing, in Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 235-248, June 1992.
[CBC93] Jong-Deok Choi, Michael Burke, and Paul Carini, Efficient Flow-sensitive Interprocedural Computation of Pointer-induced Aliases and Side Effects, in Conference Record of the Twentieth Annual ACM Symposium on Principles of Programming Languages, pages 232-245, January 1993.
[BCCH94] Michael Burke, Paul Carini, J-D. Choi, and M. Hind, Flow-insensitive Interprocedural Alias Analysis in the Presence of Pointers, in Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, pages 234-250, Springer-Verlag, August 1994.
[Deu94] A. Deutsch, Interprocedural May-alias Analysis for Pointers: Beyond k-limiting, in Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 230-241, 1994.
[EGH94] M. Emami, R. Ghiya, and L. J. Hendren, Context-sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers, in Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 242-257, June 1994. Published as SIGPLAN Notices, 29(6).
[Ruf95a] E. Ruf, Context-insensitive Alias Analysis Reconsidered, in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 13-22, June 1995.
[WL95] R. Wilson and M. Lam, Efficient Context-sensitive Pointer Analysis for C Programs, in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 1-12, June 1995.
[Ste96] B. Steensgaard, Points-to Analysis in Almost Linear Time, in Conference Record of the 23rd Annual ACM Symposium on Principles of Programming Languages, pages 32-41, January 1996.
Computer languages require a mechanism to name memory locations. For example, variable declarations associate the variable identifier with a memory location. In most modern computer languages, multiple names can be assigned to the same memory location; such names are called aliases. Languages like C, C++, Fortran 90 and Ada have special variables called pointers. A pointer is a variable which, instead of storing data, stores the address of another memory location.
There are two important unary (one-operand) operators associated with pointers. The first is the address operator (&), which, given a variable, yields the memory location of that variable. The second is the dereference operator (*). The meaning of dereferencing a variable x (i.e., *x) depends on whether *x is being written (e.g., on the left-hand side of an assignment statement) or read (e.g., on the right-hand side of an assignment). In the former case, *x is the location stored in x; in the latter case, *x yields the value stored in the location stored in x.
For example, consider the following statements:
    p=&v;
    v=5;
    *p=5;
    z=v;
    z=*p;
where the variables are allocated as shown in the symbol table:
    Symbol Table
    ______________________________
    variable    memory location
    ______________________________
    p           1000
    v           1050
    z           1100
    ______________________________
The assignment p=&v puts 1050 into location 1000, and the name *p becomes an alias of v. At this point, the names *p and v both refer to location 1050, so using *p is identical to using v. Thus, v=5 and *p=5 both store 5 in location 1050, and both z=v and z=*p write 5 (the value stored at location 1050) into location 1100 (named z). Variables differ from names like *p: a variable names the same location for the entire execution of the program, whereas *p may name different locations at different points during execution. Often a variable is used to mean the location it names (e.g., location v is used to mean location 1050).
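The example can be replayed in C as a minimal sketch. The names run_alias_example and struct example_result are hypothetical, introduced only for illustration, and the final write *p = 7 is an extra step not in the example above, added to show that a write through *p is visible through v. The addresses 1000, 1050 and 1100 in the symbol table are illustrative; a real compiler chooses its own, so the sketch checks the alias by comparing p with &v rather than by printing addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Values observed while replaying the example statements. */
struct example_result {
    int z_from_v;        /* value stored in z by z=v  */
    int z_from_p;        /* value stored in z by z=*p */
    int v_after_write;   /* v after an extra write through *p */
    bool p_points_to_v;  /* whether p still holds the address of v */
};

/* Hypothetical helper: executes the example and records what it sees. */
struct example_result run_alias_example(void) {
    struct example_result r;
    int v = 0, z = 0;
    int *p;

    p = &v;              /* p holds the address of v, so *p is an alias of v */
    assert(p == &v);
    v = 5;               /* store 5 through the name v */
    *p = 5;              /* store 5 through *p: the same location as v */
    z = v;               /* read through v */
    r.z_from_v = z;
    z = *p;              /* read the value stored at the address held in p */
    r.z_from_p = z;

    *p = 7;              /* extra step: a write through *p is visible through v */
    r.v_after_write = v;
    r.p_points_to_v = (p == &v);
    return r;
}
```

After the call, both reads of z return 5, and v holds 7, the value last written through *p.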
It is also possible to associate names with locations that do not exist when the program is compiled. This is done through dynamic memory allocation. For example, the C statement p=malloc(sizeof(int)) stores in p the address of a newly allocated location to which an integer can be written. Throughout this description, C syntax and C pointer semantics are used, as described in C: A Reference Manual by S. P. Harbison and G. L. Steele Jr., Prentice Hall, 1984.
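A minimal sketch in C of how a dynamically allocated location is named: it has no variable of its own, so it can be reached only through pointers, and copying the pointer creates another alias for it. The function name heap_alias_example and the convention of returning -1 when malloc fails are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an int, names it through two pointers,
   and returns the value read through the second name (-1 on failure). */
int heap_alias_example(void) {
    int *p = malloc(sizeof(int)); /* location that did not exist at compile time */
    if (p == NULL)
        return -1;                /* allocation failed */
    int *q = p;                   /* *q and *p now name the same heap location */
    assert(q == p);
    *p = 42;                      /* write through one name ... */
    int result = *q;              /* ... and read it back through the other */
    free(p);                      /* both p and q dangle now; neither is used again */
    return result;
}
```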
The present state of the art in alias analyzers for programs using pointers is a plethora of techniques that work well on small programs but cannot yet handle programs of realistic size. These techniques fall into three general classes: techniques that find an alias solution for each statement, techniques that find an alias solution for each subroutine, and techniques that find one alias solution for the entire program. All of these techniques use pairs of "names" to represent the alias solution. In some cases these are alias pairs (e.g., <a,b> represents that a and b are aliased); in others they are points-to pairs (e.g., <c,d> represents that c points to d, and thus *c and d are aliased). In some cases, points-to pairs are expressed as a function that maps a name to the set of names it can point to. The exact definition of "names" varies from method to method, as does the set of names the analyzer considers. Finally, existing techniques vary significantly in how they account for interprocedurally realizable paths, that is, in how they transfer aliases between calling and called subroutines.
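The relationship between the two pair representations can be sketched in C under simplifying assumptions: names are plain strings, and a points-to solution is a flat array of pairs read as a function from a pointer to its set of targets. The type struct points_to and the functions to_alias_pair and deref_may_alias are hypothetical and are not taken from any of the cited analyzers. The sketch converts a points-to pair <c,d> into the alias pair <*c,d>, and uses the standard may-alias rule that *a and *b may be aliased when the points-to sets of a and b share a name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical points-to pair <ptr, target>: ptr may point to target. */
struct points_to { const char *ptr; const char *target; };

/* Writes the alias pair implied by a points-to pair: <c,d> -> <*c,d>. */
void to_alias_pair(struct points_to pt, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "<*%s,%s>", pt.ptr, pt.target);
}

/* Reads the pair array as a function: may ptr point to target? */
static bool may_point_to(const struct points_to *pairs, size_t n,
                         const char *ptr, const char *target) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(pairs[i].ptr, ptr) == 0 &&
            strcmp(pairs[i].target, target) == 0)
            return true;
    return false;
}

/* *a and *b may be aliased if some name is in both points-to sets. */
bool deref_may_alias(const struct points_to *pairs, size_t n,
                     const char *a, const char *b) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(pairs[i].ptr, a) == 0 &&
            may_point_to(pairs, n, b, pairs[i].target))
            return true;
    return false;
}
```

For example, with the pairs <p,v>, <q,v>, <q,w> and <r,w>, deref_may_alias reports that *p and *q may be aliased (both may point to v) and that *p and *r are not, while to_alias_pair turns <p,v> into <*p,v>.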
The extant techniques of the prior art are summarized in Table 1 below:
______________________________________________________________________________
                  one alias     alias or     published empirical results
                  solution      points-to    largest          time to
technique         per           pairs        program size     analyze
______________________________________________________________________________
[Wei80]           program       alias        3,315            18s (a)
[Cou86]           program       points-to    unspecified
[Deu92]           statement     alias        no implementation reported
[LR92]            statement     alias        6,792            44s
[CBC93, BCCH94]   statement     points-to    no implementation reported
[Deu94]           statement     alias        no implementation reported
[EGH94]           statement     points-to    2,279            no timings
[Ruf95a]          statement     points-to    6,771            no timings
[WL95]            statement     points-to    4,663            16s
[Ste96]           program       points-to    75,000           16s
______________________________________________________________________________
(a) [Wei80] does not contain any empirical results. However, [LR92] has empirical results for Weihl's analyzer.