Points-to analysis is a static analysis that models dynamic memory behavior by computing, for each pointer variable, its points-to set: the set of memory objects (storage locations) that the variable can point to. Pointer information is a prerequisite for most program analyses for C-like languages, including compiler optimizations, program comprehension, and error checking. The precision and performance of these client analyses depend heavily on the precision of the pointer information provided.
Typically, it is sufficient for pointer analyses to analyze only the four types of instructions shown below in TABLE 1. A Base instruction sets the value stored in a pointer variable to the address of a memory object (e.g., pointer=&object). An Assignment instruction sets the value stored in a pointer variable to the value stored in another pointer variable (e.g., pointer1=pointer2, where both pointer1 and pointer2 are pointer variables). A Store instruction stores the value of a pointer variable into the memory location addressed by a de-referenced pointer variable (e.g., *pointer1=pointer2). A Load instruction loads the value stored at the memory location addressed by a de-referenced pointer variable into another pointer variable (e.g., pointer1=*pointer2). Load and Store instructions are together referred to as Indirect instructions.
TABLE 1
Instruction    Type
a = &b         Base
a = b          Assignment
*x = b         Indirect (Store)
a = *y         Indirect (Load)
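The four instruction forms can be told apart by where the `&` and `*` operators appear. The following is a minimal sketch, assuming statements have already been reduced to one of the four forms and are given as plain strings; the function name `classify` and the tuple layout `(kind, lhs, rhs)` are illustrative choices, not part of the analysis described here.

```python
import re

# Hypothetical classifier for statements already in one of the four forms.
_IDENT = r"[A-Za-z_]\w*"
_PATTERNS = [
    ("Base",       re.compile(rf"^({_IDENT})\s*=\s*&\s*({_IDENT})$")),   # a = &b
    ("Store",      re.compile(rf"^\*\s*({_IDENT})\s*=\s*({_IDENT})$")),  # *x = b
    ("Load",       re.compile(rf"^({_IDENT})\s*=\s*\*\s*({_IDENT})$")),  # a = *y
    ("Assignment", re.compile(rf"^({_IDENT})\s*=\s*({_IDENT})$")),       # a = b
]


def classify(stmt):
    """Return (kind, lhs, rhs) for a pointer statement in one of the four forms.

    lhs and rhs are the bare variable names with '&' and '*' stripped.
    """
    text = stmt.strip().rstrip(";").strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.group(1), match.group(2)
    raise ValueError(f"not one of the four instruction forms: {stmt!r}")
```

Statements with more than one dereference, such as `**p = q`, are rejected by this sketch, since they are assumed to have been rewritten beforehand.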
Nested pointer dereferences may be eliminated by introducing auxiliary variables. Data aggregates, such as arrays and structs, are regarded as monolithic objects, and heap objects may be modeled by regarding the allocation site as a special memory object. Function calls and returns may be translated to a set of Assignment instructions between function arguments (returns) and parameters. Without loss of generality, we assume that a memory object A is accessed via Load and Store instructions only: a unique pointer variable pA may be introduced to take its address (via the Base instruction pA=&A), and A may then be accessed via Load and Store instructions that de-reference pA (e.g., A=pointer1 is translated into pA=&A, *pA=pointer1).
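The two rewrites described above, eliminating nested dereferences and routing direct accesses through an address-taking pointer, can be sketched as follows. This is an illustration only: the helper names `lower` and `via_address` and the temporary names `_t1`, `_t2`, ... are hypothetical, operands are assumed to be written without whitespace (e.g., `**p`), and address-of operands are not handled.

```python
import itertools


def lower(lhs, rhs):
    """Rewrite `lhs = rhs`, where each side may carry several leading '*',
    into statements that contain at most one dereference each.

    Temporaries _t1, _t2, ... are hypothetical auxiliary variables.
    """
    counter = itertools.count(1)
    out = []

    def peel(expr, keep):
        # Strip leading '*' until only `keep` remain, emitting one Load
        # into a fresh temporary for every dereference removed.
        depth = len(expr) - len(expr.lstrip("*"))
        name = expr[depth:]
        while depth > keep:
            tmp = f"_t{next(counter)}"
            out.append(f"{tmp} = *{name}")
            name = tmp
            depth -= 1
        return "*" * depth + name

    lhs_simple = peel(lhs, 1)  # a Store may keep one '*' on the left
    # Never leave a dereference on both sides of the final statement.
    rhs_keep = 0 if lhs_simple.startswith("*") else 1
    rhs_simple = peel(rhs, rhs_keep)
    out.append(f"{lhs_simple} = {rhs_simple}")
    return out


def via_address(obj, rhs):
    """Translate the direct write `obj = rhs` into a Base plus a Store
    through a pointer named p<obj>, following the pA convention above."""
    ptr = f"p{obj}"
    return [f"{ptr} = &{obj}", f"*{ptr} = {rhs}"]
```

For instance, `lower("**p", "q")` produces a Load into `_t1` followed by a Store through `_t1`, and `via_address("A", "pointer1")` reproduces the translation of A=pointer1 given in the text.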
Most existing points-to analyses are based on Andersen's formulation, which is neither flow-sensitive (i.e., it does not respect control flow dependencies) nor context-sensitive (i.e., it does not respect the semantics of function calls). Andersen's formulation computes the points-to sets of all pointer variables by solving a set of inclusion constraints generated from program code. In practice, the constraints are effectively solved by computing a dynamic transitive closure of a constraint graph, in which nodes represent pointer variables and memory objects, and edges represent inclusion constraints between them. Indirect references are more complex: they are handled by processing the points-to set of each node in the graph, where the points-to set is itself gathered by computing the transitive closure of the graph. As more points-to information is computed, new edges are added to the constraint graph to represent the constraints introduced via indirect references; the transitive closure and the points-to information must then be updated. The algorithm terminates when no new points-to information can be derived.
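The dynamic transitive closure described above can be sketched as a worklist algorithm. The following is a minimal illustration under stated assumptions, not a definitive implementation: instructions are assumed to be given as `(kind, lhs, rhs)` tuples of bare names (for a Store `*x = b`, lhs is `x`; for a Load `a = *y`, rhs is `y`), the function name `andersen` is hypothetical, and the optimizations that practical solvers depend on are omitted.

```python
from collections import defaultdict, deque


def andersen(instructions):
    """Flow- and context-insensitive inclusion-based points-to sketch.

    instructions: iterable of (kind, lhs, rhs) with kind in
    {"Base", "Assignment", "Load", "Store"}.
    Returns {node: set of memory objects it may point to}.
    """
    pts = defaultdict(set)     # points-to set of each node
    succ = defaultdict(set)    # edge src -> dst means pts[src] is a subset of pts[dst]
    loads = defaultdict(set)   # y -> {a} for every Load   a = *y
    stores = defaultdict(set)  # x -> {b} for every Store  *x = b

    for kind, lhs, rhs in instructions:
        if kind == "Base":
            pts[lhs].add(rhs)
        elif kind == "Assignment":
            succ[rhs].add(lhs)
        elif kind == "Load":
            loads[rhs].add(lhs)
        elif kind == "Store":
            stores[lhs].add(rhs)
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")

    work = deque(node for node in list(pts) if pts[node])

    def add_edge(src, dst):
        # Add a new inclusion edge and propagate what src already holds.
        if dst in succ[src]:
            return
        succ[src].add(dst)
        if not pts[src] <= pts[dst]:
            pts[dst] |= pts[src]
            work.append(dst)

    while work:
        node = work.popleft()
        # Indirect references: each object that node points to induces edges.
        for obj in list(pts[node]):
            for target in loads[node]:    # target = *node
                add_edge(obj, target)
            for source in stores[node]:   # *node = source
                add_edge(source, obj)
        # Direct edges: push node's points-to set to its successors.
        for dst in list(succ[node]):
            if not pts[node] <= pts[dst]:
                pts[dst] |= pts[node]
                work.append(dst)

    return {node: set(objs) for node, objs in pts.items() if objs}
```

As a usage example, for the instructions p=&a, q=&b, r=&p, *r=q, s=*r, the Store through r adds an edge from q to p and the Load through r adds an edge from p to s, so the sketch reports that both p and s may point to a and b. The loop stops once the worklist is empty, which corresponds to the point at which no new points-to information can be derived.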