Software systems can include thousands or even millions of lines of computer program text. Not surprisingly, interactions between different parts of the computer program text can be complex and difficult to follow.
Static analysis involves automatic reasoning about computer programs from the text of the computer programs. Static analysis has applications in compiler optimizations and computer software verification, among other things. A compiler typically converts program text into instructions executable on a computer processor. Using static analysis of program text, a compiler can at times identify problems such as run-time errors in the computer program without even running the program. Or, the compiler may be able to improve the efficiency of the output instructions. Software verification more broadly refers to testing or otherwise evaluating software to verify that the software behaves as expected or has certain desirable properties, or to verify the correctness of the software versus predefined criteria.
One common task of an analysis tool is to infer invariants and other properties of a computer program. An invariant is a condition that always holds. For example, a condition that always holds at the beginning of a loop is a loop invariant, and a condition that always holds for an object is an object invariant. If a developer is the one to indicate invariants or other properties (e.g., by annotating program text or a behavioral specification to signal intended invariants or properties), the process can be time-consuming for the developer. The development process is simplified if a tool can automatically infer invariants and other properties of a computer program.
Abstract interpretation is a form of static analysis that allows an analysis tool to automatically infer invariants and other properties. With abstract interpretation, over-approximations of sets of reachable program states are systematically computed. The over-approximations are conventionally represented as elements of a lattice for an abstract domain. Elements of the abstract domain can be viewed as constraints on a set of variables, such as the variables of the program.
Suppose a program includes the variables x, y, and z as well as statements setting values for the variables and comparing variables. The polyhedra abstract domain can represent linear-arithmetic constraints like x=5, 6<y≦11, x<y, or x+y≦z for the program. This allows the abstract domain to track if it is possible for a constraint to evaluate to true and if it is possible for the constraint to evaluate to false.
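As a rough illustration of tracking whether a constraint may evaluate to true and whether it may evaluate to false, the following sketch uses simple integer intervals rather than the polyhedra domain itself. The function names and the tuple representation are assumptions made for this example only.

```python
# Minimal sketch: intervals are (lo, hi) pairs over integers.
# This is NOT the polyhedra domain; it only illustrates the
# "may be true" / "may be false" questions for the constraint x < y.

def may_be_true_lt(x, y):
    """Return True if x < y can hold for some values in the two intervals."""
    return x[0] < y[1]

def may_be_false_lt(x, y):
    """Return True if x < y can fail (x >= y) for some values in the two intervals."""
    return x[1] >= y[0]

x = (5, 5)    # x = 5
y = (7, 11)   # 6 < y <= 11, over integers

# Under these constraints x < y can be true and cannot be false.
print(may_be_true_lt(x, y), may_be_false_lt(x, y))
```

If y were instead only known to lie between 3 and 11, both questions would be answered affirmatively, and the tool could not decide the comparison.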
Or, suppose a computer program includes the simple loop:
x := 0
while (x < 10) {
   x := x + 1
}
Using abstract interpretation and an abstract domain that tracks interval relationships for variables, an analysis tool may infer and confirm that x=10 at the end of the loop. It may also infer and confirm the range of x at different stages. For example, before x is set to 0, the tool infers that −∞<x<∞. After the assignment x:=0 but before the loop starts, the tool infers that x=0. In the body of the loop in the first iteration, the tool infers that x=0 and x<10 before the increment statement, then also infers x=1 after the increment statement. At this point, the tool infers that the range of x is 0 to 1. After subsequent iterations, the tool infers that the range of x is 0 to 2, 0 to 3, etc., up to a range of 0 to 10 when x=10. Incidentally, if there were no upper bound to the loop (e.g., if the conditional statement were x>−1), the analysis could continue indefinitely until the tool stopped it. If the range fails to stabilize, however, the tool might loosen constraints in the analysis and then infer that the range of x is 0 to ∞.
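The interval reasoning described for this loop can be sketched as a small fixed-point computation. This is a minimal illustration under stated assumptions, not the implementation of any particular tool: the function names, the tuple representation of intervals, and the widen_after parameter are all hypothetical, and the loop guard is passed in as an interval together with the interval for its negation.

```python
import math

# An interval is a (lo, hi) pair; None stands for an unreachable state.

def join(a, b):
    """Smallest interval containing both a and b."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    """Intersection of a and b, or None if they do not overlap."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def widen(old, new):
    """Loosen any bound that is still moving by pushing it to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def analyze_loop(guard, negated_guard, widen_after=None):
    """Analyze  x := 0; while (guard) { x := x + 1 }  over integers.

    Returns (range of x at the loop head, range of x after the loop)."""
    head = (0, 0)                      # state after x := 0
    iterations = 0
    while True:
        body_in = meet(head, guard)    # states that enter the loop body
        body_out = None if body_in is None else (body_in[0] + 1, body_in[1] + 1)
        new_head = join(head, body_out)
        iterations += 1
        if widen_after is not None and iterations > widen_after:
            new_head = widen(head, new_head)
        if new_head == head:           # range has stabilized
            break
        head = new_head
    return head, meet(head, negated_guard)

# while (x < 10): guard is x <= 9, negated guard is x >= 10
print(analyze_loop((-math.inf, 9), (10, math.inf)))

# while (x > -1): guard is x >= 0, negated guard is x <= -1
print(analyze_loop((0, math.inf), (-math.inf, -1), widen_after=3))
```

For the bounded loop, the range at the loop head grows from 0 to 1, 0 to 2, and so on until it stabilizes at 0 to 10, and intersecting with the negated guard leaves x=10 after the loop. For the unbounded variant, iteration without widening would not terminate; with widening, the upper bound is loosened to ∞ and the state after the loop is unreachable.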
Different abstract domains might specialize in Boolean logic, or state machine analysis, or system resource (e.g., mutex) access patterns. Developing specialized abstract domains for different areas can be time-consuming.
In addition to standard, well-known functions and relation symbols, a computer program may include functions and relation symbols that are customized to the program, to the language of the program, or to the general area of the program. As a result, in abstract interpretation, constraints of interest often involve functions and relation symbols not all supported by any single abstract domain. For example, some computer programs include functions for interacting with “heap” memory. (In general, the heap is an area of computer memory used for dynamic memory allocation, where blocks of memory are allocated and freed in an arbitrary order, and the pattern of allocation and size of blocks may not be known until run-time.) Suppose a constraint of interest in the analysis of a Java or C# program is:
sel(H,o,x)+k≦length(a),
where H denotes the current heap, sel(H,o,x) represents the value of the field x of an object o in the heap H (usually written o.x in Java and C#), and length(a) gives the length of an array a. This constraint cannot be represented directly in the polyhedra domain because the polyhedra domain does not support the functions sel and length. Consequently, the polyhedra domain would very coarsely over-approximate this constraint with a lattice element that conveys no information.
This example illustrates some problems with current abstract domains. If a constraint mentions a function or relation symbol that is not supported by an abstract domain, the constraint is ignored by the abstract domain (that is, it is very coarsely over-approximated). Moreover, current abstract domains do not support certain functions and relation symbols for heap management.
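The loss of information described above can be sketched as follows. This is a simplified illustration, not the behavior of any specific abstract domain implementation: the nested-tuple term encoding, the SUPPORTED set, the TOP marker, and the function names are assumptions introduced for this example.

```python
# Terms are nested tuples: (function_symbol, arg1, arg2, ...).
# Variables are strings and constants are numbers.
# A constraint is a tuple (relation_symbol, lhs_term, rhs_term).

SUPPORTED = {"+", "-", "*"}   # assumed symbols of a linear-arithmetic domain
TOP = "TOP"                   # lattice element that conveys no information

def function_symbols(term):
    """Collect every function symbol that occurs in a term."""
    if isinstance(term, tuple):
        head, *args = term
        symbols = {head}
        for arg in args:
            symbols |= function_symbols(arg)
        return symbols
    return set()

def abstract(constraint):
    """Keep the constraint if every symbol is supported; otherwise give up."""
    _relation, lhs, rhs = constraint
    unsupported = (function_symbols(lhs) | function_symbols(rhs)) - SUPPORTED
    return TOP if unsupported else constraint

# sel(H,o,x) + k <= length(a): sel and length are not supported.
heap_constraint = ("<=", ("+", ("sel", "H", "o", "x"), "k"), ("length", "a"))
print(abstract(heap_constraint))

# x + y <= z: only supported symbols, so the constraint is retained.
linear_constraint = ("<=", ("+", "x", "y"), "z")
print(abstract(linear_constraint))
```

In this sketch, the heap constraint is replaced wholesale by the uninformative element, even though its outer structure (a sum compared against another value) is ordinary linear arithmetic.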