1. Field of the Invention
The present invention relates to analysis of computer programs.
2. Description of the Related Art
Many algorithms for static analysis of imperative programs make the simplifying assumption that the data manipulated by a program consists of simple atomic values, when in reality aggregates such as arrays and records are usually predominant. By "atomic value" (also referred to as an "atom" or "scalar") we mean a value that is treated as an indivisible (single logical) unit by the program. As an example, a program might treat a person's age as an atomic value. An aggregate, by contrast, consists of two or more logical units of information, and the program might refer to or manipulate one of these units independently of the others. For example, a program may treat a person's date-of-birth as an aggregate consisting of three logical units of information: the year-of-birth, the month-of-birth, and the day-of-birth. In particular, a statement in the program may make use of the year-of-birth while ignoring the month and day information.
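By way of illustration only, the distinction between an atom and an aggregate can be sketched in Python as follows. The names `DateOfBirth`, `is_born_before`, `age`, and `dob` are hypothetical and are introduced solely for this sketch; they do not appear in any analyzed program.

```python
from dataclasses import dataclass


@dataclass
class DateOfBirth:
    """An aggregate: three logical units of information."""
    year: int
    month: int
    day: int


def is_born_before(dob: DateOfBirth, year: int) -> bool:
    # This statement uses only the year-of-birth component;
    # the month and day components are ignored.
    return dob.year < year


age = 42                         # an atom: treated as one indivisible unit
dob = DateOfBirth(1980, 7, 15)   # an aggregate of three units
```

An analysis that treats `dob` as a single scalar would conclude that `is_born_before` depends on the month and day as well, even though only the year is read.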
There are several straightforward approaches to adapting analysis algorithms designed for scalars to operate on aggregates:
1. Treat each aggregate as a single scalar value.
2. Decompose each aggregate into a collection of scalars, each of which represents one of the bytes (or bits) comprising the aggregate.
3. Use the declarations (variable and type declarations) in the program to break up each aggregate into a collection of scalars, each of which represents a declared component of the aggregate containing no additional substructures of its own.
Unfortunately, each of these approaches has drawbacks. The first approach can yield very imprecise results, because a reference to any one component is treated as a reference to the entire aggregate. While the second approach is likely to produce precise results, it can be prohibitively expensive. The third approach appears at first blush to be the obvious solution. However, it is unsatisfactory in weakly-typed languages such as Cobol, where a variable need not be explicitly declared as an aggregate in order for it to contain composite data. Even in more strongly-typed languages, declarative information alone can be insufficient because loopholes in the type system (such as typecasts) may permit aggregate values to interoperate with non-aggregate values; untagged unions also complicate matters. Moreover, the third approach may produce unnecessarily many scalar components when the program only accesses a subset of those components. Finally, in languages where aggregate components may overlap one another in storage inexactly, checks for storage disjointness (which tend to occur in inner loops of analysis algorithms) may prove expensive.
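The trade-offs among the three approaches may be illustrated by the following minimal Python sketch. It assumes a hypothetical 8-byte date-of-birth record laid out as YYYYMMDD and models each scalar as a half-open byte range. The layout, the names `DECLARED_FIELDS`, `as_single_scalar`, `as_bytes`, `as_declared_fields`, and `touched`, and the byte-range representation are all assumptions made for illustration and are not part of any described method.

```python
# Hypothetical layout of an 8-byte date-of-birth record: YYYYMMDD.
# Each component is a half-open byte range (start, end).
RECORD_SIZE = 8
DECLARED_FIELDS = {"year": (0, 4), "month": (4, 6), "day": (6, 8)}


def as_single_scalar(size):
    """Approach 1: the whole aggregate is one scalar."""
    return [(0, size)]


def as_bytes(size):
    """Approach 2: one scalar per byte of the aggregate."""
    return [(i, i + 1) for i in range(size)]


def as_declared_fields(fields):
    """Approach 3: one scalar per declared leaf component."""
    return sorted(fields.values())


def touched(components, access):
    """Return the scalars whose storage overlaps the accessed byte range.

    This is the storage-disjointness check that an analysis would
    perform repeatedly, often in its inner loops.
    """
    lo, hi = access
    return [c for c in components if c[0] < hi and lo < c[1]]


year_access = DECLARED_FIELDS["year"]   # a statement that reads only the year
odd_access = (2, 6)                     # an undeclared view: last two year digits plus month

by_scalar = touched(as_single_scalar(RECORD_SIZE), year_access)
by_byte = touched(as_bytes(RECORD_SIZE), year_access)
by_field = touched(as_declared_fields(DECLARED_FIELDS), year_access)
by_field_odd = touched(as_declared_fields(DECLARED_FIELDS), odd_access)
```

Under these assumptions, the first approach reports the entire record as accessed, the second tracks four separate scalars for a single logical unit, and the third is exact for the declared year field but reports a partial overlap with both the year and the month when the program uses a view of storage that the declarations do not describe.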