Various computer program analysis tools, such as compilers, style checkers, static bug detectors, and restructuring tools, typically perform static program analyses to better optimize, understand, or browse computer programs. Such analysis tools often perform program-point-specific dataflow analyses to approximate the expected run-time behavior of a program. A typical compiler, for example, may use dataflow analyses to help optimize the run-time execution of compiled programs. Examples of dataflow analyses include interprocedural constant propagation and points-to analysis.
Dataflow analyses generate a model of every program quantity of interest, such as each variable, expression, or storage location, at every program point, such as each expression, control flow graph node, or program counter value. Typical dataflow analyses are monolithic, simultaneously modeling all relevant program quantities at all relevant program points. The attendant memory space and execution time costs of such analyses are proportional to the following factors:
(1) the cost of modeling a single quantity at a single point, (2) the number of quantities modeled, and (3) the number of points at which each quantity is modeled.
As factor (1) increases for more sophisticated analyses and as factors (2) and (3) increase for analyses on large or entire programs, dataflow analysis costs can grow relatively large.
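These cost factors can be illustrated with a small sketch (a hypothetical Python example; the three-point, three-variable program and the lattice encoding are assumptions for illustration): a monolithic analysis keeps one model for every (point, quantity) pair, so its storage grows with factors (2) and (3) multiplied together, with each cell weighted by factor (1).

```python
# Hypothetical monolithic constant-propagation state: each model is a
# lattice value, here 'TOP' (as yet unknown), an int constant, or
# 'BOTTOM' (known non-constant).  The per-cell size is factor (1).
points = ["p1", "p2", "p3"]       # program points (assumed example)
quantities = ["x", "y", "z"]      # program quantities (variables)

# A monolithic analysis keeps one model per (point, quantity) pair.
state = {p: {q: "TOP" for q in quantities} for p in points}

# Storage grows as factor (2) times factor (3).
cells = sum(len(models) for models in state.values())
```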
Sparse representation methods help reduce the costs of performing dataflow analyses by reducing factor (3). Such methods model each program quantity of interest only at those program points where the value of the quantity's model might differ from its value at the point's predecessors. Because most program points affect only a small subset of the program quantities, and often only a single quantity, the cost reductions may be significant.
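As a sketch of this idea (a hypothetical Python example; the four-point, three-variable straight-line program is an assumption for illustration), a sparse representation records a variable's model only at the points that may change it:

```python
# Assumed straight-line program:
#   p1: x = 1    p2: y = 2    p3: z = x + y    p4: use(z)
# Each point lists the variables whose models it may change.
defs = {"p1": ["x"], "p2": ["y"], "p3": ["z"], "p4": []}

# Dense (monolithic): every variable modeled at every point.
dense_models = len(defs) * 3        # 4 points x 3 variables = 12

# Sparse: each variable modeled only where its model may change.
sparse_models = sum(len(vs) for vs in defs.values())   # = 3
```

Here each point changes at most one variable's model, so factor (3) drops from the number of points to the number of defining points per variable.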
Other methods to reduce the costs of performing dataflow analyses target factor (2) or both factors (2) and (3) by partitioning the analysis into phases, each of which models only a subset of the program quantities and/or points.
Partitioning the dataflow analysis into phases helps reduce memory space costs as some dataflow analyses or optimizations may be performed on a per-phase basis. The storage used by the analysis for each phase may therefore be reclaimed for use by subsequent phases. As one example, an assignment to a dead variable can be removed irrespective of the liveness of any other variable or any other assignment statement. Similarly, primitive operations having operands that are constants can be folded without knowledge of the constancy of other program quantities. The working memory requirements for the dataflow analysis are therefore reduced to those of the most expensive phase.
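The two per-phase optimizations mentioned above can be sketched as follows (a hypothetical Python example over a toy straight-line intermediate representation; the tuple IR format and helper names are assumptions, not drawn from this document):

```python
# Hypothetical straight-line IR: (dest, operand1, operator, operand2);
# constants are ints, variables are strings, folded statements use None.
code = [
    ("a", 2, "+", 3),        # both operands constant: foldable
    ("b", "a", "+", 1),      # depends on a variable: not foldable
    ("dead", 4, "*", 5),     # destination never read: removable
]

def fold(stmts):
    # Fold a primitive op whose operands are constants, without any
    # knowledge of the constancy of other program quantities.
    out = []
    for dest, l, op, r in stmts:
        if isinstance(l, int) and isinstance(r, int):
            val = l + r if op == "+" else l * r
            out.append((dest, val, None, None))
        else:
            out.append((dest, l, op, r))
    return out

def remove_dead(stmts, live_out):
    # Drop an assignment whose destination is never read afterward,
    # irrespective of the liveness of any other variable.
    kept, live = [], set(live_out)
    for dest, l, op, r in reversed(stmts):
        if dest not in live:
            continue                      # dead assignment: drop it
        kept.append((dest, l, op, r))
        live.discard(dest)
        live.update(v for v in (l, r) if isinstance(v, str))
    return list(reversed(kept))

folded = fold(code)
cleaned = remove_dead(folded, live_out={"b"})
```

Each pass consults only its own phase's solution (constant values, or liveness), so the storage for one phase can be reclaimed before the next phase runs.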
Because the analysis must, for example, model quantities at meet points, maintain auxiliary data structures such as dependence graphs, and may not use every part of the computed solution, storing each phase's solution typically requires less memory than computing it does. Partitioning therefore also allows the excess intermediate storage to be reclaimed following the performance of each phase.
Partitioning the dataflow analysis may also reduce the execution time costs if individual phases may be performed by more efficient dataflow analyses and/or if more than one individual phase may be performed simultaneously in parallel.
Point-based partitioning schemes use control flow relationships between program points to model only a subset of the points in each phase. Examples of point-based schemes include interval-based dataflow analyses and interprocedural analyses that separate intraprocedural analysis from interprocedural propagation.
Quantity-based partitioning schemes analyze all program points but model only a subset of the program quantities in each phase. Existing quantity-based schemes, however, are restricted to separable dataflow analyses, such as reaching definitions analysis and live variables analysis, in which the dataflow solution for each program quantity is independent of the solutions for all other quantities. Some dataflow analyses, such as constant propagation and points-to analysis, are not separable because some of the program quantities may interact during the analysis. Such non-separable dataflow analyses may not be partitioned using existing quantity-based partitioning schemes and must therefore model all relevant program quantities simultaneously to account for the possible interaction of program quantities.
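The distinction can be sketched as follows (a hypothetical Python fragment over an assumed two-statement program): the liveness of one variable is computable without the solutions for any other variable, whereas the constant value of one variable may depend directly on another's.

```python
# Assumed program:
#   p1: y = 1
#   p2: x = y + 1
#
# Liveness is separable: whether 'y' is live at p1 depends only on
# later uses of 'y', never on the liveness solution for 'x'.
uses_after_p1 = {"y"}               # y is read at p2
y_live_at_p1 = "y" in uses_after_p1

# Constant propagation is not separable: x's solution at p2 can only
# be computed from the solution already derived for 'y'.
consts = {"y": 1}                   # solution for y at p2's entry
consts["x"] = consts["y"] + 1       # x's model depends on y's
```

Because of this dependence, a phase modeling only 'x' could not run without also carrying the solution for 'y', which is why existing quantity-based schemes exclude such analyses.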