1. Technical Field
The invention generally relates to the field of symbolic analysis of a software program and in particular to advantageous state merging during symbolic analysis of a software program.
2. Background Information
Symbolic program analysis essentially performs forward expression substitution starting from a set of input variables. The resulting formulae are then used to falsify assertions and find bugs, or to generate input assignments that serve as test cases. One type of symbolic program analysis is symbolic execution. "Symbolic execution" refers to the execution of a software program using symbolic values instead of actual values. Instead of executing a target program with regular concrete inputs (e.g., x=5), symbolic execution executes a target program with "symbolic" inputs that can take on all values allowed by the type (e.g., x=λ, where λ∈ℤ and ℤ is the set of all integers). Whenever a conditional branch is encountered that involves a predicate π that depends (directly or indirectly) on x, both program state and execution are forked into two alternatives: one following the then-branch (π) and another following the else-branch (¬π). The two executions can then be pursued independently. To ensure that only feasible paths are explored, a symbolic analysis engine (SAE) uses a constraint solver to check the satisfiability of each branch's predicate, and the SAE follows only satisfiable branches. This symbolic approach is efficient because it analyzes code for entire classes of inputs rather than for specific ("concrete") inputs. Symbolic execution has been used to build automated test case generation tools and automated bug finding tools. Test generation by symbolic execution is just one of a multitude of precise symbolic program analyses that are made possible by satisfiability constraint solvers.
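The forking and pruning described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not an SAE implementation: symbolic values are modeled as Python closures over the input assignment, the path condition is a list of predicate closures, and the hypothetical `solve` function stands in for a real constraint solver by exhaustively searching a small bounded domain for the single input `x`. The target program in `explore` is likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, int]           # a concrete input assignment, e.g. {"x": 7}
Expr = Callable[[Env], int]    # a symbolic value: a function of the inputs
Pred = Callable[[Env], bool]   # a branch predicate over the inputs

# Assumption: a brute-force search over this bounded domain stands in for a constraint solver.
DOMAIN = range(-20, 21)

@dataclass
class State:
    path_condition: List[Pred] = field(default_factory=list)  # conjunction of branch decisions
    store: Dict[str, Expr] = field(default_factory=dict)      # variable -> symbolic value
    trace: List[str] = field(default_factory=list)            # human-readable branch history

def solve(pc: List[Pred]) -> Optional[int]:
    """Return a value of input x satisfying every conjunct of pc, or None if unsatisfiable."""
    for v in DOMAIN:
        if all(c({"x": v}) for c in pc):
            return v
    return None

def branch(state: State, pred: Pred, label: str) -> List[Tuple[bool, State]]:
    """Fork `state` on `pred`, keeping only successors whose path condition is satisfiable."""
    successors = []
    for taken, cond in ((True, pred), (False, lambda env: not pred(env))):
        pc = state.path_condition + [cond]
        if solve(pc) is not None:
            child = State(pc, dict(state.store), state.trace + [f"{label}={taken}"])
            successors.append((taken, child))
    return successors

def explore() -> List[State]:
    """Symbolically execute the hypothetical program:
        if x > 5: y = 2 * x
        else:     y = 1
        if y > 30: ...
    """
    init = State(store={"x": lambda env: env["x"]})
    final = []
    for taken, s in branch(init, lambda env: env["x"] > 5, "x>5"):
        x = s.store["x"]
        # Default arguments bind the current closures (avoids Python's late binding).
        s.store["y"] = (lambda env, x=x: 2 * x(env)) if taken else (lambda env: 1)
        y = s.store["y"]
        for _, t in branch(s, lambda env, y=y: y(env) > 30, "y>30"):
            final.append(t)
    return final

for s in explore():
    print(s.trace, "test input x =", solve(s.path_condition))
# ['x>5=True', 'y>30=True'] test input x = 16
# ['x>5=True', 'y>30=False'] test input x = 6
# ['x>5=False', 'y>30=False'] test input x = -20
```

Of the four syntactic paths, the one with `x <= 5` and `y > 30` is pruned because its path condition is unsatisfiable, and each surviving path condition yields one concrete test input.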
A target software program is analyzed symbolically by an SAE. One of the challenges faced by conventional SAEs is scalability. The phenomenon of “path explosion” refers to the fact that the number of possible paths through a target program is roughly exponential in program size. A “state” in symbolic analysis encodes the history of branch decisions (the “path condition”) and precisely characterizes the value of each variable in terms of input values (the “symbolic store”), so path explosion becomes synonymous with state explosion. The benefit of not having false positives in bug finding (save for over-approximate environment assumptions) comes at the cost of having to analyze an exponential number of states.
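The growth in the number of states can be made concrete with a short sketch. It assumes a hypothetical loop containing one independent input-dependent branch per iteration and enumerates every (path condition, symbolic store) pair; path conditions are shown as plain strings purely for readability.

```python
from itertools import product
from typing import Dict, List, Tuple

def enumerate_states(n: int) -> List[Tuple[List[str], Dict[str, int]]]:
    """Enumerate every (path condition, symbolic store) pair of the hypothetical program
        count = 0
        for i in range(n):
            if in_i > 0: count += 1
    where in_0 .. in_{n-1} are symbolic inputs. Each branch decision doubles the states."""
    states = []
    for decisions in product((True, False), repeat=n):
        pc = [f"in{i} > 0" if d else f"!(in{i} > 0)" for i, d in enumerate(decisions)]
        store = {"count": sum(decisions)}  # count is concrete along any single path
        states.append((pc, store))
    return states

for n in (3, 10):
    states = enumerate_states(n)
    distinct = {store["count"] for _, store in states}
    print(n, len(states), len(distinct))
# 3 8 4
# 10 1024 11
```

The number of states is 2^n, yet `count` takes only n + 1 distinct values, so many states differ only in their path condition. This redundancy is what state merging tries to exploit.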
Given a target program, one way to reduce the number of states that an SAE needs to explore is to merge states that correspond to different paths through the target program. State merging effectively decreases the number of paths that have to be explored but also increases the size of the symbolic expressions describing variables. Merging introduces disjunctions into the path condition, which are notoriously difficult for constraint solvers. Merging also converts differing concrete values into a symbolic expression. If that symbolic expression later appears in branch conditions or array indices, merging the states may lead to more constraint solver invocations than would have occurred without merging. This combination of larger symbolic expressions (and larger symbolic path conditions) and extra solver invocations can outweigh the benefit of having fewer states to analyze.
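A minimal sketch of merging two states at a control-flow join point follows. It assumes both states track the same variables, represents formulas as plain strings, and writes the if-then-else term as `ite(c, a, b)` in the style of SMT-LIB; the `State` and `merge` names are hypothetical and not any particular engine's API.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class State:
    pc: str                # path condition, as a formula string
    store: Dict[str, str]  # variable name -> symbolic expression string

def merge(a: State, b: State) -> State:
    """Merge two states that reached the same program location."""
    assert a.store.keys() == b.store.keys(), "states must track the same variables"
    store = {}
    for var, va in a.store.items():
        vb = b.store[var]
        # Equal values are kept; differing values become an if-then-else on a's path condition.
        store[var] = va if va == vb else f"ite({a.pc}, {va}, {vb})"
    # The merged path condition is the disjunction of the two input path conditions.
    return State(pc=f"({a.pc}) || ({b.pc})", store=store)

# Two states at the join point after `if (x > 5) y = 1; else y = 2;`
then_state = State(pc="x > 5", store={"x": "x", "y": "1"})
else_state = State(pc="!(x > 5)", store={"x": "x", "y": "2"})
merged = merge(then_state, else_state)
print(merged.pc)     # (x > 5) || (!(x > 5))
print(merged.store)  # {'x': 'x', 'y': 'ite(x > 5, 1, 2)'}
```

The concrete values 1 and 2 of `y` collapse into the symbolic term `ite(x > 5, 1, 2)`, so a later branch such as `if (y == 1)` now requires a solver query where each unmerged state could have decided it concretely. An engine would typically also simplify the disjunction (here it is trivially true).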
State merging also conflicts with optimizations in the symbolic analysis exploration process. Search-based SAEs, like the ones used in test case generators and bug finding tools, employ search strategies to prioritize the exploration of "interesting" paths over "less interesting" ones (e.g., with respect to maximizing line coverage given a fixed time budget). To maximize the opportunities for state merging, however, the SAE would have to traverse the control flow graph in topological order, which typically conflicts with the search strategy's path prioritization policy.
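The conflict between search order and merging can be sketched on a hypothetical diamond-shaped control flow graph whose two arms have different lengths. In this toy model a state is only a location plus the number of concrete paths it summarizes, and two states are merged whenever they wait at the same location; a depth-first picker stands in for a search heuristic that favors the most recently forked state. These modeling choices are assumptions for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical diamond CFG: A branches to a short arm (B) and a long arm (C1 -> C2);
# both arms rejoin at D, which falls through to the exit E.
SUCC = {"A": ["B", "C1"], "B": ["D"], "C1": ["C2"], "C2": ["D"], "D": ["E"], "E": []}
TOPO = {loc: i for i, loc in enumerate(["A", "B", "C1", "C2", "D", "E"])}

State = Tuple[str, int]  # (location, number of concrete paths the state summarizes)

def depth_first(worklist: List[State]) -> int:
    return len(worklist) - 1  # newest state first, as a search heuristic might choose

def topological(worklist: List[State]) -> int:
    # State at the earliest location in topological order first.
    return min(range(len(worklist)), key=lambda i: TOPO[worklist[i][0]])

def explore(pick: Callable[[List[State]], int]) -> Tuple[List[int], int]:
    """Return (paths summarized by each state reaching the exit, number of states stepped)."""
    worklist: List[State] = [("A", 1)]
    exits, steps = [], 0
    while worklist:
        loc, paths = worklist.pop(pick(worklist))
        steps += 1
        # Merge every other state that is waiting at the same location.
        for other in [s for s in worklist if s[0] == loc]:
            worklist.remove(other)
            paths += other[1]
        if not SUCC[loc]:
            exits.append(paths)
        for nxt in SUCC[loc]:
            worklist.append((nxt, paths))
    return exits, steps

print(explore(depth_first))  # ([1, 1], 8)
print(explore(topological))  # ([2], 6)
```

Under the depth-first order, the state on the long arm passes the join point D before the other state arrives, so no merge happens and D and E are analyzed twice. Topological order holds the earlier state back until both reach D, enabling the merge, but it overrides whatever priority the search strategy would have assigned.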
The net effect is that state merging may actually be detrimental (e.g., by decreasing overall symbolic analysis performance) rather than advantageous (e.g., by increasing overall symbolic analysis performance).