There are two general techniques for debugging software programs. Dynamic debugging methods form a set of test cases and the expected result(s) for each test case. The program is then executed on the set of test cases and the result(s) of the execution are compared with the expected result(s). A mismatch is a symptom of an error in the program. Static debugging methods, on the other hand, form a set of properties that the program should satisfy. For example, a static debugging technique may require that a program should not crash; should satisfy given rules for accessing data; and should produce outputs that have a given relation to its inputs.
Static methods analyze the input source code without executing it. They search for a path that violates one of the properties, and such a path is reported as an error. In this search, static methods trade off efficiency against accuracy. A key issue is determining whether the path is feasible, i.e., whether there are input values that would cause the path to be executed. In general, static debugging techniques excel at discovering rare bugs, whereas dynamic debugging techniques excel at finding common bugs and testing multiple modules. Thus, the two methods are complementary.
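The notion of path feasibility can be illustrated with a small example. The function below is hypothetical and is not taken from any particular tool; it is a sketch, written in Python, of a program containing a path that looks erroneous but that no input can execute.

```python
def scale(x: int) -> int:
    """Hypothetical function with one syntactically possible but infeasible path."""
    d = 0
    if x > 10:
        d = x
    if x > 20:
        # Reaching this line with d == 0 would require skipping the first
        # branch (x <= 10) and taking the second (x > 20). No integer
        # satisfies both conditions, so the division-by-zero path is infeasible.
        return 100 // d
    return 0


# Constraints collected along the suspicious path: x <= 10 and x > 20.
# A brute-force search over a small range stands in for a constraint solver.
path_is_feasible = any(x <= 10 and x > 20 for x in range(-1000, 1000))
```

A tool that does not check feasibility would report a possible division by zero here. A tool that passes the two branch conditions to a constraint solver would find them inconsistent and suppress the report.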
“Lint” software testing and debugging tools place a high degree of importance on efficiency and do not determine the feasibility of paths. Commercial implementations of Lint tools include Parasoft, Flexlint and Reasoning. Lint tools do not try to avoid “false errors.” “Formal verifiers,” on the other hand, are software debugging tools that determine feasibility. For that purpose, formal verifiers collect the constraints for a path to be feasible, and pass those constraints to a constraint solver. If the constraint solver determines the constraints to be consistent, then an error can be reported.
Static analysis tools parse the source programs to produce a parse tree. A parse tree is a representation of the structure of the given input source programs. Parsing is performed using standard compiler techniques. In addition, static analysis tools perform semantic analysis to produce a flow graph from the given parse tree, again using standard compiler techniques (except that flow-graph nodes are generated in place of emitted code). The nodes represent data flow operations, such as “+,” as well as control flow operations, such as variable assignments. There are also nodes representing conditional branching that record the condition(s) of the test. Thereafter, an analysis of the flow graph is performed. The actual form of flow graph analysis differs from tool to tool, but in general it involves traversing the flow graph and performing some operation at each node traversed. Tools that determine the feasibility of paths have to take into account the nodes representing conditional branches. From these conditional branch nodes, the tools collect the constraints for following each path. These constraints involve operations and predicates from various domains: arithmetic, pointers, arrays, and other data structures.
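The collection of constraints from conditional branch nodes can be sketched as follows. The `Node` class, its field names, and the `paths` function are assumptions made for illustration only; an actual tool would use a richer flow-graph representation and would not store conditions as plain strings.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical flow-graph node; cond is set only on conditional branches."""
    label: str
    cond: Optional[str] = None
    on_true: Optional["Node"] = None   # successor if cond holds, or sole successor
    on_false: Optional["Node"] = None  # successor if cond fails (branches only)


def paths(node: Optional[Node],
          constraints: Tuple[str, ...] = ()) -> Iterator[Tuple[str, List[str]]]:
    """Yield (final node label, constraints collected) for each path to a leaf."""
    if node is None:
        return
    if node.cond is None:
        if node.on_true is None:
            yield node.label, list(constraints)   # leaf: the path ends here
        else:
            yield from paths(node.on_true, constraints)
    else:
        # A conditional branch contributes its condition on one edge and
        # the negated condition on the other.
        yield from paths(node.on_true, constraints + (node.cond,))
        yield from paths(node.on_false, constraints + (f"not ({node.cond})",))


# Flow graph for: d = 0; if x > 10: d = x; if x > 20: 100 // d; return 0
exit_ok = Node("return 0")
divide = Node("100 // d")
second = Node("if x > 20", cond="x > 20", on_true=divide, on_false=exit_ok)
assign = Node("d = x", on_true=second)
first = Node("if x > 10", cond="x > 10", on_true=assign, on_false=second)
entry = Node("d = 0", on_true=first)

all_paths = list(paths(entry))
```

Each entry of `all_paths` pairs the last node on a path with the list of conditions that must hold for that path to be followed. It is this list that a feasibility-checking tool would hand to its constraint solver.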
The constraint solvers need to understand these domains, and they use several approaches for that purpose. For example, arithmetic is in general undecidable, but there is a decidable subset, referred to as Presburger arithmetic, that is expressive enough for the purposes of software analysis. For a detailed discussion of Presburger arithmetic, see, for example, Presburger, On the Completeness of a Certain System of Arithmetic of Whole Numbers in Which Addition Occurs as the only Operation, Hist. Philos. Logic, 12(2):225–233, 1991, Translated from German and with commentaries by Dale Jacquette, incorporated by reference herein. However, because the decision procedure for Presburger arithmetic has a superexponential lower bound, full Presburger arithmetic is too expensive to use in practice for software analysis. Therefore, only subsets of Presburger arithmetic are used.
Solvers employing Presburger arithmetic, or derivatives thereof such as linear integer programming, are nevertheless inefficient. Such solvers are complete even for types of constraints that do not arise in software analysis, and this unneeded completeness reduces their efficiency. At the same time, such solvers are inflexible, i.e., it is not possible to add operators outside of their theory. Another general approach to constraint solving relies on rewrite rules. For a detailed discussion of rewrite rules, see, for example, N. Dershowitz & J. P. Jouannaud, Rewrite Systems, Handbook of Theoretical Computer Science, Volume B, Chapter 15, North-Holland, 1989, incorporated by reference herein. Generally, rewrite rules modify the constraints (or the flow graph) in order to arrive at an answer. While solvers employing rewrite rules express the semantics well, they are inefficient with arithmetic constraints.
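The rewrite-rule approach can be sketched with a few simplification rules applied repeatedly until no rule fires. The term encoding (nested tuples whose first element is the operator) and the particular rules shown are assumptions chosen for brevity; they do not reproduce any specific rewrite system from the cited literature.

```python
def rewrite_once(term):
    """Apply the rules bottom-up in a single pass over the term."""
    if not isinstance(term, tuple):
        return term                      # variable name or constant
    op, *args = term
    args = [rewrite_once(a) for a in args]
    if op == "+" and len(args) == 2:
        if args[1] == 0:
            return args[0]               # x + 0  ->  x
        if args[0] == 0:
            return args[1]               # 0 + x  ->  x
    if op == "*" and len(args) == 2:
        if args[0] == 0 or args[1] == 0:
            return 0                     # x * 0  ->  0
        if args[1] == 1:
            return args[0]               # x * 1  ->  x
        if args[0] == 1:
            return args[1]               # 1 * x  ->  x
    if op == "not" and isinstance(args[0], tuple) and args[0][0] == "not":
        return args[0][1]                # not (not p)  ->  p
    if op == "<" and all(isinstance(a, int) for a in args):
        return args[0] < args[1]         # evaluate a comparison of constants
    return (op, *args)


def normalize(term):
    """Rewrite until a fixed point is reached."""
    while True:
        new = rewrite_once(term)
        if new == term:
            return term
        term = new


simplified = normalize(("not", ("not", ("<", ("+", "x", 0), ("*", "y", 1)))))
decided = normalize(("<", ("+", 3, 0), ("*", 5, 1)))
```

Here `simplified` is the term `("<", "x", "y")` and `decided` is `True`. Adding a new operator only requires adding rules, which is the flexibility noted above, but deciding a nontrivial system of arithmetic inequalities by rules alone can take many rewriting steps.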
The static techniques that do not evaluate the feasibility of paths (referred to as lint above) tend to issue too many complaints that do not, in fact, represent any error in the program. As a result, programmers tend to ignore all complaints issued by such tools. Formal verifiers check a given implementation against a user-supplied specification. Verifiers spend more time than other source code analysis tools and achieve the highest degree of accuracy. However, there is still uncertainty. First, the verification tool may not know which input combinations are considered legal and, second, the problem may be too large for the verifier to handle. Both of these kinds of uncertainty are resolved by placing the burden of proof on the user. Specifically, an error is reported if the user-provided information does not allow the verifier to prove the absence of error.
Static techniques that evaluate the feasibility of paths rely on a constraint solver. A constraint solver should be efficient; sound (i.e., a high percentage of the constraint sets it declares inconsistent are indeed inconsistent); complete (i.e., a high percentage of the constraint sets it declares consistent are indeed consistent); and flexible (i.e., easy to extend). Because it is impossible to satisfy all four properties perfectly, trade-offs must be made. The main trade-off is between efficiency and completeness. Ideally, a solver should be only as complete as the software analysis application requires: a less complete solver would produce incorrect error reports, whereas a more complete solver would be less efficient (although more program errors would be discovered).
A constraint solver is needed that remembers former constraints and adds new constraints incrementally. The solver should be efficient, flexible and capable of satisfactorily expressing semantics and handling arithmetic constraints.
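The idea of remembering former constraints and adding new ones incrementally can be sketched with a deliberately restricted solver. The class below is an assumption made for illustration: its name, its `push`/`pop`/`add` interface, and its restriction to comparisons between one integer variable and a constant are not taken from the source, and it handles far fewer constraint forms than a practical solver would need.

```python
class IncrementalBoundsSolver:
    """Toy incremental solver for constraints of the form: var OP constant."""

    def __init__(self):
        self._bounds = {}   # var -> (lo, hi), inclusive; None means unbounded
        self._stack = []    # saved states, one per open branch

    def push(self):
        """Remember the current constraints before exploring a branch."""
        self._stack.append(dict(self._bounds))

    def pop(self):
        """Discard constraints added since the matching push."""
        self._bounds = self._stack.pop()

    def add(self, var, op, k):
        """Add one constraint and return True if the set is still consistent."""
        new = {"<": (None, k - 1), "<=": (None, k), ">": (k + 1, None),
               ">=": (k, None), "==": (k, k)}
        if op not in new:
            raise ValueError(f"unsupported operator: {op}")
        new_lo, new_hi = new[op]
        lo, hi = self._bounds.get(var, (None, None))
        if new_lo is not None:
            lo = new_lo if lo is None else max(lo, new_lo)
        if new_hi is not None:
            hi = new_hi if hi is None else min(hi, new_hi)
        self._bounds[var] = (lo, hi)    # only this variable is re-examined
        return self.consistent()

    def consistent(self):
        return all(lo is None or hi is None or lo <= hi
                   for lo, hi in self._bounds.values())


solver = IncrementalBoundsSolver()
ok_first = solver.add("x", "<=", 10)     # constraint from an earlier branch
solver.push()
ok_second = solver.add("x", ">", 20)     # contradicts x <= 10
solver.pop()                             # backtrack to try the other branch
ok_third = solver.add("x", ">", 5)       # 6 <= x <= 10 remains satisfiable
```

Because each `add` tightens the stored bounds rather than re-solving the whole constraint set, the cost of extending a path by one branch stays small, and `push` and `pop` let a flow-graph traversal share the constraints of a common path prefix.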