The invention relates to systems for program verification and automated theorem proving, and more particularly, to a static analysis technique for detecting errors in a software program.
In the field of computer programming, it is often advantageous to check a software program statically, i.e., at compile-time, link-time or bind-time rather than at run-time. This enables a programmer to detect and correct certain types of errors prior to run-time, which also avoids the processing time and the consequences of running an erroneous program.
One of the techniques for static verification of a software program begins with a formula called a “verification condition” being derived from the program by a series of steps beginning typically with the parsing of the original program source code. A collection of terms occurring in the verification condition, optionally combined with a selected subset of terms occurring in an axiom set and terms derived from the verification condition or the axiom set, is then represented by a particular kind of a directed acyclic graph (DAG) called an expression graph or E-graph. Such a static verification process ends with the analysis of the expressions in the E-graph using a set of rules. The rules define the ambit of the verification process and serve as the touchstone for measuring the correctness of a software program.
The analysis of the parsed expressions using a set of rules conventionally reduces to the task of matching instances of various expressions in the E-graph against a set of prespecified patterns. This analysis is performed by a software program commonly referred to as a prover. Each of the prespecified patterns represents one or more rules. The instances of each of the patterns found in the E-graph are statically checked for formal correctness by the prover. As explored in greater detail elsewhere in this patent, the pattern-matching process that lies at the core of the prover becomes more complex in the presence of equalities.
Conventional provers, such as those used in prior program verification techniques, are often memory- and time-intensive because they use exhaustive search techniques. Furthermore, conventional provers do not generate adequate feedback when they fail to prove a conjecture, assertion or assumption. The lack of user-friendly reporting has reduced the usability and attractiveness of conventional provers.
It would therefore be desirable to have a prover that is both fast and efficient in its use of memory resources. It would additionally be desirable to have a prover that could efficiently analyze a set of logical relationships to determine the truth or falsity of one or more assertions.
It has also been found desirable to have a prover analyze all or part of a computer program or other symbolic logical input using a set of formal rules that are themselves deduced from other parts of the program or input. The power and flexibility of a prover could be enhanced even further if it accepted user-supplied conjectures as well as internally generated working assumptions, both statically and dynamically. It would additionally be useful to have a prover analyze computer source code or symbolic logical input using stored axioms from an axiom file or database.
It would be helpful if a prover were sensitive to the context of the analysis and distinguished between globally valid axioms, locally valid axioms and conditionally valid axioms. It would provide further utility if a prover used domain-specific algorithms for analyzing important functions and predicates such as arithmetic operators and the equality condition. It would be an added benefit if the pattern-matching techniques used in a prover were adaptable to other applications involving the matching of a graph against one or more patterns.
One of the principal problems addressed by the system and method of the present invention is that of performing pattern matching (on an E-graph) in the presence of equalities. The system and method of the subject invention permit the prover input to be recursively modified based upon prior prover output.
The present invention further aims at enhancing the power and flexibility of a prover by accepting user-supplied conjectures as well as internally generated working assumptions, both statically and dynamically. The present invention also provides further utility by using domain-specific algorithms for analyzing important functions and predicates such as arithmetic operators and the equality condition.
In one aspect, the present invention includes a technique for increasing the speed of operation of a theorem prover relating to program verification using adaptive pattern-matching. Source code in a specific programming language is converted to one or more formulae, each formula representing a specific reformulation of the source code that facilitates program verification. Each formula is converted into an E-graph which is a particular kind of a directed acyclic graph having leaf nodes and interior nodes connected by directed and/or undirected edges.
Certain of the formulae may be such that the E-graph associated with a formula may not be fully active for the purposes of the pattern-matching process. In the discussion that follows, the “active” portion of an E-graph refers to the part(s) of the E-graph over which the pattern matching process of the present invention executes. The pattern-matching process of the present invention is not performed on the “inactive” nodes of the E-graph. It should be emphasized that the set of “active” nodes of an E-graph can change dynamically during the operation of the present invention.
Some of the nodes of an E-graph (which are also referred to as E-nodes) may be related to other E-nodes through equivalence relationships. Equivalence relationships between groups of E-nodes are stored in a data structure which is referred to as an equivalence class. A collection of rules defining the semantics of the programming language is stored in an axiom database. Rules and conjectures about the source code may also be added to the axiom database during the analysis. Each rule and conjecture to be tested is first converted into a pattern.
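By way of illustration only, the storage of equivalence relationships between E-nodes can be sketched with a union-find structure. The class name EGraph and its methods are hypothetical and do not appear in this specification; the sketch also omits congruence closure and the active/inactive distinction.

```python
class EGraph:
    """Toy E-graph: E-nodes plus equivalence classes kept in a union-find."""

    def __init__(self):
        self.terms = []   # E-node id -> (symbol, tuple of child ids)
        self.parent = []  # E-node id -> link toward its class representative

    def add(self, symbol, *children):
        self.terms.append((symbol, tuple(children)))
        self.parent.append(len(self.parent))  # a new node is its own class
        return len(self.terms) - 1

    def find(self, node):
        # Follow links to the class representative, halving the path as we go.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def merge(self, a, b):
        # Record the equality a = b by joining the two equivalence classes.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
        return ra

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


g = EGraph()
x = g.add("x")
y = g.add("y")
fx = g.add("f", x)
before = g.equivalent(x, y)   # x and y start in separate classes
g.merge(x, y)                 # assert the equality x = y
after = g.equivalent(x, y)
```

In this sketch an equality is recorded by merging two classes, so a later query about whether two E-nodes are equivalent reduces to comparing representatives.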
The task of proving a rule or conjecture or verifying some or all of the source code is thus transformed into the task of matching the pattern associated with the rule or conjecture against the active nodes of the E-graph (i.e., the nodes corresponding to the appropriate portion of the source code). It should be noted that in the preferred embodiment of the present invention, the prover “proves” each rule or conjecture by attempting to disprove the negated rule or conjecture. In one implementation, this is done by combining the negated rule or conjecture with the associated active portion of the E-graph into a data construct called the “context.” The prover attempts to disprove the negated rule or conjecture by looking for internal inconsistencies or contradictions in the constructed context.
The pattern-matching process at the core of the prover begins with a comprehensive search over the entire E-graph for the plenary set of patterns. This comprehensive baseline search is referred to as the first round of matching. After the initial round of matching, the E-graph may be modified by the addition of new nodes or equivalence relationships. The search for internal inconsistencies is repeated anew after each modification to the E-graph. These subsequent searches through the E-graph are referred to as Round 2 of matching, Round 3 of matching, etc. After each round of matching, the E-graph may be modified by the addition of new nodes or equivalence relationships or the activation of dormant nodes. It should be noted that an E-graph is typically invariant during a round of matching.
The efficiency of the pattern-matching process can be improved by limiting the search (using knowledge about the changes, if any, made to the E-graph after the previous round of matching) to only those portions of the E-graph and those patterns which may be relevant to the subsequent rounds of matching. The first type of optimization, i.e., restricting the search to only those portions of the E-graph where new matches may plausibly be found, is performed using the mod-times optimization technique. The second type of optimization, i.e., restricting the search to only those patterns that may plausibly result in new matches being found in the modified E-graph, is performed using the pattern-element optimization technique.
The mod-times optimization technique increases the efficiency of the pattern-searching process by time-stamping the changes to various regions of the E-graph. The pattern-element optimization permits an additional increase in efficiency by indexing the connection relations between the nodes of the E-graph to speed up the correlation of the structure of an E-graph with a pattern being searched.
These two optimizations, which may be used in any application that may be mapped or reduced to the task of matching or searching for a pattern in a graph, lie at the core of the present invention. It should be emphasized that the mod-times optimization technique and the pattern-element optimization technique are independent of each other. Thus each of them can be used or applied separately or conjunctively as needed. It should be reiterated that the present prover can also be used for applications other than program verification, e.g., in symbolic algebra.
The present invention permits and facilitates the verification of information. This includes the automated verification of the accuracy of a computer program or the automated proving of a theorem regarding symbolic logic relative to a set of externally-generated rules about the syntax and semantics of the computer program or symbolic logic. In this regard, all or part of a computer program or symbolic logic is selectively transformed into one or more formulae. The generated formulae are designed to be amenable to automated testing against one or more of the externally-generated rules.
Formula handlers can be included to generate an initial E-graph from the formula. Each of the nodes of the E-graph (i.e. the E-nodes) represents either a constant term, a variable term or an algebraic or logical operator.
Each of the rules in the set of externally-generated rules may, in accordance with still further aspects of the invention, be converted into one or more distinct search patterns. It should be noted that the set of externally-generated rules may be dynamically added to and/or modified during the analysis. Further, some of the rules may be converted into multiple distinct search patterns, called multi-patterns. Each active E-node of the initial E-graph may be searched using a pattern-matching algorithm to detect instances of the pattern. Information about any new matches that are found is stored in a data structure.
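The following is a minimal sketch of what matching a search pattern against an E-node in the presence of equalities might look like. It is not the matcher of the invention: the pattern encoding (nested tuples, with pattern variables written as strings beginning with "?"), the EGraph class and the function names are all assumptions made for illustration, and every node is treated as active.

```python
class EGraph:
    """Hypothetical toy E-graph with union-find equivalence classes."""

    def __init__(self):
        self.terms = []   # node id -> (symbol, tuple of child ids)
        self.parent = []  # union-find links

    def add(self, symbol, *children):
        self.terms.append((symbol, tuple(children)))
        self.parent.append(len(self.parent))
        return len(self.terms) - 1

    def find(self, node):
        while self.parent[node] != node:
            node = self.parent[node]
        return node

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def members(self, node):
        # All nodes in the same equivalence class as `node`.
        root = self.find(node)
        return [n for n in range(len(self.terms)) if self.find(n) == root]


def is_var(p):
    return isinstance(p, str) and p.startswith("?")


def match(g, pattern, node, subst):
    """Yield substitutions under which `pattern` matches the class of `node`."""
    if is_var(pattern):
        root = g.find(node)
        if pattern not in subst:
            yield {**subst, pattern: root}
        elif g.find(subst[pattern]) == root:
            yield subst
        return
    symbol, *arg_patterns = pattern
    # Try every node equivalent to `node`, not just `node` itself.
    for cand in g.members(node):
        sym, children = g.terms[cand]
        if sym != symbol or len(children) != len(arg_patterns):
            continue
        substs = [subst]
        for p, child in zip(arg_patterns, children):
            substs = [s2 for s in substs for s2 in match(g, p, child, s)]
        yield from substs


g = EGraph()
b = g.add("b")
gb = g.add("g", b)
a = g.add("a")
fa = g.add("f", a)
pattern = ("f", ("g", "?x"))              # the pattern f(g(x))
before = list(match(g, pattern, fa, {}))  # no match: a is not a g-term
g.merge(a, gb)                            # assert a = g(b)
after = list(match(g, pattern, fa, {}))   # now f(a) matches with x bound to b
```

The example shows why equalities complicate matching: the term f(a) contains no occurrence of g, yet it becomes an instance of f(g(x)) once a is known to equal g(b).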
After each round of searching, the E-graph can be selectively modified to add or activate additional nodes and/or additional equivalence relationships to some or all of the nodes of the modified E-graph. This is followed by identification of those regions of the modified E-graph that may be relevant to the search. The identified regions of the modified E-graph are searched for instances of the search pattern. These last three steps are repeated until no new matches are found in the active portion of the E-graph for the complete set of search patterns.
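The repetition of rounds until no new matches are found can be sketched as a simple fixed-point loop. The sketch below is an assumption-laden stand-in: it uses a toy relation on integers instead of an E-graph, the names run_rounds, find_matches and apply_match are hypothetical, and it omits the step of identifying the relevant regions, searching everything in every round.

```python
def run_rounds(find_matches, apply_match, max_rounds=100):
    """Repeat matching rounds until a round produces no new match."""
    seen = set()
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        new = [m for m in find_matches() if m not in seen]
        if not new:
            break                 # no new matches: the process has converged
        for m in new:
            seen.add(m)
            apply_match(m)        # may modify the structure being searched
    return rounds


# Toy stand-in for the graph: a set of pairs, with the rule
# "if (x, y) and (y, z) are present, add (x, z)".
edges = {(1, 2), (2, 3), (3, 4)}


def find_matches():
    return [(x, y, z) for (x, y) in edges for (y2, z) in edges if y == y2]


def apply_match(m):
    x, _, z = m
    edges.add((x, z))


rounds = run_rounds(find_matches, apply_match)
```

Because each round here re-examines the whole structure, the sketch also illustrates the cost that the two optimization techniques are intended to reduce.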
The identification of certain regions of the modified E-graph as being relevant to a search for a distinct search pattern can be performed in at least two ways. The first technique involves maintaining information about when each of the nodes in the active portion of said E-graph was last modified or affected. If the search pattern is a multi-pattern, then this technique uses the earliest modification time of all of the patterns of the multi-pattern as a gating term to determine the effective modification time for the multi-pattern.
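A minimal sketch of the time-stamping idea follows, for a single pattern only; the treatment of multi-patterns is not modeled. The Node class, the functions link, touch and candidates, and the choice to propagate a new time stamp to all ancestors of a changed node are assumptions made for illustration.

```python
class Node:
    def __init__(self, name, created_at):
        self.name = name
        self.mod_time = created_at  # when this node was last modified or affected
        self.parents = []           # nodes that have this node as an argument


def link(parent, child):
    child.parents.append(parent)


def touch(node, now):
    """Stamp `node` and every ancestor with the current time."""
    stack = [node]
    while stack:
        n = stack.pop()
        if n.mod_time == now:
            continue                # already stamped in this pass
        n.mod_time = now
        stack.extend(n.parents)


def candidates(nodes, last_matched):
    """Names of nodes changed since the pattern was last matched."""
    return [n.name for n in nodes if n.mod_time > last_matched]


a, b = Node("a", 1), Node("b", 1)
fa, gb = Node("f(a)", 1), Node("g(b)", 1)
link(fa, a)
link(gb, b)
nodes = [a, b, fa, gb]

round1 = candidates(nodes, last_matched=0)  # first round: every node is searched
touch(a, now=2)                             # a change affecting node a
round2 = candidates(nodes, last_matched=1)  # later round: only a and f(a)
```

In this sketch a later round skips b and g(b) entirely, since nothing at or below them has changed since the pattern was last matched.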
The second technique for increasing the efficiency of the search process involves selective pattern matching based upon the nature of an event triggering a change in the E-graph. There are at least two such significant triggering events: the merger of two or more equivalence classes, and the activation of previously inactive E-nodes in an E-graph.
The efficiency of the search process can be significantly improved by indexing all pairs of function symbols in each search pattern that are parent-child pairs. As would be expected, each parent-child pair of function symbols comprises a parent function symbol and a child function symbol. A parent-child function symbol pair is characterized by an application of the child function symbol in the search pattern also being an argument of the parent function symbol in the same search pattern.
The relevance of a merger of two or more equivalence classes to the match process for a distinct search pattern is established when the merger causes some active application of the child function symbol to become equivalent to some active argument of the parent function symbol. The indexed information about all the parent-child pairs in the active portion of an E-graph is stored in a data structure called a global parent-child (gpc) set.
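As a hedged illustration, the parent-child pairs of a search pattern and the resulting relevance test for a merger might be computed as follows. The nested-tuple pattern encoding, the per-class summaries ("symbols" and "parent_symbols") and the function names are hypothetical; the sketch treats all nodes as active.

```python
def is_var(p):
    return isinstance(p, str) and p.startswith("?")


def parent_child_pairs(pattern):
    """Collect (parent, child) function-symbol pairs occurring in a pattern."""
    pairs = set()
    if is_var(pattern):
        return pairs
    parent, *args = pattern
    for arg in args:
        if not is_var(arg):
            pairs.add((parent, arg[0]))       # arg is an application of arg[0]
            pairs |= parent_child_pairs(arg)  # descend into nested applications
    return pairs


def merge_is_relevant(pairs, class_a, class_b):
    """Could merging two classes create a new match for the pattern?

    Each class summary is a dict with:
      'symbols'        - function symbols applied by nodes in the class
      'parent_symbols' - function symbols of nodes having an argument in the class
    """
    for parent, child in pairs:
        if child in class_a["symbols"] and parent in class_b["parent_symbols"]:
            return True
        if child in class_b["symbols"] and parent in class_a["parent_symbols"]:
            return True
    return False


pairs = parent_child_pairs(("f", ("g", "?x"), ("h", "?y")))  # f(g(x), h(y))
cls_gb = {"symbols": {"g"}, "parent_symbols": set()}   # class holding g(b)
cls_a = {"symbols": {"a"}, "parent_symbols": {"f"}}    # class holding a, under f(a)
cls_c = {"symbols": {"c"}, "parent_symbols": {"k"}}    # class holding c, under k(c)

relevant = merge_is_relevant(pairs, cls_gb, cls_a)     # g-term meets an f-argument
irrelevant = merge_is_relevant(pairs, cls_gb, cls_c)   # no f-argument involved
```

Under these assumptions, a pattern need only be re-matched after a merger for which the test returns true.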
A similar indexing can also be performed for all parent-parent pairs of function symbols in each search pattern. A parent-parent function symbol pair is characterized by the two parent function symbols being independently applied to two distinct occurrences of a common pattern variable in the search pattern.
The relevance of a merger of two or more equivalence classes to the match process for a distinct search pattern is established when the merger causes some active argument of one of the parent function symbols to become equivalent to some active argument of the other parent function symbol. The indexed information about all parent-parent pairs in the active portion of an E-graph is stored in a data structure called the global parent-parent (gpp) set.
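The parent-parent pairs of a search pattern can be sketched in a similar way. The pattern encoding and the function names below are assumptions; pairs are stored in sorted order so that (f, g) and (g, f) are treated as the same pair.

```python
from itertools import combinations


def is_var(p):
    return isinstance(p, str) and p.startswith("?")


def var_parents(pattern, out):
    """Record, for each pattern variable, the parent symbol of each occurrence."""
    symbol, *args = pattern
    for arg in args:
        if is_var(arg):
            out.setdefault(arg, []).append(symbol)
        else:
            var_parents(arg, out)


def parent_parent_pairs(pattern):
    """Pairs of symbols applied to distinct occurrences of a common variable."""
    occurrences = {}
    var_parents(pattern, occurrences)
    pairs = set()
    for parents in occurrences.values():
        for p, q in combinations(parents, 2):
            pairs.add(tuple(sorted((p, q))))
    return pairs


pp1 = parent_parent_pairs(("f", ("g", "?x"), ("h", "?x")))  # f(g(x), h(x))
pp2 = parent_parent_pairs(("f", "?x", "?x"))                # f(x, x)
pp3 = parent_parent_pairs(("f", ("g", "?x"), ("h", "?y")))  # no shared variable
```

For f(g(x), h(x)), a merger matters when it makes an argument of some g-term equivalent to an argument of some h-term, because only then can both occurrences of x be bound to the same class.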
As mentioned earlier, the second type of significant triggering event is the activation of previously inactive E-nodes in an E-graph. In this case too, the efficiency of the search process can be significantly improved by updating the parent-child and parent-parent function symbol pair indices for each of the search patterns. This updating is performed to incorporate the effect of activating certain inactive E-nodes. In addition, a global set of trivial parent elements in the active portion of the E-graph is maintained and updated to reflect the activation of the erstwhile inactive E-nodes.
The global parent-child function pair set, the global parent-parent function pair set and the trivial parent function element set can be formed using approximate sets. A 64-bit hash function can advantageously be used to implement these approximate sets.
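One possible reading of an approximate set, offered as an assumption rather than as the implementation of the invention, is a 64-bit word in which each element sets a single bit chosen by a hash function. Membership tests may then report false positives but never false negatives. The class ApproxSet and the use of a CRC-32 checksum reduced modulo 64 are illustrative choices only.

```python
import zlib


class ApproxSet:
    """Approximate set packed into one 64-bit word (hypothetical sketch)."""

    def __init__(self):
        self.bits = 0

    @staticmethod
    def _bit(item):
        # Map the item to one of 64 bit positions; distinct items may collide.
        return 1 << (zlib.crc32(repr(item).encode("utf-8")) % 64)

    def add(self, item):
        self.bits |= self._bit(item)

    def might_contain(self, item):
        # True for every added item; may also be True for others (false positive).
        return bool(self.bits & self._bit(item))

    def might_overlap(self, other):
        # A zero intersection proves the two sets share no element.
        return bool(self.bits & other.bits)


gpc = ApproxSet()
gpc.add(("f", "g"))
empty = ApproxSet()
```

The appeal of such a representation is that union and intersection tests reduce to single bitwise operations, at the price of occasionally searching a pattern that turns out not to be relevant.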