Various applications, including but not limited to the analysis of software programs, benefit from the creation of directed graphs, and more specifically, directed acyclic graphs to represent flow concepts as appropriate to the application. A directed graph may consist of nodes and edges. An edge may connect one node to another, with a direction from one node to the other. Edges may be represented by arrows to indicate the direction. Two edges may be contiguous if one flows into a node and the other flows out of the same node. Directed graphs may have edges that “loop backwards”; that is, it is possible to follow a set of contiguous edges and return to the same node more than once. Such graphs are called cyclic. A directed acyclic graph, or DAG, may have no such backward edges. FIG. 1 illustrates an exemplary directed graph. Nodes are indicated by ovals, as exemplified by Node 100. Edges are represented by lines with arrows, as exemplified by Edge 102. Edges 102 and 103 are contiguous. Edge 104 is a backward edge that makes this a cyclic graph, since by traversing Edges 103, 105, 106, and 104, one can reach Node 107 more than once.
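By way of illustration only, the distinction between cyclic and acyclic directed graphs may be sketched as follows. The adjacency-dictionary representation, the function name has_cycle, and the node names are assumptions made for this example; they do not reproduce the topology of FIG. 1. The sketch uses a depth-first search that reports a cycle when an edge leads back to a node still on the current search stack.

```python
from typing import Dict, List

# Hypothetical representation: each node maps to the nodes its outgoing edges reach.
Graph = Dict[str, List[str]]

def has_cycle(graph: Graph) -> bool:
    """Return True if following contiguous edges can revisit a node."""
    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = ON_STACK
        for succ in graph.get(node, []):
            succ_state = state.get(succ, UNVISITED)
            if succ_state == ON_STACK:
                # This edge flows back to a node on the current search stack:
                # it is a backward edge, so the graph is cyclic.
                return True
            if succ_state == UNVISITED and visit(succ):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in list(graph))

# Illustrative graphs (node names are invented for this example).
cyclic_graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["b"]}   # d -> b loops backward
acyclic_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

In this sketch, has_cycle(cyclic_graph) evaluates to True and has_cycle(acyclic_graph) evaluates to False. Recursion is used for brevity; a very deep graph may call for an explicit stack instead.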
The entry point of the graph may refer to any node that has no incoming edge (except a backward edge in the case of a cyclic graph); there may be more than one such node, but more typically there may be only one. The exit point of the graph may refer to any node that has no outgoing edge (except a backward edge in a cyclic graph); there may be more than one such node, but more typically there may be only one. A path may consist of a sequence of contiguous edges flowing from the entry point of the graph to the exit point; a path segment may flow between any two nodes along a path. An edge may belong to more than one segment, and a segment may belong to more than one path. FIG. 2 illustrates an exemplary DAG. Node 200 is the entry point of the DAG; Node 201 is the exit point. Path 203 represents one possible path through the DAG; Segment 204 illustrates a segment. Edge 205 is shared between Segments 204 and 206, and Segment 204 is shared between Paths 203 and 207. One may speak of the relative position of one node with respect to another, such that if an edge or segment connects two nodes, the node from which the edge or segment flows may be said to be above the node into which the edge or segment flows. The act of moving along contiguous edges is referred to herein as traversal.
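The terms defined above may be made concrete with a short sketch. The adjacency-dictionary representation, the function names entry_points, exit_points, and all_paths, and the node names are assumptions made for illustration and do not reproduce FIG. 2. For brevity, a path is represented here as the list of nodes it visits rather than as a list of edges, and every node is assumed to appear as a key of the dictionary.

```python
from typing import Dict, Iterator, List

Graph = Dict[str, List[str]]

def entry_points(graph: Graph) -> List[str]:
    """Nodes with no incoming edge."""
    targets = {succ for succs in graph.values() for succ in succs}
    return [node for node in graph if node not in targets]

def exit_points(graph: Graph) -> List[str]:
    """Nodes with no outgoing edge."""
    return [node for node in graph if not graph[node]]

def all_paths(graph: Graph, node: str) -> Iterator[List[str]]:
    """Yield every path from `node` to an exit point of an acyclic graph."""
    if not graph[node]:
        yield [node]
        return
    for succ in graph[node]:
        for rest in all_paths(graph, succ):
            yield [node] + rest

# Illustrative DAG: two paths that share the edge c -> exit.
dag = {
    "entry": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["exit"],
    "exit": [],
}
paths = list(all_paths(dag, "entry"))
```

In this example the two enumerated paths differ in their first segment but share the edge from c to exit, which illustrates how one edge may belong to more than one path.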
In an application wherein a DAG represents the control flow of a program, nodes may represent decisions, each of which may have more than one outgoing edge. Such a node will hereinafter be referred to as a fork point. Where a node represents a statement rather than a decision, it may typically represent a point in the program where two different flows merge. Such a node will be referred to hereinafter as a merge point. For the sake of clarity, blocks of code containing no decisions, herein referred to as linear blocks of code or simply code blocks, may also be represented on the graph. They do not, by definition, contain any control flow statements, but the contents of the linear code blocks may be useful for analysis. In order to further clarify the elements of a control flow graph, true control flow nodes will herein be represented by ovals, whereas linear code blocks will be represented by boxes.
FIG. 3 illustrates a control flow graph embodiment of the DAG of FIG. 2, with Node 300 representing a linear code block, and Node 301 representing a decision with two possible outcomes; this may represent a simple if/then/else construct in a program. Node 302 represents a decision with three possible outcomes; this may represent a case or switch construct in a program. Blocks 303 and 304, being contiguous, could, for the purposes of certain kinds of analysis, be combined into a single block without affecting the results of the analysis. Nodes 301 and 302 are fork points; Node 305 is a merge point.
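One possible way to identify fork points and merge points is to count the outgoing and incoming edges of each node, as in the following sketch. The representation, the function name fork_and_merge_points, and the node names are assumptions made for illustration; the example graph models a generic if/then/else construct and is not the graph of FIG. 3.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]

def fork_and_merge_points(graph: Graph) -> Tuple[List[str], List[str]]:
    """Classify nodes by degree: a fork point has more than one outgoing
    edge, and a merge point has more than one incoming edge."""
    in_degree = {node: 0 for node in graph}
    for succs in graph.values():
        for succ in succs:
            in_degree[succ] = in_degree.get(succ, 0) + 1
    forks = [node for node in graph if len(graph[node]) > 1]
    merges = [node for node in graph if in_degree[node] > 1]
    return forks, merges

# Illustrative control flow graph for:  block0; if (cond) then_block else else_block; join
cfg = {
    "block0": ["if_cond"],
    "if_cond": ["then_block", "else_block"],
    "then_block": ["join"],
    "else_block": ["join"],
    "join": [],
}
forks, merges = fork_and_merge_points(cfg)
```

Here if_cond is the only fork point and join is the only merge point; block0, then_block, and else_block behave as linear code blocks.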
In the context of certain kinds of analysis, cyclic graphs may be transformed into acyclic graphs. The specific nature of the application will determine whether this is possible, and how such a transformation might be made, and will be known to one of ordinary skill in the art within the application area. In an application using a DAG to represent a program control-flow graph, program loops, which are cyclic, may be unrolled to create a linear representation of their execution, assigning “unknown” or “havoc” values to variables as appropriate. Similarly, other conventional techniques including but not limited to function inlining or summarizing may be used to transform a cyclic program control flow graph into a DAG.
Certain types of analysis, including but not limited to identification of program defects using static analysis, may make use of DAGs to represent all possible execution flows of a program. A program performing such analysis will be referred to herein as a “checker.” A checker may identify a variety of different program characteristics, defects, or artifacts of interest including but not limited to such examples as uninitialized variables, null pointer dereferences, and possible race conditions. Such a checker may traverse some or all paths in a DAG as it performs its search. This traversal of the DAG may be intended to simulate all possible execution flows of the program represented by the DAG.
A checker may attempt to traverse every possible unique path in a DAG. The method of accomplishing a complete traversal may vary. Methods include, but are not limited to, depth-first search, breadth-first search, the use of recursion to provide coverage, and the use of worklists to record paths that must be traversed as branches are encountered.
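A worklist-based traversal of the kind mentioned above may be sketched as follows. This is an illustrative sketch only; the function name paths_with_worklist, the depth_first parameter, and the example graph are assumptions and do not describe any particular checker. Each worklist entry is a partial path, and whenever a branch is encountered one new entry is recorded per outgoing edge. Taking entries from the back of the worklist gives a depth-first order, and taking them from the front gives a breadth-first order.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]

def paths_with_worklist(graph: Graph, entry: str, depth_first: bool = True) -> List[List[str]]:
    """Enumerate every entry-to-exit path of an acyclic graph without recursion."""
    worklist = deque([[entry]])      # partial paths still to be extended
    complete: List[List[str]] = []
    while worklist:
        path = worklist.pop() if depth_first else worklist.popleft()
        successors = graph[path[-1]]
        if not successors:           # reached an exit point
            complete.append(path)
            continue
        for succ in successors:      # record one pending path per outgoing edge
            worklist.append(path + [succ])
    return complete

# Illustrative DAG (node names are invented for this example).
dag = {"entry": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["exit"], "exit": []}
dfs_paths = paths_with_worklist(dag, "entry", depth_first=True)
bfs_paths = paths_with_worklist(dag, "entry", depth_first=False)
```

Both orders visit the same set of paths; only the order of completion differs. Because the number of paths can grow exponentially with the number of fork points, a practical checker would likely bound or share work rather than store every path explicitly.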
According to the semantics of a graph in a given application, there may be paths containing mutual inconsistencies such that their traversal by a checker is not useful. In the example of a DAG representing program control flow, such paths would never be executed in the program represented by the DAG. Such mutually inconsistent paths are referred to hereinafter as false paths. Because different paths may share edges and segments, there may be edges or segments that belong both to valid paths and false paths.
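The notion of a false path may be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that each branch taken along a path can be summarized as a pair consisting of a condition name and the truth value that the branch requires, and that the underlying values are not modified anywhere along the path. The function name is_false_path and the condition names are hypothetical. This is not the detection method of any particular checker; realistic programs require considerably richer reasoning.

```python
from typing import Dict, Iterable, Tuple

def is_false_path(branch_outcomes: Iterable[Tuple[str, bool]]) -> bool:
    """Return True if the path requires one condition to be both true and false.

    Simplifying assumption: the values tested by each condition are never
    reassigned along the path, so an outcome fixed at one fork point must
    hold again at every later fork point testing the same condition.
    """
    required: Dict[str, bool] = {}
    for condition, outcome in branch_outcomes:
        if required.setdefault(condition, outcome) != outcome:
            return True              # mutually inconsistent requirements
    return False

# Two fork points test the same condition "p_is_null".
# Taking the true edge at the first and the false edge at the second
# describes a flow that the program could never execute.
inconsistent_path = [("p_is_null", True), ("p_is_null", False)]
consistent_path = [("p_is_null", True), ("q_is_zero", False), ("p_is_null", True)]
```

Under these assumptions, is_false_path(inconsistent_path) evaluates to True and is_false_path(consistent_path) evaluates to False. The two paths may still share edges, such as the true edge leaving the first fork point.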
Because of the amount of computing time and resources required to detect false paths, it may typically be easier to include false paths in analysis. However, including false paths may result in spurious analysis results. For example, in an application where a program control-flow graph is being analyzed for defects, any defects found as a result of analysis of false paths will not represent defects that could ever be encountered when the program executes. This would result in the reporting of invalid defects by the analyzer; such invalid defects will be hereinafter referred to as false positives. In this and other applications, it may be appreciated that it is desirable to reduce the number of false positive results produced by the analyzers. A process that systematically identifies and removes false paths from a DAG may be referred to as false path pruning.
Conventional methods of detecting and avoiding false paths may use state information that causes the elimination of some false positives, but at the cost of causing some valid defects to be missed. It may therefore be appreciated that there remains a need for a more precise method of discriminating between valid and false paths that avoids both excessive false positive reports and missed defects, and that is efficient from both an execution time and a resource consumption standpoint.