A large amount of effort in the development of computer programs is spent ensuring the correctness of the completed program. The correctness of a computer program is the degree to which it is free from errors in its specification, design, and implementation. Three common methods for detecting errors in a computer program are compile-time checking, runtime checking, and simulated path execution.
Compile-time checking is the process of evaluating a computer program based on its form, structure, or content. Compile-time checking tests properties that can be established before the execution of a program. One form of compile-time checking, known as syntax checking, verifies compliance with structural or grammatical rules defined for a language. For example, in the context of a computer program written in the well-known C++ programming language, using the statement B+C=A would produce an error because the correct format is A=B+C. Syntax checking is discussed, for example, in Richard Conway and David Gries, An Introduction to Programming (Winthrop Publishers, Inc., 1979). Another form of compile-time checking, known as data flow analysis, analyzes the sequence in which data transfer, use, and transformation are performed in a computer program to detect programming errors. Data flow analysis includes the use of control flow information, which relates to the sequence in which statements are performed in the execution of a computer program. A control flow is also referred to as a control flow path or, more simply, a code path. Data flow analysis can detect such errors as the use of a variable before assignment, two consecutive assignments to a variable with no intervening use, or the assignment of a value to a variable that is never used.
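As a minimal sketch of the data flow checks just described, the following Python fragment scans a straight-line sequence of assignments and reports the three kinds of anomaly named above. The statement representation (a target name plus the list of names read) and the function name check_data_flow are assumptions made for illustration only; they do not correspond to any particular compiler or checker, and branches and loops are not handled.

```python
from typing import Dict, List, Set, Tuple

# Each statement is (target, names_read), standing for "target = f(names_read)".
Statement = Tuple[str, List[str]]


def check_data_flow(statements: List[Statement]) -> List[str]:
    """Report simple data flow anomalies in straight-line code."""
    warnings: List[str] = []
    assigned: Set[str] = set()
    # Maps a variable to the index of its latest assignment not yet read.
    pending: Dict[str, int] = {}
    for index, (target, names_read) in enumerate(statements):
        for name in names_read:
            if name not in assigned:
                warnings.append(f"statement {index}: '{name}' used before assignment")
            pending.pop(name, None)  # the latest value of `name` has now been read
        if target in pending:
            warnings.append(
                f"statement {pending[target]}: value of '{target}' overwritten before use"
            )
        assigned.add(target)
        pending[target] = index
    # Whatever is still pending was assigned but never read afterwards.
    for name, index in sorted(pending.items(), key=lambda item: item[1]):
        warnings.append(f"statement {index}: value of '{name}' is never used")
    return warnings


example = [
    ("a", []),          # a = 1
    ("a", []),          # a = 2      (the first value of a is never read)
    ("b", ["a", "c"]),  # b = a + c  (c has no prior assignment)
]
report = check_data_flow(example)
for line in report:
    print(line)
```

All three reports are produced without executing the statements themselves, which is what distinguishes this kind of check from runtime checking.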
Compile-time checking techniques are inherently limited in that they do not consider the consequences of actual execution of the computer program in question. Compile-time checking is thus limited to what can be determined without considering the dynamic effects of program execution. For example, the lint compile-time checker available in the SPARCworks™ 3.0.1 programming environment from Sun Microsystems of Mountain View, Calif., analyzes computer code without regard to the dynamic flow of control through the code. This shortcoming causes lint to falsely report values being used before they are initialized, when such is not the case.
Another type of false error reported by compile-time analysis methods is an “apparent” error in instructions through which control flow cannot go. The sequence in which statements are performed often depends on particular values associated with particular variables. Compile-time checking methods generally assume statements are always executed because they cannot determine whether a particular code path is executed or under what specific circumstances program control flows through the code path.
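The false reports described above can be illustrated with a small sketch. The model below is hypothetical and is not a description of how lint or any other tool is implemented: a program is a list of statements, each optionally guarded by the value of a single boolean flag. One checker ignores the guards, as a checker that disregards dynamic control flow would; the other follows each feasible path separately.

```python
from typing import List, Optional, Set, Tuple

# A statement is (guard, action, variable). `guard` is the value the single
# boolean flag must have for the statement to run, or None if it always runs.
Statement = Tuple[Optional[bool], str, str]

PROGRAM: List[Statement] = [
    (True, "assign", "x"),  # if (flag) x = ...;
    (True, "use", "x"),     # if (flag) use(x);
]


def path_insensitive(program: List[Statement]) -> List[str]:
    """Ignore guards: only unguarded assignments count as definite."""
    definitely_assigned: Set[str] = set()
    reports: List[str] = []
    for guard, action, var in program:
        if action == "assign" and guard is None:
            definitely_assigned.add(var)
        elif action == "use" and var not in definitely_assigned:
            reports.append(f"'{var}' may be used before it is initialized")
    return reports


def path_sensitive(program: List[Statement]) -> List[str]:
    """Follow each feasible path, one per value of the flag."""
    reports: List[str] = []
    for flag in (True, False):
        assigned: Set[str] = set()
        for guard, action, var in program:
            if guard is not None and guard != flag:
                continue  # control cannot reach this statement on this path
            if action == "assign":
                assigned.add(var)
            elif action == "use" and var not in assigned:
                reports.append(f"'{var}' used before it is initialized when flag={flag}")
    return reports


print(path_insensitive(PROGRAM))  # one report, although no real path triggers it
print(path_sensitive(PROGRAM))    # no reports
```

In this sketch the use of x is reachable only on the path that also assigns x, so the report from the first checker is a false one of the kind described above.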
Runtime checking, a second common method of detecting programming errors, involves evaluating a computer program based on its behavior during execution. Runtime checking involves executing the computer program with a known set of inputs and verifying the program results against the expected outcome. The set of test inputs, execution conditions, and expected results is called a “test case.” Often, in order to help locate errors, a printout or trace showing the value of selected variables at different points in the program's execution is produced.
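A test case of the kind described above can be sketched as a record of inputs and an expected result, together with a small routine that executes the function under test and prints a trace. The names Case, run_test_cases, and absolute_value are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Case:
    name: str
    inputs: Tuple[Any, ...]
    expected: Any


def run_test_cases(function: Callable[..., Any], cases: List[Case]) -> List[str]:
    """Execute `function` on each case and return the names of the failing cases."""
    failures: List[str] = []
    for case in cases:
        actual = function(*case.inputs)
        # A trace of selected values helps locate an error when a case fails.
        print(f"{case.name}: inputs={case.inputs} expected={case.expected} actual={actual}")
        if actual != case.expected:
            failures.append(case.name)
    return failures


def absolute_value(n: int) -> int:
    # Deliberately faulty: negative inputs are returned unchanged.
    return n if n >= 0 else n


cases = [
    Case("positive", (5,), 5),
    Case("zero", (0,), 0),
    Case("negative", (-3,), 3),
]
failures = run_test_cases(absolute_value, cases)
```

Here only the case named negative exercises the faulty branch. If that case were omitted from the list, the run would report no failures even though the error remains in the function.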
Although simple in principle, the usefulness of runtime checking is limited by the complexity of the computer program. A tremendous amount of effort is involved in designing, making, and running test cases. Even after such extensive effort, the error detection capability of runtime checking is limited to the code paths executed by the specific set of inputs chosen. In all but the simplest computer programs, it is generally impractical to execute all possible control flow paths. Furthermore, runtime checking requires that a computer program be complete and ready for execution. Since a function must be executed to be analyzed, testing a function apart from incorporating it into a complete program requires the additional effort of building a program shell that provides the function with the necessary environment for execution.
One method to overcome the deficiencies of typical programming error detection methods is described in U.S. Pat. No. 5,694,539 (issued Dec. 2, 1997); U.S. Pat. No. 5,857,071 (issued Jan. 5, 1999); U.S. Pat. No. 5,968,113 (issued Oct. 19, 1999); and U.S. Pat. No. 6,079,031 (issued Jun. 20, 2000), all entitled “Computer Process Resource Modeling Method and Apparatus,” and in U.S. Pat. No. 5,790,778, entitled “Simulated Program Execution Error Detection Method and Apparatus,” issued on Aug. 4, 1998, and assigned to Intrinsa Corp. of Mountain View, Calif. The disclosures of the above-identified issued U.S. patents are hereby incorporated herein by reference. This programming error detection method analyzes components of computer programs by tracking the effect of program instructions on the state of program resources. Each resource has a prescribed behavior represented by a number of states and transitions between states.
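The idea of a resource whose prescribed behavior is a set of states and transitions can be sketched as follows. The particular resource (a heap block), its states, and its operations are assumptions chosen for illustration; they are not taken from the patents identified above.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a heap block: (state, operation) -> next state.
# Any (state, operation) pair absent from the table violates the model.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("unallocated", "allocate"): "allocated",
    ("allocated", "use"): "allocated",
    ("allocated", "free"): "freed",
}


def track_resource(operations: List[str], initial: str = "unallocated") -> List[str]:
    """Apply operations to one resource and report violations of its model."""
    state = initial
    errors: List[str] = []
    for step, operation in enumerate(operations):
        next_state = TRANSITIONS.get((state, operation))
        if next_state is None:
            errors.append(f"step {step}: '{operation}' is invalid in state '{state}'")
        else:
            state = next_state
    if state == "allocated":
        errors.append("resource is still allocated at end of path (possible leak)")
    return errors


print(track_resource(["allocate", "use", "free", "use"]))  # use after free
print(track_resource(["allocate", "use"]))                 # never freed
print(track_resource(["allocate", "use", "free"]))         # no violations
```

In such a scheme, the sequence of operations would be derived from the instructions along a code path rather than supplied by hand as it is here.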
The difficulty of detecting errors in computer programs is compounded in the case of systems that consist of several individual programs that interact with each other. In such systems, the operation of each individual program is potentially affected by events occurring in the other programs that interact with it. Existing program analysis techniques operate either on the entire code of a single program or on a subset thereof; such techniques are known respectively as whole-program analysis and partial-program analysis techniques. Both types of techniques can automatically identify defects in some number of source files. Whole-program analysis techniques are more effective, but partial-program analysis techniques can give useful results on subsets of code.
However, most modern software is written as a collection of interacting programs. Since the programs interact, they cannot be analyzed independently—the behavior of one program influences the behavior of other programs in the system. Further, the order in which programs must be analyzed to give useful results is informed by the dependency relationships between the programs, i.e., the calling relationships between them. The results of the analysis of one program must be used as inputs for the analysis of other programs that interact with it.
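The ordering constraint described above amounts to a topological ordering of the dependency graph. The sketch below uses the TopologicalSorter class from the Python standard library (graphlib, Python 3.9 and later); the program names and the dependency table are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical system: each program maps to the set of programs it calls into.
DEPENDENCIES: Dict[str, Set[str]] = {
    "shell.exe": {"ui.dll", "core.dll"},
    "ui.dll": {"core.dll"},
    "core.dll": set(),
}


def analysis_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every program follows the programs it depends on."""
    # TopologicalSorter treats the mapped values as predecessors, so a callee
    # is emitted before its callers. A cycle raises graphlib.CycleError.
    return list(TopologicalSorter(dependencies).static_order())


order = analysis_order(DEPENDENCIES)
print(order)
```

With this ordering, the results for core.dll are available before ui.dll is analyzed, and the results for both are available before shell.exe is analyzed. Mutually dependent programs form a cycle, which this sketch does not attempt to resolve.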
Program analysis tools typically allow the developer to describe “external behavior” that can in practice be used to specify the behavior of another component. Certain program analysis tools allow the description of a specific component's external behavior to be generated automatically by analyzing the component in question. For example, the PREfix analysis tool uses models and provides a complex mechanism to allow users to specify which models are used and produced in an analysis. The PC-Lint analysis tool uses libraries known as “lint libraries” and provides a comparably complex mechanism. By invoking the tool on different parts of the code in a specific order and furnishing the appropriate specification of which external behavior to generate and use for the analysis of individual components, it is possible to use a whole-program analysis tool to perform cross-program analysis, i.e., analysis of multiple programs that interact with each other.
However, this approach is not practical on a large system that has a large number of components. For example, the WINDOWS 2000 operating system build process generates over 3000 different programs, in addition to other programs provided in binary form. There are myriad dependency relationships between these programs. It is not practical to specify these dependencies to a program analysis tool manually, nor to manually determine the correct order in which the program analysis tool should be invoked on the individual programs.
A variety of techniques can be used to determine cross-program dependencies. For example, these relationships may be known a priori based on the design of the system. As an alternative, they can be determined manually by inspecting the code. They may also be determined automatically by analyzing how the system is built. In some cases, they may also be determined by analyzing the executable programs comprising the system.
What is needed is a method of using the information about cross-program dependencies to automatically invoke the program analysis tool appropriately on the individual programs. “Appropriately” includes providing the correct external behavior specifications, and determining the correct order in which programs should be analyzed.
A number of approaches have been proposed for solving the cross-program analysis problem. These approaches typically attempt to approximate the correct solution, but introduce potentially serious errors into the analysis.
One approach is to ignore the fact that there are multiple programs and instead treat the entire system as a single “program.” This approach introduces some potentially serious inaccuracies into the simulation. Certain assumptions that are safe to make in whole-program and partial-program analysis cannot be safely made in cross-program analysis. For example, in whole- and partial-program analysis in the C programming language, it is safe to assume that there is only one function with a given name because, if that were not the case, the program would not be executable. This restriction is enforced by the linker. However, there is no such restriction when dealing with multiple programs. Since all known whole- and partial-program analysis tools build this assumption in, treating the multiple programs as a single program results both in large numbers of inaccurate assumptions about the structure of the program, and in the incorrect appearance that a substantial portion of the code can be safely excluded from analysis.
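The effect of the single-name assumption can be illustrated with a small sketch. The two programs, their function names, and the one-line descriptions standing in for function bodies are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical contents of two separately linked programs. Each defines its
# own function named "init", which is legal because they are linked separately.
PROGRAMS: Dict[str, Dict[str, str]] = {
    "server": {"init": "opens a socket", "serve": "handles requests"},
    "client": {"init": "reads a config file", "send": "sends a request"},
}


def merge_as_single_program(programs: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Index functions by name only, as a single-program analysis would."""
    table: Dict[str, str] = {}
    for functions in programs.values():
        table.update(functions)  # a later "init" silently replaces an earlier one
    return table


def index_per_program(programs: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Index functions by (program, name), keeping every definition."""
    return {
        (program, name): body
        for program, functions in programs.items()
        for name, body in functions.items()
    }


merged = merge_as_single_program(PROGRAMS)
separate = index_per_program(PROGRAMS)
print(len(merged), len(separate))
```

Of the four functions defined, the name-only table retains three: the server's init has dropped out of view, and any call to init from the server would be resolved against the client's definition. Keying the table by program as well as by name preserves all four.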
Another approach is to treat the programs as independent. This approach, however, fails to consider any issues related to the interaction between components. Still another approach involves attempting to manually specify the interactions between components. This approach does not scale well beyond a handful of components. It is difficult both to manually determine the dependencies between components, and to manually specify these dependencies for the tool.
Due to the inadequacy of existing program analysis techniques to analyze highly complex systems of individual programs, a need continues to exist for a programming error detection method that considers the behavior of executed program instructions and that can perform cross-program analysis on such complex systems.