1. Field of the Invention
The present invention relates to the analysis of computer programs and, in particular, to the detection of programming errors in a computer program through analysis of the use of resources prescribed by the computer program.
2. Discussion of Related Art
Some existing programming error detection methods detect violations in the computer instruction protocol with which a particular program comports. Such a programming error detection method is called "static checking" since the syntax of the computer instructions, or "statements", of the computer program is analyzed outside the context of the behavior resulting from the execution of those statements. The term "statement" is used herein as it is defined in Section 6.6 of American National Standard for Programming Languages - C (American National Standards Institute/International Organization for Standardization ANSI/ISO 9899-1990), which is reproduced in Herbert Schildt, The Annotated ANSI C Standard, (Osborne McGraw-Hill 1990) (hereinafter the C Standard). Briefly, in the context of the C computer language, a statement is a computer instruction other than a declaration. In other words, a statement is any expression or instruction which directs a computer to carry out one or more processing steps. Static checking in the context of the C computer language includes, for example, (i) making sure that no two variables in the computer program are identified by the same name; (ii) ensuring that each "break" statement corresponds to a preceding "while", "for", or "switch" statement; and (iii) verifying that operators are applied to compatible operands. Static checking is discussed, for example, in Alfred V. Aho et al., Compilers, (Addison Wesley 1988).
Some existing static checking methods, which are generally called "data flow analysis" techniques, analyze data flow through a program to detect programming errors. Such analysis includes use of control flow information, such as sequencing of statements and loop statements, to detect the improper use of data objects, e.g., the use of a variable before a value has been assigned to the variable. Flow of control in a computer program is the particular sequence in which computer instructions of the computer program are executed in a computer process defined by the computer program. Computer programs and processes and the relation therebetween are discussed more completely below. Data flow techniques are discussed in Beizer, Software Testing Techniques, (1990) at pp. 145-172.
Existing static checking techniques suffer from the inability to track use of resources through several discrete components of a computer program such as several functions which collectively form a computer program. For example, a variable may be initialized in a first function and used in a calculation in a second, subsequently executed function. By analysis of only the computer instructions of the second function, the variable appears to be used before the variable is initialized, and the use can be erroneously reported as an error. In addition, existing static checking techniques are static in nature and do not consider particular data values associated with particular data objects. Static analysis is limited to what can be determined without considering the dynamic effects of program execution. Beizer describes several areas for which static analysis is inadequate, including: arrays, especially dynamically calculated indices and dynamically allocated arrays; records and pointers; files; and alternate state tables, representing the different semantics of different types in the same program.
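By way of illustration only, the following C sketch shows a variable that is initialized in a first function and used in a second function. The names init_counter, next_value, and run are hypothetical and do not appear in any cited reference.

```c
/* Illustrative sketch; all names are hypothetical. */

/* First function: gives the object its initial value. */
void init_counter(int *counter)
{
    *counter = 10;
}

/* Second function: reads the object. Examined alone, nothing in this
 * body shows that *counter was ever assigned a value. */
int next_value(const int *counter)
{
    return *counter + 1;
}

/* Caller: the read in next_value is valid only because init_counter
 * is executed first. */
int run(void)
{
    int counter;                 /* deliberately declared without an initializer */

    init_counter(&counter);
    return next_value(&counter);
}
```

An analysis confined to next_value cannot distinguish this valid use from a use in which no initializing function was executed beforehand.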
Static checkers do not detect errors involving calculated addresses corresponding to dynamically allocated memory or calculated indices into arrays. Calculated addresses and indices are addresses and indices, respectively, which are calculated during the execution of a computer process. Static checkers do not detect such errors in a computer program because checking for such errors typically involves determining the precise values of calculated addresses and indices, which in turn involves consideration of the behavior of the computer program during execution, i.e., as a computer process.
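A calculated index can be sketched in C as follows. The names slot_for and store and the constant TABLE_SIZE are assumptions made for illustration. The bounds test is included only so that the sketch is safe to execute; without it, whether the write falls inside the array depends entirely on the value of key during execution.

```c
#include <stddef.h>

#define TABLE_SIZE 8             /* hypothetical table size */

/* The index is computed from a value known only during execution. */
size_t slot_for(unsigned key)
{
    return (size_t)(key * 3u + 1u);
}

/* Returns 1 and stores value if the calculated index is in range,
 * otherwise returns 0 and leaves the table unchanged. */
int store(int table[TABLE_SIZE], unsigned key, int value)
{
    size_t i = slot_for(key);

    if (i >= TABLE_SIZE)
        return 0;
    table[i] = value;
    return 1;
}
```

For example, a key of 2 yields index 7, which is within the table, while a key of 3 yields index 10, which is not.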
Static checkers do not detect errors involving the use of questionably allocated resources or the use of resources whose state is determined by the value of a variable or other data object. In the C computer language, a resource, e.g., dynamically allocated memory or a file, can be questionably allocated. In other words, a function which allocates the resource returns normally even if allocation of the resource failed. Whether the allocation succeeded is determined by comparison of the returned item of the function, which is a pointer to the allocated resource, to an invalid value, e.g., NULL. Static checkers do not consider the behavior of a called function but instead only verify that the syntax of the call to the called function comports with the syntax prescribed in the particular computer language. Therefore, static checkers do not detect errors involving use of a resource which is questionably allocated.
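A questionably allocated file can be sketched with the standard C library function fopen, which returns to its caller whether or not the file was opened. The function name first_byte is hypothetical.

```c
#include <stdio.h>

/* Returns the first byte of the named file, or -1 if the file cannot
 * be opened or is empty. */
int first_byte(const char *path)
{
    int c;
    FILE *fp = fopen(path, "rb");    /* returns normally even on failure */

    if (fp == NULL)                  /* the only indication that the open failed */
        return -1;
    c = fgetc(fp);
    fclose(fp);
    return (c == EOF) ? -1 : c;
}
```

If the comparison to NULL were omitted, the call to fgetc would be syntactically valid yet would use a resource that was never allocated.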
As described above, a static checker does not consider the behavior of a called function. Thus, verifying the use of a resource which spans multiple functions is impossible. For example, if a first function allocates a resource, a second function uses the resource, and a third function deallocates the resource, static checking of any of the first, second, and third functions alone or a function calling all three functions, cannot verify the proper use of the resource.
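The three-function arrangement described above can be sketched as follows, under the assumption that the resource is a block of dynamically allocated memory. The names make_buffer, fill_buffer, drop_buffer, and process are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* First function: allocates the resource; may return NULL. */
char *make_buffer(size_t size)
{
    return calloc(size, 1);
}

/* Second function: uses the resource. Returns the number of
 * characters copied, not counting the terminating null. */
size_t fill_buffer(char *buf, size_t size, const char *text)
{
    size_t n = strlen(text);

    if (size == 0)
        return 0;
    if (n >= size)
        n = size - 1;
    memcpy(buf, text, n);
    buf[n] = '\0';
    return n;
}

/* Third function: deallocates the resource. */
void drop_buffer(char *buf)
{
    free(buf);
}

/* Calling function: correctness depends on what each callee does to
 * the buffer, which is not visible from the calls themselves. */
size_t process(const char *text)
{
    size_t size = strlen(text) + 1;
    size_t n;
    char *buf = make_buffer(size);

    if (buf == NULL)
        return 0;
    n = fill_buffer(buf, size, text);
    drop_buffer(buf);
    return n;
}
```

Reordering the calls in process, e.g., calling drop_buffer before fill_buffer, leaves every call syntactically valid while misusing the resource.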
When an error detection technique employs insufficient information regarding the behavior of a computer program during execution, the errors reported by the technique are either under-inclusive or over-inclusive. For example, if a function accepts as a parameter a pointer to an allocated resource, e.g., a file, and uses the parameter without comparing the parameter to an invalid pointer, the function contains a possible error. Whether the function contains an error depends on circumstances which are unknown within the context of the function. For example, if the pointer is verified to be a valid pointer before the function is called, there is no error in the function. To report the use of the pointer as an error would clutter an analysis of the function with a falsely reported error, and thus would be over-inclusive. Falsely reporting errors in analysis of a large program, at best, is an inconvenience to a program developer and, at worst, renders analysis of a computer program useless. If the pointer is not checked to be valid prior to calling the function, failure to report the error results in failure to detect an error which can cause an execution of the computer program to be aborted abruptly and can result in the corruption of data structures and possibly in the loss of valuable data.
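The dependence on the calling context can be sketched as follows. For brevity the sketch substitutes a pointer to a structure for the pointer to a file mentioned above; the type record and the names record_id and id_or_default are hypothetical.

```c
#include <stddef.h>

struct record {
    int id;
};

/* Uses rec without comparing it to NULL. Whether this is an error
 * depends on the callers, which are not visible here. */
int record_id(const struct record *rec)
{
    return rec->id;
}

/* A caller that verifies the pointer before the call, so that the
 * unchecked use in record_id is not an error on this path. */
int id_or_default(const struct record *rec, int fallback)
{
    return (rec != NULL) ? record_id(rec) : fallback;
}
```

Reporting the dereference in record_id would be over-inclusive for callers such as id_or_default, while not reporting it would be under-inclusive for a caller that passes an unverified pointer.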
One particular drawback of the failure of static checking techniques to consider the dynamic behavior of a computer program is the reporting of apparent, but "false", errors, i.e., errors resulting from computer instructions through which control cannot flow. In functions in which control flow paths depend on particular values associated with particular data structures and program variables, control flow cannot be determined without considering the values associated with those data structures and variables, which in turn generally cannot be determined without consideration of the behavior of the function during execution. As a result, instructions which are not executed or which are executed only under specific circumstances are generally assumed to always be executed by static checkers.
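A control flow path that cannot be followed during execution can be sketched as follows. The function name scaled and its variables are hypothetical.

```c
/* Both tests depend on the same value, so the path on which factor
 * is read without having been assigned cannot occur. */
int scaled(int n)
{
    int factor;                  /* assigned on one branch only */
    int have_factor = (n > 0);

    if (have_factor)
        factor = 2;
    if (have_factor)
        return n * factor;       /* reached only after the assignment above */
    return 0;
}
```

A technique that treats the two tests as independent considers a path which skips the assignment yet executes the multiplication, and may report a use of factor before initialization that cannot occur.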
Another type of existing programming error detection technique is called program verification. In program verification, a computer program is treated as a formal mathematical object. Errors in the computer program are detected by proving, or failing to prove, certain properties of the computer program using theoretical mathematics. One property for which a proof is generally attempted is that, given certain inputs, a computer process defined by the computer program produces certain outputs. If the proof fails, the computer program contains a programming error. Such program verification techniques are described, for example, in Eric C. R. Hehner et al., A Practical Theory of Programming, (Verlag 1993) and Ole-Johan Dahl, Verifiable Programming, (Prentice Hall 1992).
Verified programming techniques are limited in at least two ways: (i) only properties of computer programs which can be expressed and automatically proven using formal logic can be verified, and (ii) a person developing a computer program generally must formally specify the properties of the computer program. Formally specifying the properties of a computer program is extremely difficult in any case and intractable for larger programs. As a result, commercially successful products employing verified programming techniques are quite rare.
In another type of programming error detection technique, a computer program is executed, thus forming a computer process, and the behavior of the computer process is monitored. Since a computer program is analyzed during execution, such a programming error detection technique is called "runtime checking". Some runtime checking techniques include automatically inserting computer instructions into a computer program such that execution of the inserted computer instructions notes, during execution of the computer program, the status of variables and resources of the computer program. Such an error detection technique is described by U.S. Pat. No. 5,193,180 to Hastings.
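The general idea of monitoring instructions can be sketched as follows. The sketch is hand-written and greatly simplified; it is not the mechanism of the Hastings patent or of any commercial product, and the names tracked_malloc, tracked_free, and live_block_count are hypothetical.

```c
#include <stdlib.h>

static long live_blocks;         /* allocations not yet released */

/* Stands in for a call to malloc with monitoring added. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        live_blocks++;
    return p;
}

/* Stands in for a call to free with monitoring added. */
void tracked_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* A nonzero count when the process ends suggests unreleased memory. */
long live_block_count(void)
{
    return live_blocks;
}
```

The count reflects only the allocations actually performed during a particular execution.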
Runtime checking can typically detect errors such as array indices out of bounds and memory leaks. Examples of runtime checking include Purify, which is available from Pure Software Inc. of Sunnyvale, Calif., and Insight, which is available from Parasoft Corporation of Pasadena, Calif. Purify inserts monitoring computer instructions into a computer program after the computer program has been compiled into an object code form, and Insight inserts monitoring computer instructions into a computer program before the computer program is compiled, i.e., while the computer program is still in a source code form.
Runtime checking is generally limited to what can be determined by actually executing the computer instructions of a computer program with actual, specific inputs. Runtime checking does not consider all possible control flow paths through a computer program but considers only those control flow paths corresponding to the particular inputs to the computer program supplied during execution. It is generally impracticable to coerce a computer process, formed by execution of the computer instructions of a computer program, to follow all possible control flow paths. To do so requires that a programmer anticipate all possible contingencies which might occur during execution of the computer instructions of a computer program and cause or emulate all possible combinations of occurrences of such contingencies.
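The dependence of runtime checking on the supplied inputs can be sketched with a function containing a deliberate defect. The function name average is hypothetical.

```c
/* Deliberate defect, for illustration: the division is by zero when
 * count is 0. Only an execution that supplies count == 0 exposes it. */
int average(const int *values, int count)
{
    int i;
    int sum = 0;

    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / count;
}
```

An execution that supplies three values never performs the faulty division, so monitoring of that execution observes no error in the function.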
Furthermore, runtime checking can only be used when the computer program is complete. Analysis of a single function before the function is incorporated into a complete program is impossible in runtime checking since the function must be executed to be analyzed. Analysis of a function using runtime checking therefore requires that (i) all functions of a computer program be developed and combined to form the computer program prior to analysis of any of the functions or (ii) a special purpose test program, which incorporates the function, be developed to test the function. Top-down programming, which involves the design, implementation, and testing of individual functions prior to inclusion in a complete computer program and which is a widely known and preferred method of developing more complex computer programs, therefore does not lend itself well to runtime analysis.
What is needed is a programming error detection technique which considers the dynamic behavior of a computer program, which automatically considers substantially all possible control flow paths through the computer program, and which does not require a programmer of such a computer program to express the computer program in an alternative, e.g., mathematical, form. What is further needed is a programming error detection technique which analyzes an individual component of a program, considering the behavior of the component during execution. What is further needed is a programming error detection technique which considers the behavior of a component whose execution is invoked by a computer program component under analysis.