This invention relates generally to systems for performing error checking, coverage analysis, or other kinds of run-time analysis of computer programs written in high level programming languages such as C and C++. More specifically, it relates to systems that perform analysis and instrumentation of the source code of programs, and use that analysis and instrumentation to report useful information, including programming errors, program code coverage analysis, and other information as the program is linked and as it runs.
Programming tools that help automatically detect programming errors and that provide code coverage analysis have proven to be an important part of a programmer's arsenal. Consider the case of automatic detection of programming errors in programs written using high-level languages such as C and C++. The errors that could be detected automatically can be broadly categorized as follows:
Compile-Time Errors
These errors are well described in the literature and have been detected by many different kinds of programming tools, most traditionally by a high level language compiler. Examples of compile-time errors include:
CT1. Incorrect syntax.
CT2. Use of undefined identifiers.
CT3. Type mismatches: for instance, calling a function using an actual parameter whose type is sufficiently different from the type of the corresponding formal parameter.
Compile-time errors are mentioned here only for the purpose of differentiating them from link-time and run-time errors.
Link-Time Errors
Because many high level programming languages support the notion of separate compilation of program modules, there is a possibility of introducing programming errors that cannot be detected until those separately compiled modules are combined into a single executable program, or into a shared library module, or into a relocatable object module.
Examples of link time errors that could be detected are:
LT1. Multiple definitions of a global function or variable, when only one definition is allowed, or a missing definition of a global function or variable.
LT2. Mismatches between the size of a variable declared in one module and defined in another module.
LT3. Type mismatches between the declaration of a function in one module and its definition (or a second declaration of it) in another module.
LT4. Type mismatches between the declaration of a variable in one module and its definition (or a second declaration of it) in another module.
LT5. Mismatches in the definition of a type declared in one module, versus the definition of a type with the same name in another module, when the definitions should be the same.
In the case of a program written in the C++ programming language, the above link-time errors are described in a set of rules that are formally described in the draft ANSI C++ standard, and called the "One Definition Rule" (Document X3J16/96-0225 or WG21/N1043, "Working Paper for Draft Proposed International Standard for Information Systems-Programming Language C++," Dec. 2, 1996; Section 3.2, "One Definition Rule," is the section relevant to link-time checking; this document is referred to herein as the "C++ Working Paper"). Note that there are similar, but different rules for programs written in the C language, as specified in its ANSI standard (American National Standard X3.159-1989, "American National Standard for Information Systems-Programming Language C"; the section relevant to link-time checking is section 3.1.2.6, "Compatible Type and Composite Type.").
Run-Time Errors
Some errors cannot in general be detected until a program actually executes.
Examples of such errors are:
RT1. Indirection through a pointer that does not point to a valid object. For example, indirection through a pointer to an object whose associated memory has been freed, or indirection through an uninitialized pointer.
RT2. Indexing beyond the legal bounds of an array object.
RT3. Indexing beyond the bounds of an array object that is contained within an enclosing object.
RT4. Incorrect interpretation of the type of object designated by a pointer. In the case of C or C++, this can happen because the program performed an unintended or illegal type cast operation, or the program accessed a member of a union after a value was stored in a different member of the union, or the program released a dynamically allocated chunk of memory, then reallocated that same memory as a different type.
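As an illustration of a run-time error of the kind described in this section, the following minimal sketch shows indexing beyond the bounds of an array that is contained within an enclosing object, and a check that would catch it. The names struct record, record_get, and record_demo are hypothetical and are used only for illustration; this is not taken from any of the tools discussed herein.

```c
#include <assert.h>
#include <stddef.h>

enum { VALUES_LEN = 4 };

/* Hypothetical enclosing object that contains an array. */
struct record {
    int values[VALUES_LEN];
    int checksum;            /* neighbor that an out-of-bounds index could reach */
};

/* Checked element access: returns 1 and stores the element in *out when
 * the index is within the bounds of the contained array, 0 otherwise. */
int record_get(const struct record *r, size_t i, int *out)
{
    if (r == NULL || out == NULL || i >= VALUES_LEN)
        return 0;
    *out = r->values[i];
    return 1;
}

/* Usage sketch. */
void record_demo(void)
{
    struct record r = { { 10, 20, 30, 40 }, 99 };
    int v = 0;
    assert(record_get(&r, 3, &v) && v == 40);  /* last legal index */
    assert(!record_get(&r, 4, &v));            /* one past the end of values */
}
```

An unchecked r.values[4] stays inside the storage of the enclosing struct record, so a check that looks only at the bounds of the enclosing object would not notice it.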
Instrumentation
Program instrumentation is a technique that has been used in the implementation of run-time error checking, code coverage analysis, and other kinds of program analysis. Code coverage analysis provides a programmer with information to determine how well a program has been tested, by measuring how many different logical paths through the program have been executed, accumulating the information over several independent executions of the program. To instrument a program is to add expressions and/or statements to an existing program for the purpose of analyzing the behavior of that program, while still maintaining the original meaning of the program so that it continues to execute as originally intended by the programmer.
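The idea of instrumentation for code coverage can be sketched as follows. The counter table cov_hits and the functions clamp_to_zero, cov_points_reached, and coverage_demo are hypothetical names chosen for this sketch. The added statements only increment counters, so the function returns the same values as the uninstrumented original.

```c
#include <assert.h>

/* Hypothetical coverage table: one counter per instrumentation point. */
enum { COV_POINTS = 3 };
unsigned long cov_hits[COV_POINTS];

/* Original function:
 *     int clamp_to_zero(int x) { if (x < 0) return 0; return x; }
 * Instrumented version: counter increments are added on each path. */
int clamp_to_zero(int x)
{
    cov_hits[0]++;               /* function entered */
    if (x < 0) {
        cov_hits[1]++;           /* negative-input path taken */
        return 0;
    }
    cov_hits[2]++;               /* non-negative path taken */
    return x;
}

/* Number of instrumentation points executed at least once. */
int cov_points_reached(void)
{
    int n = 0;
    for (int i = 0; i < COV_POINTS; i++)
        if (cov_hits[i] != 0)
            n++;
    return n;
}

/* Usage sketch: one call reaches two of the three points. */
void coverage_demo(void)
{
    assert(clamp_to_zero(5) == 5);
    assert(cov_points_reached() >= 2);
}
```

To accumulate coverage over several independent executions, such counters would have to be saved when the program exits and merged with earlier results; that step is omitted here.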
Prior Link-Time Error Detection Techniques
Program linkers (sometimes called "link editors") such as UNIX's ld have traditionally provided link-time error detection for errors such as LT1. A linker compares all of the declarations and definitions of a given function or variable in order to do its job; it is trivial for the linker to detect error LT1 at the same time.
To detect LT2, the object file symbol table must record the size of a variable even when the variable is only referenced (some object file formats record only the sizes of defined variables, not the sizes of referenced variables). Once these sizes are available to a linker, it can issue an error message if the sizes recorded for a given variable are not all equal. Sun Microsystems's linker ld on SunOS 5.4 on the SPARC architecture detects LT2.
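A size comparison of this kind can be sketched as follows, assuming a simplified symbol table in which each entry records the module, the variable name, and the size seen in that module. The struct sym_entry layout and the functions find_size_mismatch and size_check_demo are hypothetical and do not reflect any particular object file format or linker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry: one per (module, variable) pair. */
struct sym_entry {
    const char *module;   /* object file the entry came from */
    const char *name;     /* variable name */
    size_t      size;     /* size recorded for the variable, in bytes */
};

/* Returns the index of the first entry whose size disagrees with an
 * earlier entry of the same name, or -1 if all sizes are consistent. */
int find_size_mismatch(const struct sym_entry *tab, int n)
{
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (strcmp(tab[i].name, tab[j].name) == 0 &&
                tab[i].size != tab[j].size)
                return i;
    return -1;
}

/* Usage sketch: "buf" is 64 bytes in a.o but 32 bytes in c.o. */
void size_check_demo(void)
{
    struct sym_entry tab[] = {
        { "a.o", "buf", 64 }, { "b.o", "n", 4 }, { "c.o", "buf", 32 }
    };
    assert(find_size_mismatch(tab, 3) == 2);
}
```

The quadratic scan keeps the sketch short; a linker would more plausibly look symbols up by name in a hash table.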
Lint reports many inter-module inconsistencies in C programs, including many instances of LT3 and LT4. Lint and its checks for inter-module consistency are mentioned in S. C. Johnson and D. M. Ritchie, "Portability of C Programs and the UNIX System," Bell System Technical Journal 57(6) part 2, 1978, pp. 2021-2048.
Most C++ compilation systems check that the type and number of function parameters are consistent across translation units (B. Stroustrup, "Type-Safe Linkage for C++," Proc. USENIX C++ Conference, Denver, pp. 193-210, October 1988; also USENIX Computing Systems, vol. 1, no. 4, Fall 1988, pp. 371-404).
ObjectCenter detects LT1 through LT5 (G. Wyant, J. Sherman, D. Reed, and S. Chapman, "An Object-Oriented Program Development Environment for C++," Conf. Proc. Of C++ At Work 1991). But its implementation is different from the invention described here. Hash codes are not computed for types; instead, complete type information is kept for the entire program being checked. Because type information is so complete, the type graph (the type data structures together with the pointers between them) is large and highly connected (in particular, unlike in the present invention, ObjectCenter type graphs are cyclic). See also S. Kendall and G. Allin, "Sharing Between Translation Units in C++ Program Databases," Proc. 1994 USENIX C++ Conference.
The program linker included with IBM's C Set ++ C++ programming environment implements LT3 and LT4 (IBM, C Set++ for AIX User's Guide, Version 3 Release 1, 1995).
The DEC Systems Research Center implementation of the Modula-3 programming language, called SRC Modula-3, implements LT1 through LT5 for that language (the source code of the DEC SRC Modula-3 system is currently available at http://www.research.digital.com/SRC/modula-3/html/srcm3.html; the language is defined in G. Nelson, ed., Systems Programming with Modula-3, Englewood Cliffs, N.J.: Prentice Hall, 1991). SRC Modula-3 uses hash codes as stand-ins for data types.
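The general idea of using a hash code as a stand-in for a data type can be sketched as follows. This is only an illustration and is not the SRC Modula-3 algorithm: the sketch assumes each type has already been flattened to a canonical description string (a hypothetical encoding such as "F(i,p)"), and it uses the well-known 32-bit FNV-1a string hash. The names type_hash, same_type, and type_hash_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a canonical type description string. */
uint32_t type_hash(const char *desc)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *desc != '\0'; desc++) {
        h ^= (unsigned char)*desc;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Two declarations are treated as having the same type when the hash
 * codes of their descriptions are equal. */
int same_type(const char *decl_a, const char *decl_b)
{
    return type_hash(decl_a) == type_hash(decl_b);
}

/* Usage sketch: module A declares F(i,p); module B defines F(l,p). */
void type_hash_demo(void)
{
    assert(same_type("F(i,p)", "F(i,p)"));
    assert(!same_type("F(i,p)", "F(l,p)"));
}
```

Comparing fixed-size hash codes avoids keeping complete type information for every module, at the cost of a small probability that two different types hash to the same code.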
Prior Run-Time Error Detection Techniques
Some high level languages such as Ada and Pascal have traditionally included support for detection of run-time errors in their programming systems (the compiler, linker, and debugger). In fact, in Ada, detection of many run-time errors is mandated by the language definition. Languages such as C and C++ have not had this traditional support, and their ISO standards (draft standard in the case of C++) do not mandate support for such checks. Nonetheless, several programming tools have added support for detection of run-time errors: for example, BCC (Kendall, Runtime Checking for C Programs, USENIX, Software Tools, Summer 83 Toronto Conf. Proceedings, pp. 6-16), ObjectCenter, TestCenter, Purify (Hastings, U.S. Pat. No. 5,535,329), and Insure++ (Parasoft Corporation, Insure++ Automatic Runtime Debugger User's Guide, 1995).
Run-time error checking in ObjectCenter was based on an interpreter, which is an execution engine for programs that executes a sequence of byte codes. In this technique, the logic to perform the run-time checking was embedded in the logic of the execution engine itself, and was not represented in the byte codes, which were simply a representation of the original source program.
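The interpreter-based approach can be illustrated with a toy execution engine. This is a minimal sketch under stated assumptions, not ObjectCenter's design: the byte code set (OP_PUSH, OP_LOAD_ELEM, OP_ADD, OP_HALT), the fixed array data, and the functions run and interpreter_demo are all hypothetical. The point is that the bounds check sits in the engine's handling of OP_LOAD_ELEM, while the byte codes themselves contain no checking instructions.

```c
#include <assert.h>

enum opcode { OP_PUSH, OP_LOAD_ELEM, OP_ADD, OP_HALT };
enum status { RUN_OK, RUN_BAD_INDEX, RUN_BAD_PROGRAM };

enum { DATA_LEN = 4, STACK_MAX = 16 };
static const int data[DATA_LEN] = { 10, 20, 30, 40 };

/* Executes code[0..len-1]; at OP_HALT stores the top of stack in *result. */
enum status run(const int *code, int len, int *result)
{
    int stack[STACK_MAX];
    int sp = 0;
    int pc = 0;
    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:                    /* operand follows the opcode */
            if (pc >= len || sp >= STACK_MAX) return RUN_BAD_PROGRAM;
            stack[sp++] = code[pc++];
            break;
        case OP_LOAD_ELEM: {             /* replace index with data[index] */
            if (sp < 1) return RUN_BAD_PROGRAM;
            int i = stack[sp - 1];
            if (i < 0 || i >= DATA_LEN)  /* check lives in the engine */
                return RUN_BAD_INDEX;
            stack[sp - 1] = data[i];
            break;
        }
        case OP_ADD:
            if (sp < 2) return RUN_BAD_PROGRAM;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            if (sp < 1) return RUN_BAD_PROGRAM;
            *result = stack[sp - 1];
            return RUN_OK;
        default:
            return RUN_BAD_PROGRAM;
        }
    }
    return RUN_BAD_PROGRAM;              /* ran off the end without OP_HALT */
}

/* Usage sketch: data[4] is out of bounds and is reported by the engine. */
void interpreter_demo(void)
{
    int bad[] = { OP_PUSH, 4, OP_LOAD_ELEM, OP_HALT };
    int r = 0;
    assert(run(bad, 4, &r) == RUN_BAD_INDEX);
}
```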
In Purify and TestCenter, run-time error checking is accomplished by instrumenting object code.
Prior Instrumentation Techniques
A traditional programming language processor (for example, a compiler) accepts a textual representation of a program as input and produces some machine-readable representation of the program as output. The output might be an instruction sequence suitable for execution on a microprocessor, or a byte code stream executable by an interpreter, or even source code in the same or a different high-level programming language. An instrumentor adds additional code to the generated code to measure some aspect of the program's behavior. For example, an instrumentor might add code to measure the number of times a function is called, whether or not a particular section of code was executed, or whether a pointer that is about to be dereferenced points to valid storage.
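The last example above, checking whether a pointer about to be dereferenced points to valid storage, can be sketched as follows. This is a simplified illustration and not the method of any tool named herein: storage is assumed to come from a small fixed pool, and the names pool_alloc, pool_release, points_to_valid, checked_load, and pool_demo are hypothetical. The function checked_load stands for what an instrumentor might emit in place of a plain dereference.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_SLOTS = 8 };
static int pool[POOL_SLOTS];              /* storage handed out to the program */
static unsigned char in_use[POOL_SLOTS];  /* 1 while the slot is allocated */

/* Hands out one int-sized slot, or NULL when the pool is exhausted. */
int *pool_alloc(void)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (!in_use[i]) { in_use[i] = 1; return &pool[i]; }
    return NULL;
}

/* Marks the slot as released; pointers to it become invalid. */
void pool_release(int *p)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p == &pool[i]) in_use[i] = 0;
}

/* 1 if p designates a currently allocated slot, 0 otherwise. */
int points_to_valid(const int *p)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p == &pool[i]) return in_use[i];
    return 0;
}

/* Instrumented form of "*out = *p": the check precedes the dereference. */
int checked_load(const int *p, int *out)
{
    if (!points_to_valid(p)) return 0;
    *out = *p;
    return 1;
}

/* Usage sketch: a load through a released pointer is rejected. */
void pool_demo(void)
{
    int v = 0;
    int *p = pool_alloc();
    assert(p != NULL);
    *p = 7;
    assert(checked_load(p, &v) && v == 7);
    pool_release(p);
    assert(!checked_load(p, &v));
}
```

In this sketch any pointer outside the pool, including a pointer to a valid local variable, is rejected; a practical tool would also have to track stack and static storage.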
Often, an intermediate representation of the program, called a "syntax tree," is built by a language processor front-end as the input is parsed. Then, the syntax tree is transformed into the final output representation by the language processor back-end. By modifying the syntax tree after the input has been parsed, an instrumentor can add the instrumentation code required; the back-end will then generate code corresponding to the instrumented program.
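Instrumentation by syntax tree modification can be sketched with a toy expression tree. The sketch makes several assumptions: the node kinds, the fixed array table, and the functions instrument, eval, and tree_demo are hypothetical, and an evaluator stands in for the back-end code generator. The instrumentor here rewrites each indexing node into a checked indexing node, which is only one of several possible ways to modify the tree.

```c
#include <assert.h>
#include <stddef.h>

enum kind { N_CONST, N_ADD, N_INDEX, N_CHECKED_INDEX };

struct node {
    enum kind    kind;
    int          value;         /* used by N_CONST only */
    struct node *left, *right;  /* operands; indexing nodes use left only */
};

enum { TABLE_LEN = 4 };
static const int table[TABLE_LEN] = { 10, 20, 30, 40 };

/* "Instrumentor": walks the parsed tree and turns every N_INDEX node
 * into N_CHECKED_INDEX. */
void instrument(struct node *n)
{
    if (n == NULL) return;
    instrument(n->left);
    instrument(n->right);
    if (n->kind == N_INDEX) n->kind = N_CHECKED_INDEX;
}

/* Stand-in for the back-end. Returns 1 on success, 0 when a checked
 * index is out of bounds. */
int eval(const struct node *n, int *out)
{
    int a, b;
    switch (n->kind) {
    case N_CONST:
        *out = n->value;
        return 1;
    case N_ADD:
        if (!eval(n->left, &a) || !eval(n->right, &b)) return 0;
        *out = a + b;
        return 1;
    case N_CHECKED_INDEX:
        if (!eval(n->left, &a)) return 0;
        if (a < 0 || a >= TABLE_LEN) return 0;   /* inserted check */
        *out = table[a];
        return 1;
    case N_INDEX:                                /* unchecked original */
        if (!eval(n->left, &a)) return 0;
        *out = table[a];
        return 1;
    }
    return 0;
}

/* Usage sketch: table[2] + 1, before and after instrumentation. */
void tree_demo(void)
{
    struct node two = { N_CONST, 2, NULL, NULL };
    struct node idx = { N_INDEX, 0, &two, NULL };
    struct node one = { N_CONST, 1, NULL, NULL };
    struct node sum = { N_ADD, 0, &idx, &one };
    int v = 0;
    assert(eval(&sum, &v) && v == 31);
    instrument(&sum);
    assert(eval(&sum, &v) && v == 31);   /* meaning is unchanged */
}
```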
Parasoft describes a technique that inserts special error-checking nodes as parents of normal syntax-tree nodes (Kolawa et al., U.S. Pat. No. 5,581,696).