Object-oriented programs are programs written in programming languages such as C++ that support a particular form of user-defined data types called "classes". A class "declaration" in the programming language specifies the data contained by variables of the class type, or operations supported by those variables, or both. Variables or instances of these class types are called objects.
A feature of C++ and similar object-oriented languages is that classes are stored in separate files, often grouped together into libraries. The source code that defines a class is commonly separated into a declaration part, contained in a "header file", and an implementation part, contained in a "body file". Preprocessor directives tie these files together: an #include directive in the body file causes the named header file to be read in, after which the declarations in the header file may be used in the body file. A header file is included by its corresponding body file, but may also be included by one or more other body files that make use of the class it declares.
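As an illustration of the declaration/implementation split, the following minimal sketch shows a hypothetical class Counter. Both parts are placed in one listing so that it compiles on its own; the comments mark which part would normally live in a header file (counter.h) and which in a body file (counter.cpp). The class and file names are assumptions chosen for the example.

```cpp
#include <cassert>  // used when exercising the class

// ---- counter.h: the declaration part ("header file") ----
class Counter {
public:
    explicit Counter(int start = 0);  // operation: construct with an initial value
    void increment();                 // operation: modify the object's state
    int value() const;                // operation: query the object's state
private:
    int count_;                       // data held by every Counter object
};

// ---- counter.cpp: the implementation part ("body file") ----
// In a real project this file would begin with: #include "counter.h"
Counter::Counter(int start) : count_(start) {}

void Counter::increment() { ++count_; }

int Counter::value() const { return count_; }
```

Any other body file that uses Counter would include counter.h as well, so the class declaration (and therefore its type) is visible to, and compiled in, several compilation units.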
A compiler converts the source code contained in a body file, plus all header files it includes, into an object module file containing machine code. An executable program is formed by a linker or linking loader, which combines the object modules generated from several body files and, in doing so, resolves references in one object module to symbols representing subroutines and data definitions that are defined in another.
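The symbol resolution step performed by the linker can be sketched, in a deliberately simplified form, as follows. The ObjectModule structure and the linkModules function are hypothetical: each module is reduced to the symbols it defines and the symbols it references, and the function records which module satisfies each reference. Real linkers also handle relocations, duplicate-definition errors and libraries, none of which are modelled here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, highly simplified object module: the symbols it defines
// and the symbols it references.
struct ObjectModule {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

// Map every referenced symbol to the module that defines it. Symbols that
// no module defines are collected in `unresolved`.
std::map<std::string, std::string> linkModules(const std::vector<ObjectModule>& modules,
                                               std::set<std::string>& unresolved) {
    std::map<std::string, std::string> definedIn;
    for (const auto& m : modules)
        for (const auto& sym : m.defines)
            definedIn.emplace(sym, m.name);  // in this sketch the first definition wins

    std::map<std::string, std::string> resolution;
    for (const auto& m : modules)
        for (const auto& sym : m.references) {
            auto it = definedIn.find(sym);
            if (it != definedIn.end())
                resolution[sym] = it->second;
            else
                unresolved.insert(sym);
        }
    return resolution;
}
```

For instance, if main.o references Counter::increment and counter.o defines it, the reference is resolved to counter.o; a symbol defined nowhere among the given modules ends up in the unresolved set.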
While some errors in programs are so fundamental that they halt compilation, more often logical errors prevent the program from producing the expected results. Modern compiler suites therefore generally include a software tool called a debugger, which can be used to diagnose the program and trace these errors. To support the debugger, the compiler, under the control of an option, produces information describing the symbols and types in the program, as well as information that maps between source lines and the binary code output. This extra information enables the programmer to examine types, variables and data structures by name and to follow the execution of the program through the source code.
Current compilers, such as IBM's VisualAge C++ for OS/2 Version 3.0, generate debug information in the following naive fashion. For every type referenced in the source for a compilation unit, a full description of that type is provided in the debug type information included in the resulting object module. If the same type is referenced in multiple body files, a copy of that type description is generated in each of the corresponding object modules. This duplication arises because the compiler processes the body files one at a time and therefore does not know whether a needed type description will be generated in some other object module. Because debug information is large, this duplication can result in massive executable modules in which the debug information dwarfs all other aspects of the module. In addition, significant compile resources (time, working set, etc.) are devoted to creating this debug information, so widespread duplication substantially increases the compile time needed to build the executable module.
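The cost of the naive scheme can be estimated with a small model. In the sketch below, which is an illustration rather than a description of any particular compiler, each compilation unit is reduced to the set of type names it references (the CompilationUnit alias and both functions are hypothetical). Under the naive scheme the number of full type descriptions emitted is the sum over all units, whereas describing each type once would need only as many records as there are distinct types.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a compilation unit is the set of type names it
// references, directly or through included header files.
using CompilationUnit = std::set<std::string>;

// Naive scheme: every unit emits a full description of every type it
// references, so the total is the sum of the per-unit counts.
std::size_t naiveTypeRecords(const std::vector<CompilationUnit>& units) {
    std::size_t total = 0;
    for (const auto& u : units) total += u.size();
    return total;
}

// Lower bound: each distinct type described exactly once program-wide.
std::size_t distinctTypes(const std::vector<CompilationUnit>& units) {
    std::set<std::string> all;
    for (const auto& u : units) all.insert(u.begin(), u.end());
    return all.size();
}
```

With three units referencing {Counter, std::string}, {Counter, Widget, std::string} and {Widget, std::string}, the naive scheme emits 7 full descriptions for only 3 distinct types; the gap grows with the number of body files that include the same widely used headers.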
The prior art contains two approaches to ameliorating the module size and compile time problems.
One approach is to enhance the linker (or create a post-link utility) to determine when multiple local type descriptions from different object modules describe the same type, and to create a single global version of that type description ("global" meaning that it is accessible beyond the scope of a single object module's debug information). The utility eliminates the duplicate local type descriptions and remaps all references to the global version of the type description. This approach solves the executable module size problem, but the object module size problem remains. Moreover, the compile time problem may actually be exacerbated by the link-time cost of packing the debug type information.
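The link-time packing approach can be sketched as a merge of per-module type tables. In this minimal model, which rests on stated assumptions, two local descriptions are treated as the same type when their name and a textual stand-in for their layout are equal; a real utility would compare complete debug type records. The names TypeDesc, LocalTypeTable, PackedTypes and packTypes are hypothetical. The function keeps one global copy of each distinct description and records, for every module, how its local type indices map onto the global table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical local type description. The layout string stands in for the
// full debug record; equal name and layout are taken to mean "same type".
struct TypeDesc {
    std::string name;
    std::string layout;
    bool operator<(const TypeDesc& o) const {
        return name != o.name ? name < o.name : layout < o.layout;
    }
};

// Debug type information of one object module.
using LocalTypeTable = std::vector<TypeDesc>;

struct PackedTypes {
    std::vector<TypeDesc> global;                 // one entry per distinct type
    std::vector<std::vector<std::size_t>> remap;  // remap[module][local] -> global index
};

// Merge the local tables, dropping duplicates and remapping local indices.
PackedTypes packTypes(const std::vector<LocalTypeTable>& modules) {
    PackedTypes out;
    std::map<TypeDesc, std::size_t> index;
    for (const auto& table : modules) {
        std::vector<std::size_t> moduleMap;
        for (const auto& desc : table) {
            auto [it, inserted] = index.emplace(desc, out.global.size());
            if (inserted) out.global.push_back(desc);  // first occurrence becomes global
            moduleMap.push_back(it->second);
        }
        out.remap.push_back(std::move(moduleMap));
    }
    return out;
}
```

Note that the duplicates still have to be produced by the compiler and written into every object module before this step can remove them, which is why the sketch does nothing for object module size or compile time.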
The second approach is to enhance the compiler to emit full type descriptions only in the "distinguished compile unit" for each type. A heuristic commonly used to select the distinguished compile unit for a class is described in The Annotated C++ Reference Manual by Ellis & Stroustrup, 1990: the compile unit that contains the implementation of the lexically first non-inline virtual member function of the class is used as the distinguished compile unit. In other compile units that must reference the type, which is fully described in the distinguished compile unit, a degenerate description of the type is emitted. This degenerate reference is a debug type record that does not describe the type but does provide a unique identifier for it. Typically, the unique identifier is the fully qualified type name if the type is a global type. The degenerate reference also marks itself as degenerate by some means, so that it can be distinguished from the description of a global type of the same name that has no members. By relying on the one definition rule of C++, the debugger and/or the linker can replace references to the incomplete type with references to the full type description.
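The distinguished-compile-unit heuristic can be expressed as a short function. The sketch below assumes a hypothetical summary of a class's member functions, listed in declaration (lexical) order, each tagged with whether it is virtual, whether it is inline, and which body file defines it. It returns the defining unit of the first non-inline virtual member, and decides whether a given unit would emit a full or a degenerate type record. The fallback to a full record when no such member exists is an assumption made for the sketch, not something the heuristic itself specifies.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one member function, in declaration order.
struct MemberFunction {
    std::string name;
    bool isVirtual;
    bool isInline;
    std::string definingUnit;  // body file that holds its definition
};

// Ellis & Stroustrup heuristic: the unit implementing the lexically first
// non-inline virtual member function is the distinguished compile unit.
std::optional<std::string> distinguishedUnit(const std::vector<MemberFunction>& members) {
    for (const auto& f : members)
        if (f.isVirtual && !f.isInline) return f.definingUnit;
    return std::nullopt;  // no such member: the heuristic selects no unit
}

enum class TypeRecord { Full, Degenerate };

// Record a compilation unit would emit for the class under this scheme.
// Assumption: with no distinguished unit, every unit emits a full record.
TypeRecord recordFor(const std::vector<MemberFunction>& members, const std::string& unit) {
    auto d = distinguishedUnit(members);
    return (!d || *d == unit) ? TypeRecord::Full : TypeRecord::Degenerate;
}
```

For a class whose first virtual member is inline and whose second virtual member is defined in shape.cpp, only shape.cpp emits the full description; every other unit, such as main.cpp, emits a degenerate reference carrying the class name.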
While this technique solves the compile time and disk space problems, it cannot handle a very common class of applications: those that use binary class libraries, or classes implemented in dynamically linked libraries (DLL files) that are shipped without debugging information. Because the prior art method emits full type information only in the distinguished compile unit for each type, it cannot produce a debuggable application when the source code of that distinguished compile unit is not part of the user's build process.
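The failure case can be made concrete with a small check. In the sketch below, ClassInfo and undebuggableClasses are hypothetical names: each class is paired with the body file the heuristic selects as its distinguished compile unit, and the function lists the classes whose distinguished unit is not compiled in the user's build. For those classes no full type description is ever emitted, so every degenerate reference to them remains unresolved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: a class and its distinguished compile unit.
struct ClassInfo {
    std::string name;
    std::string distinguishedUnit;
};

// Classes whose full description is never emitted because their
// distinguished unit is not compiled in this build (for example, it belongs
// to a binary class library or DLL).
std::vector<std::string> undebuggableClasses(const std::vector<ClassInfo>& classes,
                                             const std::set<std::string>& unitsInBuild) {
    std::vector<std::string> missing;
    for (const auto& c : classes)
        if (unitsInBuild.count(c.distinguishedUnit) == 0) missing.push_back(c.name);
    return missing;
}
```

If the build compiles only main.cpp and counter.cpp, a class Window whose distinguished unit is a source file of a binary GUI library (here the hypothetical guilib_window.cpp) is reported, even though its header is fully available to the compiler.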
Class libraries currently shipped in binary form do not usually include debugging information, because their producers assume that the header files shipped with the libraries provide enough information for a standard compiler and debugger to build it. Furthermore, full debug information compiled from the library source would not be limited to the information required to describe types. It would expose other details of the class library's implementation that producers may be unwilling to make generally available to customers who do not hold source code licences.