In recent years, the C++ programming language has gained broad acceptance among programmers on many computer platforms. Many programmers favor C++ because of its object-oriented facilities for distributing and allocating functional behavior among different classes of objects, and for defining clean, concise interfaces for communication among different objects. At the same time, the C++ language lets programmers retain control over many performance-related decisions such as data layout and code inlining. At first blush, C++ may therefore seem to offer an ideal combination of properties: C++ offers high performance comparable to the C programming language, but simultaneously allows programmers to construct layers of abstraction to hide implementation details. Indeed, the C++ language works well for many purposes. For more detailed information regarding the C++ programming language, including the language's object-oriented features, the reader is directed to Ellis & Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley (1990), which is incorporated herein in its entirety by this reference.
However, problems can arise when C++ object interfaces are modified over time. An unpleasant phenomenon familiar to every C++ programmer is that if a header file containing class definitions is revised, massive recompilation of application program (or "client") code dependent upon those class definitions becomes necessary. Stated differently, the compiled code of a client is good only for a specific implementation of each class it uses. If a class implementation is changed, the compiled code "breaks" (i.e., becomes incompatible). For example, reversing the declaration order of the real and imaginary components of a C++ class for complex numbers will not affect the language-level semantics of the class, but it will break nearly every client of the class. We call this sensitivity of code to class implementations "brittleness". When compiled code is brittle, it frequently needs to be recompiled, which is a tedious and unwelcome task. (Note: "compiled code" is often known as "object code," but that term is avoided herein because of the special, different meaning of the term "object" in the context of object-oriented programming languages such as C++.)
This massive recompilation problem is symptomatic of a failure to hide information. In the interests of maximizing efficiency, compiled C++ code typically fails to conceal the details of object implementation, and thereby forfeits benefits of object-oriented programming such as information hiding, encapsulation, and code reuse. Generally, each access to a class object is compiled in an environment which contains full implementation information about that class. The availability of all this implementation information allows very tight code for data structure access, as with C language "structs," but it has a significant disadvantage: that very same tight code must be rearranged if the class changes. Information used at compile time is information which cannot be hidden behind the interface, and less information hiding leads to greater interdependency between clients and services. Similarly, other C++ features, such as modular scoping and overloaded functions, also use implementation information at compile time to generate tightly coupled compiled code, while C++ "templates" have no semantics except as defined by their text replacement at compile time.
Thus, in practice, the conventional tight coupling of C++ compiled code results in the "brittleness" of compiled code with respect to class definitions. A significant percentage of a project's code may often be contained in the header files which define object classes, and these header files cannot be modified without (potentially) breaking all project binaries, a state of affairs which is only rectified by massive recompilation.
In many current C++ programming environments, a system utility (often known as the "make" utility) is invoked by users to generate compiled code for application programs. The "make" utility typically compares the timestamp of each source code file with the timestamp of any existing, corresponding compiled code file to determine whether the source code (or any header file referenced by the source code) has been modified since the existing compiled code was last generated. "Make" will use existing compiled code files if the source code has not been modified since they were generated, because recompilation is plainly unnecessary in that case. However, relying solely on timestamp-based reasoning means that any change to a header file, no matter how trivial or irrelevant, triggers recompilation of all source files which directly or indirectly use that header file.
In the prior art, some attempts have been made to reduce the need for excessive recompilation by tracking dependencies more accurately. For example, if a comment is changed in a header file, but no actual class definition is changed, it is reasonable to continue to use the contents of client compiled code files without recompilation, and to simply update the timestamps of such files to indicate that their contents are currently valid. However, accurately maintaining dependency information can easily become as costly as recompilation itself. Moreover, dependency tracking fundamentally fails to remove dependencies. For example, if a new virtual function is added to a C++ class, such an addition will change the layout of the virtual function table of that class and all derived classes, and will necessitate recompilation of all client code using those classes. The only way to avoid such massive recompilation is to remove dependencies, and not simply track them.
A mechanism that has previously been used in computer science to decouple separate modules in languages such as Algol, Fortran, and Lisp is the procedure call. Typically, procedure calling conventions hide all aspects of the procedure body from the calling client. Only arguments and return values are exchanged, typically through a common program stack. Stated differently, modern procedure calling conventions hide all non-interface information. Consequently, interfacing two programs by means of a procedure call mechanism advantageously allows separate compilation, and eliminates the need for recompilation of the calling program every time purely internal aspects of the procedure are revised. However, conventional wisdom and the prior art have generally shunned using procedure call mechanisms to implement object-oriented operations in C++ code, in part to avoid incurring the run-time performance penalties traditionally associated with executing frequent procedure calls.
The Delta C++ compiler product, offered by Silicon Graphics, Inc. of Mountain View, Calif., is thought to implement one possible strategy for reducing recompilation requirements by removing internal class dependencies. It is thought by the authors that the Delta C++ product encodes some internal, class implementation information in a global data structure or symbol table separate from client code, in the form of stored constant values. This implementation information is then subsequently "inlined" into the client code by the linker at run-time. While the Delta C++ product is thought to make some headway in reducing the need for extensive recompilation, it leaves considerable room for further improvement, particularly with respect to the robustness of the scheme and the limited scope of implementation dependencies removed from client code.
Therefore, an improved methodology for C++ compilation is needed, one which provides full object-oriented benefits such as information hiding even at the compiled code level, but which nevertheless produces compiled code exhibiting acceptable levels of performance and efficiency.