1. Field of the Invention
The present invention relates to compilers and, more particularly, to improved methods for compiling source programs using one or more libraries.
2. Description of the Related Art
A computer program is typically written in a high-level programming language, such as Fortran, C, C++, Java, Ada, etc. Computer programs written in high-level programming languages can be referred to as source programs. Typically, source programs are composed of one or more source files. A compiler translates at least a portion of one or more source files into one or more object files. Each individual translation of a source file by the compiler can be referred to as a compilation unit. A linker can combine one or more object files into an executable program. A computer can then execute (run) the executable program. The combination of the compiler, linker, and computer can be referred to as a “programming system”.
Object files are typically packaged together into libraries, which also can be provided to the linker. Because the libraries are often provided by different institutions, the compilation environment of the libraries is often different from that of the other objects within the program.
Modern programming languages often provide a mechanism for the programmer to write portions of the source which can be replicated and specialized for a number of different uses. For example, such a mechanism is called “generics” in Ada and “templates” in C++. To illustrate, a programmer may define a generic “stack” data structure template with several available operations, such as push, pop, list, etc. The programmer or the compiler can later specialize the stack to be a stack of integers, a stack of stacks of floating-point numbers, etc.
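By way of illustration only, the generic stack described above might be sketched in C++ as follows. The names Stack, push, pop, size, IntStack, and StackOfFloatStacks are hypothetical and are not taken from any particular program or library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic stack; all names here are illustrative only.
template <typename T>
class Stack {
public:
    // Adds a copy of value to the top of the stack.
    void push(const T& value) { items_.push_back(value); }

    // Removes and returns the most recently pushed element.
    // Precondition: the stack is not empty.
    T pop() {
        assert(!items_.empty());
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    std::size_t size() const { return items_.size(); }
    bool empty() const { return items_.empty(); }

private:
    std::vector<T> items_;
};

// Two specializations of the same template: a stack of integers and a
// stack of stacks of floating-point numbers.
using IntStack = Stack<int>;
using StackOfFloatStacks = Stack<Stack<double>>;
```

Each distinct use, such as IntStack or StackOfFloatStacks, requires the compiler to generate a separate specialized copy of the template code.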
When a programming language permits specialization of generic templates, the compiler must generate the specialization. In the C++ programming language, this process is called instantiation, and the specialized templates are called instances. Because only one instance is typically required to produce the executable program, a programming system should produce only one “effective” instance for any one program. A programming system can do so either by preventing duplicate instantiations or by rendering any duplicates inoperative.
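As a minimal sketch of instantiation, the following C++ fragment shows one instance requested explicitly and another generated implicitly by use. The function names twice and twice_of_half are hypothetical and serve only as an example.

```cpp
#include <cassert>

// Hypothetical function template used only for illustration.
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation definition: asks the compiler to generate the
// instance twice<int> in this compilation unit, whether or not it is used here.
template int twice<int>(int);

// Implicit instantiation: calling twice with a double argument makes the
// compiler generate the instance twice<double> on demand.
inline double twice_of_half() {
    return twice(0.5);
}
```

If several compilation units each use twice<double>, each may generate its own copy of that instance, which is the duplication that a programming system must prevent or render inoperative.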
One method for producing only one effective instance can be referred to as “link-time translation”. Link-time translation operates to compile without producing any instances. Using this method, the linker will initially fail to produce the executable program because some instances are missing and remain unresolved at initial link time. A programming system using link-time translation, however, can extract the names of the missing instances after the initial link, compile the missing instances, and re-link to produce the executable program. One drawback of link-time translation is that it leads to very long link times. Further, it separates the cause of an error (the use of an instance in a compilation unit) from the reporting of the error (provided by the compilation of the instance), which may make locating and correcting errors difficult.
Another method for producing only one effective instance can be referred to as “assigning instances to translation units”. Using this method, it is possible to track requests for instances and instances missing at link time. When a source unit is subsequently recompiled, the instances assigned to it can be generated to produce the executable program. However, like the link-time translation method, this method can lead to very long link times, and it typically requires recompilation of source units.
Still another method for producing one effective instance is to generate all instances needed for every compilation unit, and then rely on the linker to either bypass or remove duplicate instances. The bypassing of instances can be achieved with several techniques, including “archive search order”, “weak symbols”, “common data (comdat)”, and “dynamic link interposition”. One drawback with this method is that it typically leads to unnecessarily large object files because instances generally appear in the objects more than once. It can also lead to long compilation times since object files are larger than necessary, as well as long link times because the linker has to process unused instances.
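The cost of this approach can be pictured with a toy model, given here only as a sketch. It assumes that an object file can be represented as a list of the instance names it defines; the names ObjectFile, LinkResult, and link_objects are hypothetical, and the code does not model any real linker or any of the specific techniques named above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model (assumption): an "object file" is just the list of instance
// names it defines.
using ObjectFile = std::vector<std::string>;

struct LinkResult {
    std::set<std::string> kept;   // one effective instance per name
    std::size_t discarded = 0;    // duplicate definitions that were bypassed
};

// Keeps the first definition of each instance and counts every later
// definition of the same name as a bypassed duplicate.
inline LinkResult link_objects(const std::vector<ObjectFile>& objects) {
    LinkResult result;
    for (const ObjectFile& object : objects) {
        for (const std::string& instance : object) {
            if (!result.kept.insert(instance).second) {
                ++result.discarded;
            }
        }
    }
    return result;
}
```

In this model, every discarded entry stands for an instance that was compiled, written to an object file, and read by the linker without contributing to the executable program.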
Finally, another method is to generate the instances into a repository that is shared between compilation units. The compiler generates an instance within a secondary object file and places the secondary object into the repository, but only if the instance does not already exist within the repository. This method can also result in unnecessarily large object files, and it can require longer compilation times than the other methods because of the need for a repository and the increased number of object files. Furthermore, using a shared repository requires having a shared compilation environment. However, libraries are not typically compiled in the same compilation environment. Thus, duplicate instances may often still exist between libraries and non-library objects, although it is possible to bypass or remove duplicate instances within libraries. Rather than bypass or remove duplicates from libraries, it would be more efficient not to generate duplicates of instances that already exist within the libraries.
To prevent unnecessary duplication, compilers may provide a mechanism for the programmer to specify a list of instances that are to be suppressed (not to be generated). For example, in the Sun C++ compilers, this mechanism takes the form of “directives” within an “options file”. This mechanism may be used to avoid duplicating instances that already exist within libraries. The linker symbol names (linker names) are extracted from the libraries, and those names are filtered to remove non-instance names, yielding a list of instance names. The list of instance names can be converted to a suppression list, which can be provided to the compiler during compilation of a compilation unit. The compiler can then convert the names on the suppression list to an internal representation so that it can compare the internal representation of candidate instances to the names on the suppression list before generating instances.
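The suppression-list steps described above can be sketched as a toy model. This sketch assumes that each library symbol already carries a readable name and a flag saying whether it is an instance, which deliberately sidesteps the conversion between linker names and language names. The names LibrarySymbol, build_suppression_list, and should_generate are hypothetical and do not correspond to any compiler's actual interface or options-file syntax.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (assumption) of a symbol extracted from a library. In a real
// system the linker name would be encoded, and whether it denotes an
// instance would have to be inferred from that encoding.
struct LibrarySymbol {
    std::string name;
    bool is_instance;
};

// Filters out non-instance names to obtain the suppression list.
inline std::set<std::string> build_suppression_list(
        const std::vector<LibrarySymbol>& symbols) {
    std::set<std::string> suppressed;
    for (const LibrarySymbol& symbol : symbols) {
        if (symbol.is_instance) {
            suppressed.insert(symbol.name);
        }
    }
    return suppressed;
}

// A candidate instance is generated only if it is not on the suppression list.
inline bool should_generate(const std::set<std::string>& suppressed,
                            const std::string& candidate) {
    return suppressed.count(candidate) == 0;
}
```

The comparison here is a plain string match; in practice the compiler would have to compare its internal representation of each candidate instance against the converted names.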
Unfortunately, generating this list requires a significant amount of time, resources, and preparatory work. In addition, constructing the suppression list requires converting linker symbol names to the program's symbol names (e.g., C++ symbol names). This conversion can be difficult, fragile, or even untenable. A language may require more information in the linker symbol name than is present in the language name, and the extra information may be difficult to remove; or it may require less information in the linker symbol name than is present in the language name, and the missing information may be impossible to recover. Furthermore, comparing the internal representations within the compiler can itself be a difficult and fragile task.
In view of the foregoing, there is a need for more efficient methods of compiling source programs making use of instances and libraries.