1. Technical Field
The present invention relates generally to compilers for computer programming languages and, more particularly, to a compiler having a separate component for implementing one or more runtime data representations, thereby allowing a user to either modify an existing data representation or implement a new data representation without having to access and/or modify source code for the compiler.
2. Description of Related Art
In general, a compiler is a computer program that converts high level language (HLL) text files (source code) to object files (object code). In particular, a compiler parses the source code to determine the linguistic constructs it contains (e.g., statements, procedures), checks that the source code is legal, and then generates object code (comprising one or more object files). The object code is then processed by a linker, wherein the object files are linked together and assigned to particular blocks of memory to become an executable image in machine language. A compiler may be configured to perform both compiling and linking, in which case the output of the compiler is an executable file.
As is known to those skilled in the art, a compiler for a given programming language has to represent runtime data with data structures that are suitable to support the features of the programming language. Conventionally, compilers for different programming languages (and sometimes for the same programming language) are configured to represent runtime data in different manners. A variety of techniques have been developed for structuring runtime data, and such techniques are well-known by those skilled in the art.
For instance, a compiler for an object-oriented ("O-O") programming language is configured to represent "objects" at runtime with data structures that are appropriate to support the features of the language. The manner in which a compiler represents objects of an O-O programming language at runtime is referred to herein as an "object-model." As is well-known by those skilled in the art, an object may be represented at runtime using a data block comprising: a pointer to a virtual function table for the class to which the object belongs, object data, and additional virtual function table pointers and/or virtual base pointers in the case of multiple and/or virtual inheritance. A virtual function table comprises a sequence of pointers which point to code for executing methods for a given object of a particular class.
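By way of illustration only, the single-inheritance case of the data block described above may be sketched by hand in C++ as follows. The names ShapeObject, ShapeVTable, make_rect, make_tri, call_area and call_sides are hypothetical and are chosen solely for this example; the sketch does not depict the layout produced by any particular compiler, and it omits the additional pointers used for multiple and/or virtual inheritance.

```cpp
#include <cassert>

struct ShapeObject;

// Virtual function table: a sequence of pointers to the code that
// executes each method for objects of one particular class.
struct ShapeVTable {
    int (*area)(const ShapeObject*);
    int (*sides)(const ShapeObject*);
};

// Runtime data block for one object (hypothetical layout):
// a pointer to the class's virtual function table, then the object data.
struct ShapeObject {
    const ShapeVTable* vptr;
    int w;
    int h;
};

// Method code for two hypothetical classes sharing the same interface.
int rect_area(const ShapeObject* s) { return s->w * s->h; }
int rect_sides(const ShapeObject*) { return 4; }
int tri_area(const ShapeObject* s) { return s->w * s->h / 2; }
int tri_sides(const ShapeObject*) { return 3; }

// One table per class, shared by every object of that class.
const ShapeVTable kRectVTable = {rect_area, rect_sides};
const ShapeVTable kTriVTable = {tri_area, tri_sides};

ShapeObject make_rect(int w, int h) { return ShapeObject{&kRectVTable, w, h}; }
ShapeObject make_tri(int w, int h) { return ShapeObject{&kTriVTable, w, h}; }

// A virtual call becomes an indirect call through the object's table pointer.
int call_area(const ShapeObject* s) {
    assert(s != nullptr && s->vptr != nullptr);
    return s->vptr->area(s);
}

int call_sides(const ShapeObject* s) {
    assert(s != nullptr && s->vptr != nullptr);
    return s->vptr->sides(s);
}
```

In this sketch, the caller of call_area need not know the class of the object; the choice of method code is made at runtime through the table pointer stored in the data block.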
Typically, compilers are written to generate one data representation which controls how data is structured at runtime. However, a compiler may be configured to generate different kinds of data representations, even for the same program (e.g., support more than one object-model). For instance, with O-O programming languages, it is desirable for a compiler to support more than one object-model since different object-models have different characteristics; some perform better than others, some offer better release-to-release compatibility, some support different language features, and some are desired for compatibility with other vendors. Similarly, for non-O-O languages that support matrices (or sets, or any other high-level data type), it is desirable for compilers to support more than one data representation since different representations have different characteristics; some perform better in certain applications, there are different space/time tradeoffs (e.g., dense versus sparse representations), and some are compatible with other languages or other vendors.
Conventionally, the code within a compiler that is responsible for implementing the data representation (structuring the runtime data) is deeply intertwined with much of the other source code comprising the compiler (i.e., assumptions about the data representation are embedded in the code for most of the compiler). Consequently, since there is no identifiable separation in a conventional compiler between the code that depends on the data representation and the code that does not depend on the data representation, the process of modifying and/or adding support for a new data representation can require extensive changes to the compiler code, which is a formidable task that is prone to error.
The present invention is a compiler comprising one or more separate components, each of which contains the source code of the compiler which is responsible for implementing a corresponding data representation. These components are responsible for all of the parts of compilation which depend on the corresponding data representation.
In one aspect of the present invention, a compiler comprises:
a converter for converting program code to object code; and
a data representation implementor for isolating within the compiler information that relates to representation of data at runtime, wherein the converter accesses the data representation implementor to obtain information that is needed for converting any portion of the program code that is dependent on representation of data at runtime.
In another aspect of the invention, the data representation implementor comprises a separate object-model implementor (OMI) for each different object-model (i.e., manner of representing objects at runtime) supported by a compiler for an object-oriented programming language. Each OMI is a separate component containing all compiler code that is dependent on the object-model it supports (e.g., code for providing information about and transforming all program constructs whose implementation is dependent on the object-model). The implementations of different object-models are thus separated from one another and from the rest of the compiler.
In yet another aspect of the present invention, different object-models may be supported within one compilation by assigning each class declaration of the object-oriented programming language to a particular OMI, which is responsible for objects of that class.
In another aspect of the present invention, other components of the compiler may consult a corresponding OMI via an interface (set of methods) when information regarding the object representation is needed.
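By way of illustration only, such an interface may be sketched in C++ as follows. The class ObjectModelImplementor, its methods objectSize and fieldOffset, the two example object-models, and the function emitFieldLoad are all hypothetical names assumed for this example and are not taken from any actual compiler; alignment padding is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Field {
    std::string name;
    std::size_t size;  // size in bytes; alignment padding is ignored here
};

struct ClassDecl {
    std::string name;
    std::vector<Field> fields;
    bool hasVirtuals;
};

// Hypothetical interface (set of methods) consulted by other compiler components.
class ObjectModelImplementor {
public:
    virtual ~ObjectModelImplementor() = default;
    virtual std::size_t objectSize(const ClassDecl& c) const = 0;
    virtual std::size_t fieldOffset(const ClassDecl& c, const std::string& field) const = 0;
};

std::size_t vptrSize(const ClassDecl& c) { return c.hasVirtuals ? sizeof(void*) : 0; }

std::size_t totalFieldSize(const ClassDecl& c) {
    std::size_t total = 0;
    for (const Field& f : c.fields) total += f.size;
    return total;
}

// Offset of a field relative to the start of the object data.
std::size_t offsetWithinFields(const ClassDecl& c, const std::string& field) {
    std::size_t offset = 0;
    for (const Field& f : c.fields) {
        if (f.name == field) return offset;
        offset += f.size;
    }
    throw std::out_of_range("no field '" + field + "' in class " + c.name);
}

// Example object-model A: virtual function table pointer precedes the object data.
class VptrFirstModel : public ObjectModelImplementor {
public:
    std::size_t objectSize(const ClassDecl& c) const override {
        return vptrSize(c) + totalFieldSize(c);
    }
    std::size_t fieldOffset(const ClassDecl& c, const std::string& field) const override {
        return vptrSize(c) + offsetWithinFields(c, field);
    }
};

// Example object-model B: virtual function table pointer follows the object data.
class VptrLastModel : public ObjectModelImplementor {
public:
    std::size_t objectSize(const ClassDecl& c) const override {
        return totalFieldSize(c) + vptrSize(c);
    }
    std::size_t fieldOffset(const ClassDecl& c, const std::string& field) const override {
        return offsetWithinFields(c, field);
    }
};

// A converter component that has no knowledge of the layout; it obtains the
// offset from whichever implementor it is given.
std::string emitFieldLoad(const ObjectModelImplementor& omi, const ClassDecl& c,
                          const std::string& field) {
    assert(!field.empty());
    return "load [obj + " + std::to_string(omi.fieldOffset(c, field)) + "]";
}
```

In this sketch, emitFieldLoad contains no assumption about where fields are placed, so substituting one implementor for another changes the generated offset without any change to the converter code.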
In yet another aspect of the present invention, the compiler supports the ability to implement new object-models simply by writing new object-model implementors. This can be done by a third party, without having to access the source code of the compiler (other than the header files which are part of a public application program interface (API)).
Advantageously, by isolating the code within the compiler that is dependent upon the runtime data representation, the data representation may readily be modified without having to rewrite the entire compiler. The compiler can then include means for registering a new data representation implementor, and for specifying which data representation is to be used in a particular context (e.g., which object-model is to be used for which class declarations). In this manner, a single compilation can include multiple data representations.
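By way of illustration only, the registration and selection means may be sketched in C++ as follows. The classes DataRepresentationImplementor and ImplementorRegistry, the example implementors CompactModel and PortableModel, and all method names are hypothetical and are assumed solely for this example; an actual registration mechanism may differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Minimal hypothetical implementor interface for this sketch.
class DataRepresentationImplementor {
public:
    virtual ~DataRepresentationImplementor() = default;
    virtual std::string describe() const = 0;
};

class CompactModel : public DataRepresentationImplementor {
public:
    std::string describe() const override { return "compact"; }
};

class PortableModel : public DataRepresentationImplementor {
public:
    std::string describe() const override { return "portable"; }
};

// Hypothetical registry: holds registered implementors and records which
// implementor is responsible for which class declaration.
class ImplementorRegistry {
public:
    void registerImplementor(const std::string& name,
                             std::unique_ptr<DataRepresentationImplementor> impl) {
        assert(impl != nullptr);
        implementors_[name] = std::move(impl);
    }

    void setDefault(const std::string& name) {
        requireRegistered(name);
        defaultName_ = name;
    }

    void assignClass(const std::string& className, const std::string& implName) {
        requireRegistered(implName);
        assignment_[className] = implName;
    }

    // Returns the implementor assigned to the class, or the default if none
    // was assigned. Throws std::out_of_range if no default has been set.
    const DataRepresentationImplementor& implementorFor(const std::string& className) const {
        auto it = assignment_.find(className);
        const std::string& name = (it != assignment_.end()) ? it->second : defaultName_;
        return *implementors_.at(name);
    }

private:
    void requireRegistered(const std::string& name) const {
        if (implementors_.count(name) == 0) {
            throw std::invalid_argument("unregistered implementor: " + name);
        }
    }

    std::map<std::string, std::unique_ptr<DataRepresentationImplementor>> implementors_;
    std::map<std::string, std::string> assignment_;
    std::string defaultName_;
};

// Example setup: two implementors in one compilation, selected per class.
ImplementorRegistry makeExampleRegistry() {
    ImplementorRegistry registry;
    registry.registerImplementor("compact", std::unique_ptr<DataRepresentationImplementor>(new CompactModel()));
    registry.registerImplementor("portable", std::unique_ptr<DataRepresentationImplementor>(new PortableModel()));
    registry.setDefault("portable");
    registry.assignClass("Matrix", "compact");
    return registry;
}
```

In this sketch, a class declaration named Matrix is handled by one implementor while every other class falls back to the default, so that a single compilation uses more than one data representation.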
These and other aspects, features and advantages of the present invention will be described and become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.