This invention relates generally to intermediate languages, and more particularly to indefinite-size variables within such intermediate languages.
Intermediate language-type models for programming languages have become increasingly popular. In an intermediate language model, source code is generally compiled into an intermediate language that is desirably substantially platform-independent. When the code is to be run on a particular platform, an execution engine on that platform interprets or compiles the intermediate language into native code understandable by the platform. One example of a system that uses an intermediate language is the Java virtual machine.
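To make the model concrete, the following minimal Python sketch uses a hypothetical stack-based intermediate language. The opcodes `push`, `add`, and `mul` and the `execute` function are invented for illustration and are not drawn from the Java virtual machine or any other real system. The program is held as data that does not depend on any processor; the toy execution engine here interprets it directly rather than compiling it to native code.

```python
# A hypothetical, stack-based intermediate language: a program is a list of
# (opcode, operand) pairs that does not depend on any particular processor.
PROGRAM = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]


def execute(program):
    """Toy execution engine: interprets the intermediate language directly."""
    stack = []
    for opcode, operand in program:
        if opcode == "push":
            stack.append(operand)
        elif opcode == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


print(execute(PROGRAM))  # 20, i.e. (2 + 3) * 4
```

The same `PROGRAM` list could be shipped unchanged to any platform that provides an engine for it; only the engine is platform-specific.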
An advantage of intermediate-language code is thus its platform portability. Once source code has been compiled into intermediate-language code, the latter can desirably be distributed to different platforms, such as those with different underlying processors (for example, x86-type, PowerPC, and DEC Alpha processors), where it will be properly compiled or interpreted to native code and executed. In other words, the source code of a program desirably does not need to be recompiled for every type of platform on which the program is intended to be run.
However, current intermediate-language models do not provide for optimal scalability among different platforms. Scalability in this context generally refers to the ability of a program to run well on both low-end and high-end platforms. A low-end platform may have a 32-bit architecture, which means that the processor can process data up to 32 bits in width (that is, a “word” of data 32 bits long) at one time. Conversely, a high-end platform may have a 64-bit architecture, which means that the processor can process data up to 64 bits in width at one time.
For example, within the prior art, a source code may be compiled into an intermediate-language code that automatically specifies 32-bit variables, such as variables having an unsigned (viz., pointer) data type. For a low-end platform having a 32-bit architecture, this is not an issue, since the full (32-bit) processing capability of the architecture is utilized. However, for high-end platforms having a 64-bit architecture, specification of only 32-bit variables means that the full (64-bit) processing capability of the architecture is underutilized.
The full capability of a 64-bit architecture may thus not be usable by an intermediate-language code in which the size of variables (32-bit, 64-bit, etc.) is automatically fixed when compiling source code into the intermediate-language code, or when the source code itself is written. To use the full capability, the source code may either have to be recompiled into intermediate-language code, or, worse, the source code may have to be rewritten to specify 64-bit variables instead of 32-bit variables. However, specifying 64-bit variables a priori, regardless of the desired target platform, is also not a workable solution: such resulting intermediate-language code would not run on 32-bit platforms, for example, since they are not able to handle 64-bit variables.
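The two failure modes described above can be sketched in a few lines of Python. This is a simplified model, not drawn from any particular intermediate language: variables are treated as plain unsigned integers of a given width, and, following the text, a platform is assumed able to handle only variables no wider than its word. The helper names `fits`, `store_unsigned`, and `supported` are hypothetical.

```python
def fits(value: int, bits: int) -> bool:
    """Return True if an unsigned value can be held in a variable `bits` wide."""
    return 0 <= value < (1 << bits)


def store_unsigned(value: int, bits: int) -> int:
    """Store `value` in a fixed-width unsigned variable; high-order bits are lost."""
    return value & ((1 << bits) - 1)


def supported(variable_bits: int, word_bits: int) -> bool:
    """Simplifying assumption from the text: no variables wider than the word."""
    return variable_bits <= word_bits


# An address just past the 4 GiB boundary, reachable only on a 64-bit platform.
high_address = (1 << 32) + 0x10

# Variable fixed at 32 bits: the address cannot be represented and is truncated.
print(fits(high_address, 32))                 # False
print(hex(store_unsigned(high_address, 32)))  # 0x10

# Variable fixed at 64 bits: accepted on a 64-bit platform, rejected on a 32-bit one.
print(supported(64, 64), supported(64, 32))   # True False
```

Neither fixed choice serves both platforms, which is the gap a size-indefinite variable is meant to close.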
It is noted that this is just one example of data types and data type size. Other data types include integer, floating point (i.e., real number), etc. Data types may also be of other sizes besides 32-bit and 64-bit, including 8-bit and 16-bit, especially when dealing with older processors, as well as 128-bit, 256-bit, etc., when dealing with more state-of-the-art processors. In general, the term data type as used herein refers to any data type, and data type size refers to any size of n bits, where n is not limited.
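Because a data type size may be any n bits, the value range of an integer variable follows directly from n. The minimal Python sketch below computes those ranges for several sizes. It assumes two's-complement representation for signed integers, which is common but is not required by the text, and the function names are illustrative only.

```python
def unsigned_range(n: int) -> tuple[int, int]:
    """Smallest and largest values of an n-bit unsigned integer."""
    return 0, (1 << n) - 1


def signed_range(n: int) -> tuple[int, int]:
    """Smallest and largest values of an n-bit signed integer (two's complement)."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


for n in (8, 16, 32, 64, 128):
    print(f"{n:>3}-bit  unsigned {unsigned_range(n)}  signed {signed_range(n)}")
```

For example, an 8-bit variable spans 0 to 255 unsigned, or -128 to 127 signed.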
The issue of data types and data type sizes is implicitly handled in a disadvantageous manner by various intermediate languages. Two other types of intermediate language include o-code, generated by compilers from source code written in BCPL (the “Basic Combined Programming Language,” a predecessor to the commonly used C programming language), and p-code, generated by compilers from source code written in Pascal. O-code has no support for data types. That is, o-code does not distinguish between integers, floating points, unsigned integers (i.e., pointers), etc. Furthermore, all variables, regardless of their data type, are of the same size. P-code allows different data types, but does not provide for differently sized data types.
The inability to size the data types of variables according to the platform on which intermediate-language code is to be run thus significantly impairs the scalability of such code. This in turn reduces the usefulness of intermediate-language code, since the main reason for using it in the first place is its greater portability compared with native code compiled directly from source code. For these and other reasons, there is a need for the present invention.
The invention relates to indefinite-size variables within an intermediate language. In one embodiment, a computer-implemented method first inputs intermediate language code having a size-indefinite variable. The method generates native code based on the intermediate-language code, including generating a size-definite variable corresponding to the size-indefinite variable according to machine-specific criteria. The method then outputs the native code; for example, in one embodiment, the method executes the native code.
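The method of this embodiment can be outlined as a short pipeline. In this minimal Python sketch, the intermediate-language type names (`uint8` through `uint64`, and `native_uint` for a size-indefinite variable), the `Variable` and `MachineCriteria` classes, and the use of the processor's word width as the machine-specific criteria are all hypothetical choices made for illustration. Native code is represented simply as C-like declaration strings, and outputting it means printing it rather than executing it.

```python
from dataclasses import dataclass

# Hypothetical IL type names. "uintN" types are size-definite; "native_uint"
# stands in for a size-indefinite unsigned variable such as a pointer.
SIZE_DEFINITE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}
SIZE_INDEFINITE = "native_uint"


@dataclass(frozen=True)
class Variable:
    name: str
    il_type: str


@dataclass(frozen=True)
class MachineCriteria:
    word_bits: int  # assumed criterion: natural word width of the target processor


def generate_native(il_variables, machine):
    """Generate native declarations, making every variable size-definite."""
    native = []
    for var in il_variables:
        if var.il_type == SIZE_INDEFINITE:
            bits = machine.word_bits  # size chosen by the machine-specific criteria
        else:
            bits = SIZE_DEFINITE_BITS[var.il_type]  # size already fixed in the IL
        native.append(f"uint{bits} {var.name};")
    return native


def run(il_variables, machine):
    """Input IL, generate native code, then output it (here, by printing)."""
    for line in generate_native(il_variables, machine):
        print(line)


il_code = [Variable("count", "uint16"), Variable("p", "native_uint")]
run(il_code, MachineCriteria(word_bits=32))  # prints "uint16 count;" then "uint32 p;"
run(il_code, MachineCriteria(word_bits=64))  # prints "uint16 count;" then "uint64 p;"
```

Only the size-indefinite variable `p` changes between targets; the size-definite `count` keeps the width the intermediate language gave it.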
As an example, in one embodiment, a program already in intermediate language code may have a variable that is a pointer (viz., having an unsigned data type), and that is size-indefinite. A pointer is a type of variable that references a memory cell within a range of memory cells. The range of memory cells that can be referenced by the pointer is limited by the size of the pointer. A 32-bit pointer can reference 2^32 cells, while a 64-bit pointer can reference 2^64 cells. 32-bit, 64-bit, etc., pointers are specifically referred to as size-definite pointers, in that they have a definite size. Thus, a size-indefinite pointer is a pointer that does not have a definite size. That is, the variable is not specified in the intermediate language code itself as being 32 bits in size, 64 bits in size, etc.
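The relationship between pointer size and addressable range is simply 2^n cells for an n-bit pointer. The short Python sketch below evaluates it for 32-bit and 64-bit pointers; the byte-unit comments assume each memory cell is one byte, which the text does not specify, and the function name is illustrative.

```python
def addressable_cells(pointer_bits: int) -> int:
    """Number of distinct memory cells an n-bit pointer can reference (2**n)."""
    return 1 << pointer_bits


print(addressable_cells(32))  # 4294967296 (4 GiB if each cell is one byte)
print(addressable_cells(64))  # 18446744073709551616 (16 EiB if each cell is one byte)
```

Widening the pointer from 32 to 64 bits multiplies the reachable range by a further 2^32.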
When the intermediate language code is transformed to native code, the underlying platform on which the native code is to be run can, for example, dictate the size of this pointer. In the case of a 32-bit architecture, the size-indefinite pointer can be transformed to a 32-bit pointer, while in the case of a 64-bit architecture, the size-indefinite pointer can be transformed to a 64-bit pointer. This means that the same intermediate language code is able to run well on both 32-bit and 64-bit architectures.
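On an actual host, the pointer width that would serve as the machine-specific criterion can be queried. The following Python sketch uses the standard-library call `struct.calcsize("P")`, which reports the size in bytes of a native `void *`, and maps a size-indefinite pointer to a size-definite type name. The `resolve_pointer` helper and the `uint32`/`uint64` names are hypothetical, and a real execution engine would more likely know its target width from its own build configuration than by probing a Python interpreter.

```python
import struct


def host_pointer_bits() -> int:
    """Pointer width of the machine running this script ("P" is void * in native mode)."""
    return struct.calcsize("P") * 8


def resolve_pointer(word_bits: int) -> str:
    """Map a size-indefinite pointer to a size-definite type for the target width."""
    if word_bits not in (32, 64):
        raise ValueError(f"unsupported architecture width: {word_bits}")
    return f"uint{word_bits}"


print(resolve_pointer(32))                   # uint32
print(resolve_pointer(64))                   # uint64
print(resolve_pointer(host_pointer_bits()))  # depends on the host running the sketch
```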
Embodiments of the invention therefore provide advantages over the prior art. Intermediate language code of a program according to an embodiment of the invention can be scaled to both low-end and high-end platforms without the source code of the program having to be recompiled into intermediate language code or rewritten. That is, depending on the machine-specific criteria of the platform on which the program is to be run, size-indefinite variables are transformed into size-definite variables appropriate to that platform.
The invention includes computer-implemented methods, machine-readable media, computerized systems, devices and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.