1. Field of the Invention
The present invention relates to a portable virtual machine architecture, and more particularly to object code structures for dynamic translation of an architecture-independent program implementation.
2. Description of the Relevant Art
There have been a number of efforts to design systems which are portable across machine architectures, i.e., systems which can operate on a variety of different hardware platforms. However, such efforts have been hampered by incompatibilities in operating system interfaces and fundamental hardware capabilities. Approaches to the portability problem have included interpreters, translators, compilers with a common intermediate format across platforms, and virtual machine architectures. Although the various approaches tend to blur together, some distinctions can be made. Interpreters are programs which accept programs written in source code and which perform the sequence of computations, i.e., of machine level instructions, implied by the source code. The UCSD Pascal system, which was the primary implementation of Pascal for the Apple II, was one of the most successful early attempts at a portable system. The system interpreted byte code, which made it architecture-independent (see Apple Computer, Inc., Apple Pascal Operating System Reference Manual, 1980, pp. 229-245). Smalltalk-80 also defined a byte-code interpreter (see Adele Goldberg and David Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983). One of Smalltalk-80's main advances was the clean integration of the interpreter into the system: the system provides a flexible interface for examining and controlling interpretation, which permits one to write portable programming tools. Some variants of the Smalltalk-80 system dynamically compiled the byte code, with some sacrifice in the above functionality.
A translator, as contrasted with an interpreter, accepts as an input a program written in a source format and produces as output an object code representation of the program. Usually, the object code is machine language for a particular processor architecture. Translators can be divided into assembler translators, which translate low level languages such as assembly language, and compilers, which translate high-level languages such as C, C++, Pascal, Ada, etc.
Compiler writers have been striving to achieve a common intermediate format for some time. A common intermediate format would enable them to produce n front ends and m back ends (n+m total components) rather than n*m compilers to compile n languages for m target machine architectures. The GNU C compiler went quite far in this direction, but the intermediate format is ad hoc and is riddled with special cases for architectural features. The Marion system (see David G. Bradlee, Robert R. Henry, and Susan J. Eggers, The Marion System for Retargetable Instruction Scheduling, Proceedings of the 1991 ACM SIGPLAN Conference on Programming Language Design and Implementation, 1991, pp. 229-240) was another attempt, but it too had to be modified for each new target architecture.
Hardware designers have attempted to institute a virtual architecture as the assembly language for their systems. The Transputer, which is a modular, scalable multiprocessor architecture, is an example of such a system. The virtual architecture allows changes in the underlying physical structure of the machine architecture without program recompilation. Limited virtual architecture mechanisms can be seen in processors such as the Motorola 68040, where some instructions are emulated by traps. A related software approach is to define an intermediate form from which programs are translated into the machine language of the target processor. Mahler was an attempt at this, and its authors claim encouraging results (see David W. Wall and Michael L. Powell, The Mahler Experience: Using an Intermediate Language as the Machine Description, Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems, 1987, pp. 100-104).
The Open Software Foundation's (OSF's) ANDF (the Architecture-Neutral Distribution Format) and its precursor TDF are other steps in this direction. See Stavros Macrakis, The Structure of ANDF: Principles and Examples, Open Software Foundation, 1993 and United Kingdom Defense Research Agency, TDF Specification, Issue 2.1, June 1993. ANDF defines the form of data passed from an ANDF producer (which is language-dependent and machine-independent) to an ANDF installer (which is language-independent and machine-dependent). An ANDF producer is like a compiler front-end (syntax and semantics analyzer), and an ANDF installer is like a compiler back-end (code generator and optimizer). ANDF itself is thus a form of compiler intermediate language. Unfortunately, since ANDF (and TDF) leaves most of the compiling work to the installer, it is unsuitable for dynamic translation.
Individual machine architectures often represent data according to differing sets of representation conventions. Representations which are in accordance with the conventions of a particular machine architecture are said to be native to that machine architecture. Two common sources of variation in native representations are alignment and byte-ordering conventions.
Certain processor architectures require that data be aligned in accordance with a set of machine-specific alignment rules. For example, most RISC architectures require that data be aligned on a natural boundary in physical memory (i.e., at an address that is a multiple of the size of the data type). For a two-byte quantity (e.g., a 16-bit, or short, integer), natural boundary alignment requires that the first byte of the two-byte quantity appear at an even byte address. Similarly, the first byte of a four-byte quantity (e.g., a 32-bit, or long, integer) and of an eight-byte quantity (e.g., a double precision floating point number) must appear at an address which is a multiple of 4 and 8, respectively. Other architectures, notably the 80x86 architecture, have no such alignment restrictions.
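The natural boundary rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of any system discussed here; the function names is_naturally_aligned and align_up are hypothetical and chosen for this example.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """Return True if `address` is a multiple of `size` (natural alignment)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


def align_up(address: int, size: int) -> int:
    """Round `address` up to the next multiple of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (address + size - 1) // size * size


if __name__ == "__main__":
    # Address 6 is acceptable for a 2-byte quantity but not for a 4-byte one.
    print(is_naturally_aligned(6, 2))  # True
    print(is_naturally_aligned(6, 4))  # False
    # The next naturally aligned address for a 4-byte quantity at or after 6.
    print(align_up(6, 4))  # 8
```

On an architecture with no alignment restrictions, such as the 80x86, a check of this kind would simply be unnecessary.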
Another source of architecture-specific variation is byte-ordering. In some processor architectures, bytes are ordered according to a scheme where the least significant byte is stored in the lowest byte address. This scheme is known as little-endian byte ordering. In many other architectures, bytes are ordered according to a big-endian scheme where the least significant byte is stored in the highest byte address. FIG. 1A illustrates the representation of the number 1,000,000 (i.e., 0F4240 in hexadecimal) as a 32-bit integer in accordance with the big-endian scheme. FIG. 1B illustrates the corresponding little-endian representation. Most microprocessor architectures, including the Motorola 680x0 and 88x00 series, the PowerPC, and the MIPS Rx000 series microprocessors, adhere to the big-endian scheme. However, several architectures, notably the Intel 80x86 series and the DEC VAX architectures, are little-endian.
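As a minimal sketch of the two byte orderings, the following Python fragment packs the same 32-bit value, 1,000,000, in each scheme using the standard struct module. The helper name hex_bytes is hypothetical and exists only for this illustration; bytes are listed from the lowest address to the highest.

```python
import struct

VALUE = 1_000_000  # 0x000F4240 as a 32-bit integer


def hex_bytes(data: bytes) -> str:
    """Format bytes as hex pairs, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)


# ">I" packs an unsigned 32-bit integer big-endian, "<I" little-endian.
big = struct.pack(">I", VALUE)
little = struct.pack("<I", VALUE)

if __name__ == "__main__":
    print("big-endian:   ", hex_bytes(big))     # 00 0F 42 40
    print("little-endian:", hex_bytes(little))  # 40 42 0F 00
```

Reading either byte sequence back with the wrong ordering yields a different number, which is why a portable format has to fix or record the ordering it uses.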