1. Field of the Invention
The present invention relates generally to computer instruction emulation, and more particularly to an apparatus and method for accelerating instruction emulation. Still more particularly, the present invention is an apparatus and method for transfer of control from a currently executing emulation routine to a next emulation routine.
2. Description of the Background Art
Microprocessors execute machine instructions that result in specific changes of state within the microprocessor hardware. A collection of such instructions, when properly arranged, is known as a program. Execution of a program's instructions in sequence performs a series of state changes that results in useful work, such as adding a column of figures.
Many computer architectures exist, each of which understands a specific and typically unique set of machine instructions or "language." Therefore, a program written for one architecture is useless and incomprehensible to another architecture. Since programs can require a tremendous investment of time and resources to write (hundreds of man years of skilled labor in some cases), and are limited to a single architecture, it is desirable to have a means to translate the program from one language to another. Such a translator would allow the investment made in writing programs for one architecture to be retained for writing the same program on other architectures.
Three broad solutions to the problem of architecture-specific program execution exist. These are static recompilation, dynamic recompilation, and interpretive emulation. In each case, an emulation program is written, usually on the alternative or host architecture, that translates a sequence of source program instructions intended for the source or emulated architecture into one or more instructions in the host's instruction language that perform the same function. The emulation program can be written to simulate the actions of each source program instruction individually, or to simulate the actions of several source program instructions in a single step. In general, simulation of each source program instruction individually provides greater emulation accuracy at the expense of execution speed.
In static recompilation, the emulated program is swept through in its entirety prior to execution and translated to a host program. The host program is then executed. This is rarely a complete solution since most programs exhibit dynamic behavior that cannot be predicted statically. For example, a branch instruction may depend upon a result computed by previous instructions that cannot be predicted prior to running the program. Therefore, the branch instruction cannot be translated to its meaningful counterparts in the host's language. Static recompilation also suffers from the shortcoming of requiring significant amounts of memory to store the translated copy of the program. In addition, static recompilation requires a complete understanding of the behavior of all possible programs. Thus, static recompilation is not a complete solution to effectively translating computer programs for emulation.
Dynamic recompilation allows emulation of programs that exhibit dynamic behavior such as branch instructions. In dynamic recompilation, programs are statically translated until a problem instruction (usually a branch) that cannot be accurately translated is reached. The translated program is then executed up to this point such that the architectural state of the emulated machine is updated. The problem instruction can then be emulated by the execution of an emulation routine corresponding to the problem instruction, after which static translation can begin again. This method can successfully emulate any program and is efficient if large sections of source instructions can be statically translated. However, the translator must run concurrently with the emulated program, and adds significant overhead to the emulation process. The speed and memory requirements are difficult to predict, and will vary greatly depending upon the character of the emulated program.
Interpretive emulation emulates each source instruction as a separate entity. Interpretive emulation provides an architecturally distinct state at each emulated source instruction boundary, and has the potential of being the most accurate and interactive of the three emulation techniques. Interpretive emulation typically has a predictable and potentially small memory requirement, since no translated copy of the program need be stored. However, interpretive emulation can be the slowest method, requiring many more host instructions to emulate a given source instruction as compared to either static or dynamic recompilation.
Interpretive emulation is the most desirable emulation technique in terms of emulation accuracy and robust performance; unfortunately, it is typically the slowest emulation technique. The most straightforward method of implementing an interpretive emulator is to employ a dispatch loop within the emulator to fetch a source instruction from the source program stream, and to use the binary value of the operation code within the source instruction to index a table in memory. The value of the table entry, referred to as a "pointer," is the address of an emulation routine consisting of host instructions that implement the architectural changes of state required to emulate the original source instruction. The dispatch loop issues a jump to the address indicated by the pointer, after which the emulation routine is executed. The final host instruction within the emulation routine returns control to the dispatch loop, which fetches the next source instruction from the source program.
The prior art interpretive emulator suffers from a first major performance problem in that no emulation actually occurs during the set of operations performed within the dispatch loop. The overall emulation of any given source instruction can be partitioned into two time intervals. The first time interval is that required to complete the operations performed within the dispatch loop, and the second time interval is that required to complete the host instructions comprising the emulation routine. Each operation performed within the dispatch loop increases the overall time required to emulate any given source instruction. The execution of two operations within the dispatch loop that rely upon results being returned from memory particularly increases the overall emulation time. If the data targeted by either of these operations does not reside within a cache, either of these operations can take longer to execute than an entire emulation routine.
The second major performance problem occurs as a result of the two branch instructions required for each source instruction's emulation. That is, one branch instruction is executed in the jump to the emulation routine, and another branch instruction is executed when the emulation routine returns to the dispatch loop. While branches are conceptually simple, they are difficult to efficiently implement on most microprocessors, particularly those having reduced instruction set computing (RISC) architectures. Each branch instruction significantly increases the overall time required for the emulation of any given source instruction.
What is needed is an apparatus and method for transferring control from a current emulation routine to a next emulation routine in which control is transferred from a current to a next emulation routine without the problems associated with the prior art.