1. Field of the Invention
The present invention relates generally to computer instruction emulation, and more particularly to an apparatus and method for accelerating instruction emulation. Still more particularly, the present invention is an apparatus and method for prefetching a pointer to a next emulation routine during execution of a current emulation routine.
2. Description of Related Art
Microprocessors execute machine instructions that result in specific changes of state within the microprocessor hardware. A collection of such instructions, when properly arranged, is known as a program. Execution of a program's instructions in sequence performs a series of state changes that results in useful work, such as adding a column of figures.
Many computer architectures exist, each of which understands a specific and typically unique set of machine instructions or "language." Therefore, a program written for one architecture is useless and incomprehensible to another architecture. Since programs can require a tremendous investment of time and resources to write (hundreds of man years of skilled labor in some cases), and are limited to a single architecture, it is desirable to have a means to translate the program from one language to another. Such a translator would allow the investment made in writing programs for one architecture to be retained for writing the same program on other architectures.
Three broad solutions to the problem of architecture-specific program execution exist. These solutions are static recompilation, dynamic recompilation, and interpretive emulation. In each case, an emulation program is written, usually on the alternative or host architecture, that translates a sequence of source program instructions intended for the source or emulated architecture into one or more instructions in the host's instruction language that perform the same function. The emulation program can be written to simulate the actions of each source program instruction individually, or to simulate the actions of several source program instructions in a single step. In general, simulation of each source program instruction individually provides greater emulation accuracy at the expense of execution speed.
In static recompilation, the emulated program is swept through in its entirety prior to execution and translated into a host program. The host program is then executed. This is rarely a complete solution since most programs exhibit dynamic behavior that cannot be predicted statically. For example, a branch instruction may depend upon a result computed by previous instructions that cannot be predicted prior to running the program. Therefore, the branch instruction cannot be translated to its meaningful counterparts in the host's language. Static recompilation also suffers from the shortcoming of requiring significant amounts of memory to store the translated copy of the program. In addition, static recompilation requires a complete understanding of the behavior of all possible programs. Thus, static recompilation is not a complete solution to effectively translating computer programs for emulation.
Dynamic recompilation allows emulation of programs that exhibit dynamic behavior such as branch instructions. In dynamic recompilation, programs are statically translated until a problem instruction (usually a branch) that cannot be accurately translated is reached. The translated program is then executed up to this point such that the architectural state of the emulated machine is updated. The problem instruction can then be emulated by the execution of an emulation routine corresponding to the problem instruction, after which static translation can begin again. This method can successfully emulate any program and is efficient if large sections of source instructions can be statically translated. However, the translator must run concurrently with the emulated program, and adds significant overhead to the emulation process. The speed and memory requirements are also difficult to predict, and will vary greatly depending upon the character of the emulated program.
Interpretive emulation emulates each source instruction as a separate entity. Interpretive emulation provides an architecturally distinct state at each emulated source instruction boundary, and is the most accurate and interactive of the three emulation techniques. Interpretive emulation typically has a predictable and potentially small memory requirement, since no translated copy of the program need be stored. However, interpretive emulation can be the slowest method, requiring many more host instructions to emulate a given source instruction as compared to either static or dynamic recompilation.
Interpretive emulation is the most desirable emulation technique in terms of emulation accuracy and robust performance; unfortunately, it is typically the slowest emulation technique. The most straightforward method of implementing an interpretive emulator is to employ a dispatch loop within the emulator to fetch a source instruction from the source program stream, and to use the binary value of the operation code (opcode) within the source instruction to index a table in memory. The value of the table entry, referred to here as a "pointer," is the address of an emulation routine consisting of host instructions that implement the architectural changes of state required to emulate the original source instruction. The dispatch loop issues a jump to the address indicated by the pointer, after which the emulation routine is executed. The final host instruction within the emulation routine returns control to the dispatch loop, which fetches the next source instruction from the source program.
The prior art emulation systems and methods suffer from a major performance problem in that a given set of memory-based operations is identically performed for each source instruction that is emulated. For example, for every emulated source instruction, the source instruction and a pointer to an appropriate emulation routine must be retrieved from memory. That is, the source instruction fetch and pointer fetch operations rely upon values being returned from memory, and each of these operations is required prior to the emulation of a given source instruction. Operations that require memory interaction typically require much more time to execute than other operations. If the targeted data does not reside within a cache, the fetch operations indicated above can take longer to execute than an entire emulation routine. This greatly increases the time required to emulate the source instruction. Reducing the time required to obtain a result from even one of these operations can significantly reduce the time required to emulate an entire program of source instructions. What is needed is an apparatus and method for accelerating interpretive emulation by minimizing the delay between the completion of a current emulation routine and the availability of the pointer to the next emulation routine.