1. Field of the Invention
The present invention relates generally to computer instruction emulation, and more particularly, to a system for performing an instruction mapping. Still more particularly, the present invention is a system for mapping a plurality of bits within a source instruction to a cache address corresponding to an emulation routine address, and a method for manufacturing the system.
2. Description of Related Art
Microprocessors execute machine instructions that result in specific changes of state within the microprocessor hardware. A collection of such instructions, when properly arranged, is known as a program. Execution of a program's instructions in sequence performs a series of state changes that results in useful work, such as adding a column of figures.
Many computer architectures exist, each of which understands a specific and typically unique set of machine instructions or "language." Therefore, a program written for one architecture is useless and incomprehensible to another architecture. Since programs can require a tremendous investment of time and resources to write (hundreds of man-years of skilled labor in some cases), and are limited to a single architecture, it is desirable to have a means to translate a program from one language to another. Such a translator would allow the investment made in writing a program for one architecture to be retained when the same program is run on other architectures.
Three broad solutions to the problem of architecture-specific program execution exist. These are static recompilation, dynamic recompilation, and interpretive emulation. In each case, an emulation program is written, usually on the alternative or host architecture, that translates a sequence of source program instructions intended for the source or emulated architecture into one or more instructions in the host's instruction language that perform the same function. The emulation program can be written to simulate the actions of each source program instruction individually, or to simulate the actions of several source program instructions in a single step. In general, simulation of each source program instruction individually provides greater emulation accuracy at the expense of execution speed.
In static recompilation, the emulated program is swept through in its entirety prior to execution and translated to a host program. The host program is then executed. This is rarely a complete solution since most programs exhibit dynamic behavior that cannot be predicted statically. For example, a branch instruction may depend upon a result computed by previous instructions that cannot be predicted prior to running the program. Therefore, the branch instruction cannot be translated into a meaningful counterpart in the host's language. Static recompilation also suffers from the shortcoming of requiring significant amounts of memory to store the translated copy of the program. In addition, static recompilation requires a complete understanding of the behavior of all possible programs. Thus, static recompilation is not a complete solution to effectively translating computer programs for emulation.
Dynamic recompilation allows emulation of programs that exhibit dynamic behavior such as branch instructions. In dynamic recompilation, programs are statically translated until a problem instruction (usually a branch) that cannot be accurately translated is reached. The translated program is then executed up to this point such that the architectural state of the emulated machine is updated. The problem instruction can then be emulated by the execution of an emulation routine corresponding to the problem instruction, after which static translation can begin again. This method can successfully emulate any program and is efficient if large sections of source instructions can be statically translated. However, the translator must run concurrently with the emulated program, and adds significant overhead to the emulation process. The speed and memory requirements are difficult to predict, and will vary greatly depending upon the character of the emulated program.
Interpretive emulation emulates each source instruction as a separate entity. Interpretive emulation provides an architecturally distinct state at each emulated source instruction boundary, and has the potential of being the most accurate and interactive of the three emulation techniques. Interpretive emulation typically has a predictable and potentially small memory requirement, since no translated copy of the program need be stored. However, interpretive emulation can be the slowest method, requiring many more host instructions to emulate a given source instruction as compared to either static or dynamic recompilation.
Interpretive emulation is the most desirable emulation technique in terms of emulation accuracy and robust performance; unfortunately, it is typically the slowest emulation technique. The most straightforward method of implementing an interpretive emulator is to employ a dispatch loop within the emulator to fetch a source instruction from the source program stream, and to use the binary value of the operation code within the source instruction to index a table in memory. The value of the table entry, referred to here as a "pointer," is the address of an emulation routine consisting of host instructions that implement the architectural changes of state required to emulate the original source instruction. The dispatch loop issues a jump to the address indicated by the pointer, after which the emulation routine is executed. The final host instruction within the emulation routine returns control to the dispatch loop, which fetches the next source instruction from the source program.
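The dispatch loop described above can be illustrated with a minimal sketch. The three-instruction source architecture, the opcode values, and the names `op_load`, `op_add`, `op_halt`, `DISPATCH_TABLE`, and `emulate` are assumptions made purely for illustration and do not correspond to any particular emulator.

```python
# Hypothetical source architecture: each source instruction is an
# (opcode, operand) pair. Opcodes 0, 1, 2 are assumed for illustration.

def op_load(state, operand):
    """Emulation routine: load the operand into the accumulator."""
    state["acc"] = operand

def op_add(state, operand):
    """Emulation routine: add the operand to the accumulator."""
    state["acc"] += operand

def op_halt(state, operand):
    """Emulation routine: stop the emulated machine."""
    state["halted"] = True

# The table is indexed by the binary value of the operation code; each
# entry plays the role of the "pointer" to an emulation routine.
DISPATCH_TABLE = [op_load, op_add, op_halt]

def emulate(program):
    """Run a source program through the dispatch loop; return the accumulator."""
    state = {"acc": 0, "pc": 0, "halted": False}
    while not state["halted"]:
        opcode, operand = program[state["pc"]]  # fetch the source instruction
        state["pc"] += 1
        routine = DISPATCH_TABLE[opcode]        # index the table by opcode
        routine(state, operand)                 # jump to the emulation routine
        # Returning from the routine hands control back to the dispatch loop.
    return state["acc"]
```

For example, `emulate([(0, 5), (1, 7), (1, 30), (2, 0)])` loads 5, adds 7 and 30, and halts. Each pass through the loop jumps to a routine chosen by the source opcode rather than by host address order.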
The prior art implementation described above suffers from a major performance problem. The performance problem arises from the fact that the host instruction references are in a pattern that is very different from that found in normal, non-emulated programs, and for which most microprocessors are poorly suited. Careful study of typical program behavior has shown that most programs exhibit great address coherence. In particular, the instruction most likely to execute next is the next instruction in sequence. Loops also execute frequently, so an instruction sequence that has just executed is likely to execute again. As a result of such study, modern microprocessors are designed to utilize cache memories. Caches are designed to exploit address coherence behavior by fetching instruction sequences prior to their use (e.g., burst operations) and by providing efficient access to instructions within close address proximity. If an instruction is referenced that is not in the cache, particularly in reduced instruction set computing (RISC) architectures, the time required to fetch and execute the instruction can increase by 20 times or more. Moreover, in a pipelined operation, instructions that are outside the cache can cause the pipeline to stall. While the source instructions within the source instruction program are in a pattern corresponding to typical program behavior and are therefore likely to exhibit address coherence, the emulation routines corresponding to the source instructions have no address coherence. This in turn means that the host instruction references from one emulation routine to another have no address coherence. Since the host instruction references in an interpretive emulation environment are not in a pattern resembling typical program behavior, the performance of the prior art interpretive emulator does not benefit from the use of a cache.
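The effect of address coherence on a cache can be illustrated with a small simulation. The following is a sketch only: it assumes a direct-mapped instruction cache of 64 lines of 32 bytes, 4-byte host instructions, and a deliberately unfavorable layout in which four hypothetical emulation routines are spaced exactly one cache size apart so that they all map to the same cache line. None of these parameters comes from the text above.

```python
LINE_BYTES = 32   # assumed cache line size
NUM_LINES = 64    # assumed number of lines (2048-byte cache)
INSTR_BYTES = 4   # assumed host instruction size

def count_misses(addresses, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
    """Count misses for an instruction address stream in a direct-mapped cache."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes   # which memory block holds this address
        index = block % num_lines    # the single line that block may occupy
        if tags[index] != block:     # miss: fetch the block into that line
            tags[index] = block
            misses += 1
    return misses

def sequential_loop_stream():
    """A 256-instruction loop executed 10 times (coherent references)."""
    one_pass = [INSTR_BYTES * i for i in range(256)]
    return one_pass * 10

def scattered_routine_stream():
    """320 executions of four 8-instruction routines that collide in the cache."""
    cache_bytes = NUM_LINES * LINE_BYTES
    stream = []
    for opcode in [0, 1, 2, 3] * 80:
        base = opcode * cache_bytes  # assumed routine placement (worst case)
        stream.extend(base + INSTR_BYTES * k for k in range(8))
    return stream
```

Both streams contain 2560 instruction references. Under these assumptions the sequential loop misses only on its first pass (32 misses), whereas the scattered stream misses on every routine entry (320 misses), because each routine evicts the one before it.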
A prior art attempt at improving the performance of an interpretive emulator is found in an emulator produced by Insignia Solutions, Inc. The Insignia emulator maintains statistics indicating the number of times a given loop within a source instruction program has been executed. If the loop has been executed more than a predetermined number of times, the Insignia emulator assumes that the loop will be executed again. In this case, the emulator stops the emulation, and translates the source instructions of the loop into a series of host instructions that can be executed directly, after which emulation continues. If the emulator subsequently detects that the source instruction program is entering the given loop, the series of host instructions emulating the loop will be executed. While this approach toward enhancing the performance of an interpretive emulator functions well for frequently executed loops, the approach does not result in any improvement for loops that are infrequently executed. Moreover, the Insignia emulator does not address performance enhancement for the emulation of individual source instructions, and requires additional overhead for identifying loops and tracking their use.
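The general idea of counting loop executions and translating a loop once a threshold is exceeded can be sketched as follows. This is not the Insignia implementation, whose details are not given here. The class name `LoopProfiler`, the threshold value, and the use of a stored tuple as a stand-in for translated host instructions are all assumptions made for illustration.

```python
HOT_THRESHOLD = 3  # assumed "predetermined number of times"

class LoopProfiler:
    """Tracks how often each loop is entered and translates hot loops."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = {}      # loop entry address -> executions seen so far
        self.translated = {}  # loop entry address -> stand-in translation

    def enter_loop(self, loop_addr, source_instrs):
        """Return how this entry into the loop would be executed."""
        if loop_addr in self.translated:
            # A translation already exists, so it is executed directly.
            return "translated"
        self.counts[loop_addr] = self.counts.get(loop_addr, 0) + 1
        if self.counts[loop_addr] > self.threshold:
            # Placeholder for real translation into host instructions.
            self.translated[loop_addr] = tuple(source_instrs)
        return "interpreted"
```

With the assumed threshold of 3, the first four entries into a loop are interpreted and the fifth uses the translation. A loop entered only once or twice pays the bookkeeping cost without ever benefiting, which is the limitation noted above.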
What is needed is a means for implementing interpretive emulation where a standard cache organization can be used efficiently without the performance problems of the prior art.