1. Field of the Invention
The present invention relates to a computer system and method, and more particularly to a method (and system) for increasing the efficiency and accelerating the performance of emulating the behavior of one computer system on another.
2. Description of the Related Art
A major motivation for emulation is to allow systems written for a particular architecture to execute on another architecture with a minimum loss of performance. Clearly, then, the efficiency of the emulation process and the quality of the resulting “host” code sequence are of paramount importance.
Typically, a computing system includes several portions, including the processors, the memory, and the input/output devices. It is often necessary to emulate the behavior of one computing system on another. One of the principal reasons for emulation is to enable programs written for one system (e.g., the “target computing system”) to perform with the same results on another system (e.g., the “host computing system”).
The need for emulating the behavior of one computer system on another has long been recognized, and several schemes have been proposed for doing so. A summary of these techniques appears in U.S. Pat. No. 6,031,992 to Cmelik et al. U.S. Pat. No. 6,031,992 discloses a combined hardware/software scheme to perform the emulation of the instruction set of one processor on another. This scheme allows the hardware design to incorporate features that facilitate the execution of the target instruction set. For the same reason, however, this scheme cannot emulate all systems equally efficiently.
SimOS and SimICS are examples of systems that can emulate without special hardware features. However, they do not perform as well as the method and structure of U.S. Pat. No. 6,031,992.
In general, these systems employ various levels of translation. For example, Jim Turley, “Alpha Runs x86 Code with FX!32,” Microprocessor Report, Mar. 5, 1996, describes techniques in which the extent of translation is varied according to the extent of execution of the code.
In conventional emulation methods and techniques, various levels of translation may be employed to enhance the performance of the host instructions produced by the emulator. However, notwithstanding all the current techniques, there remains much room for improvement.
As described above, one method of emulation is disclosed that includes a combination of interpretation and translation. Each target instruction is interpreted, a simple heuristic is employed to record the frequency of execution of instruction groups, and when a threshold condition is satisfied, the group is scheduled for translation by placing it in a translation pool. This technique allows the interpretation process to proceed in parallel with the translation process, and so the translator may deploy fairly aggressive optimization techniques.
This approach amortizes the cost of the optimization, and is effective for long-running, frequently executed instruction sequences.
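The interpret-and-count scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of any cited system: the class name `ProfilingInterpreter`, the toy opcodes `add` and `mul`, and the threshold value are all assumptions made for illustration, and the translator itself (which would drain the pool in parallel) is omitted.

```python
from collections import Counter, deque

HOT_THRESHOLD = 3  # hypothetical value; the source names no specific threshold


class ProfilingInterpreter:
    """Interprets instruction groups and queues frequently executed ones."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.exec_counts = Counter()     # group id -> times interpreted
        self.translation_pool = deque()  # groups awaiting translation
        self.queued = set()              # prevents queueing a group twice

    def interpret_group(self, group_id, instructions, state):
        # Interpret every target instruction in the group, one at a time.
        for op, arg in instructions:
            if op == "add":      # toy opcode, assumed for illustration
                state["acc"] += arg
            elif op == "mul":    # toy opcode, assumed for illustration
                state["acc"] *= arg
            else:
                raise ValueError(f"unknown op: {op}")

        # Simple heuristic: record how often this group has executed.
        self.exec_counts[group_id] += 1

        # Threshold condition: schedule the group for translation once.
        if (self.exec_counts[group_id] >= self.threshold
                and group_id not in self.queued):
            self.queued.add(group_id)
            self.translation_pool.append(group_id)
        return state
```

In this sketch, a group interpreted at least `threshold` times appears exactly once in `translation_pool`, while a group that runs only once is interpreted and never queued. A separate translator thread could consume the pool while interpretation continues.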
However, it may not prove so effective for execution sequences of shorter duration. Experience has shown that the emulation of complete systems involves significant portions of code that execute only once. For these cases, anything other than efficient interpretation is unnecessary, and certainly the overhead of aggressive compilation is unwarranted.
There is, however, a large amount of code in such system emulations that lies somewhere between these two extremes.
Prior to the present invention, no method has specifically addressed such bodies of code. Indeed, there has been no technique for producing high quality translated host instructions with little or no increase in the cost of interpretation.
In sum, to emulate a target instruction, a certain number of instructions must be executed, and typically many of these instructions are highly dependent on previous instructions in the sequence. Even though modern processors employ “instruction level parallelism,” in which multiple independent instructions can be executed in parallel (i.e., at the same time), the potential exists during emulation on most modern architectures for a high degree of processor underutilization. Frequently, in normal applications, there are some number of independent instructions (operations), the execution of which, with the judicious application of scheduling techniques, can be overlapped with other executing instructions. However, emulation by interpretation is an inherently serial technique, because the emulation must be performed for each of the target machine instructions sequentially, and so there is nothing in the emulation which can be overlapped. The host machine on which the emulator is executing is potentially severely underutilized. Prior to the present invention, there has been no method (or structure) for exploiting that underutilization to increase the performance of the emulation process in a unique way.
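The serial dependence described above can be seen in a minimal interpreter loop. The following Python sketch is illustrative only: the opcodes `li` and `add`, the register names, and the `HANDLERS` table are hypothetical and do not come from any cited system. Each step of emulating one target instruction consumes the result of the step before it, which is the kind of dependency chain that leaves little independent work to overlap.

```python
def op_li(regs, dst, src):
    """Load immediate: regs[dst] = src (toy opcode, assumed)."""
    regs[dst] = src


def op_add(regs, dst, src):
    """Register add: regs[dst] += regs[src] (toy opcode, assumed)."""
    regs[dst] += regs[src]


# Hypothetical dispatch table mapping target opcodes to handlers.
HANDLERS = {"li": op_li, "add": op_add}


def interpret(program, regs):
    """Interpret (opcode, dst, src) target instructions one after another."""
    pc = 0
    trace = []
    while pc < len(program):
        instr = program[pc]         # fetch: needs the current pc
        opcode, dst, src = instr    # decode: needs the fetched instruction
        handler = HANDLERS[opcode]  # dispatch: needs the decoded opcode
        handler(regs, dst, src)     # execute: needs the selected handler
        trace.append(opcode)
        pc += 1                     # the next fetch waits on this update
    return trace
```

Calling `interpret` on a short program such as `[("li", "r1", 5), ("li", "r2", 7), ("add", "r1", "r2")]` with an empty register dictionary processes the three target instructions strictly in order, with each fetch, decode, dispatch, and execute step waiting on its predecessor.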