This application relates in general to an emulator which uses dynamic binary translation, and in particular to a scheduling technique which allows an emulator to speculate instructions optimistically to exploit instruction-level parallelism.
Binary translation is a rapidly evolving emulation technology which addresses the object code compatibility problem associated with the introduction of a new instruction set architecture (ISA). Recent implementations of this technology use dynamic translation techniques to convert a program compiled for a legacy ISA to object code for a native ISA at run-time. Thus, a binary translator permits programs or applications that were compiled for a pre-existing architecture to be run on a new architecture without having to recompile those applications. The binary translator translates the compiled application into a binary form that can be used on the new system. Note that this is transparent to the system user.
Binary translation is the process of directly translating object code compiled for one instruction set architecture (the legacy ISA) to object code for another architecture (the native ISA). This allows software to transition between two dissimilar ISAs. For example, programs written for Intel microprocessors may be translated to run on Alpha processors. Moreover, binary translation may be performed at run-time, thus allowing a legacy ISA program to be launched unmodified on the native ISA system. Translation performed in this manner is known as dynamic translation.
However, a problem occurs with dynamic translation when the native machine is statically scheduled, i.e., operations are issued in program order and must be carefully grouped to take advantage of parallelism. In order to exploit the parallel execution resources, modern compilers use heuristics or profile-guided optimizations to expose instruction-level parallelism (ILP) in the program. These compilation techniques are difficult to implement in a dynamic translation system, because they are often time and space intensive. ILP compilation algorithms usually take a lot of time and memory because the compiler needs to analyze the program to detect which instructions can be executed in parallel.
Consequently, the prior art translators do not utilize the ILP of the native system, and merely process the instructions sequentially, i.e., the instructions are kept in the same order as in the original program. For example, assume that in an original program an add instruction is followed by a subtract instruction, which in turn is followed by a load instruction. A prior art translator would translate these instructions into binary instructions for use on the native system, and would order the instructions as add, subtract, and load, even though the load could be speculatively performed prior to the add instruction.
Thus, the prior art surrenders potential performance gains from the parallel nature of the native system in order to maintain a simple translator. By keeping the translator simple, the prior art avoids false traps or exceptions arising in the translated instructions. There are two types of exceptions: true and false. True exceptions are caused by the application, and must be handled in all cases. Not reordering the instruction stream simplifies recovery from true exceptions because the interrupted state is always sequentially updated. False exceptions, on the other hand, can only be the result of speculation. They do not normally occur if the program is not reordered. For example, in the code sequence:
1000: r1=r2+r3
1001: branch to 1003 if r4 is an invalid pointer
1002: r5=load memory from (r4)
1003: r6=r6-1
Without reordering, 1002 would always access a valid pointer and would never trap. However, it is possible for an optimizer to speculate 1002 above 1001, which could sometimes cause a false exception. Moreover, if a true trap does occur and the original instruction sequence has been maintained, the state of the system appears as though all of the operations preceding the trapping instruction in the original program have completed, and all the operations following that instruction have never been executed. Thus, preserving the original instruction order makes recovery from traps or exceptions relatively easy.
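The false exception described above can be illustrated with a minimal Python sketch. It is not taken from any actual translator: the `MEMORY` table, the `Trap` class, and the two `run_*` functions are hypothetical names introduced only to model the four-instruction sequence, and a pointer is treated as valid simply when it is a key of `MEMORY`.

```python
class Trap(Exception):
    """Raised when a load dereferences an invalid pointer (hypothetical model)."""

# Hypothetical memory image: only address 0x100 is mapped.
MEMORY = {0x100: 42}

def load(addr):
    if addr not in MEMORY:
        raise Trap(f"invalid address {addr:#x}")
    return MEMORY[addr]

def run_in_order(regs):
    """Execute 1000..1003 in the original program order."""
    r = dict(regs)
    r["r1"] = r["r2"] + r["r3"]      # 1000
    if r["r4"] in MEMORY:            # 1001: branch around the load if r4 is invalid
        r["r5"] = load(r["r4"])      # 1002: only reached with a valid pointer
    r["r6"] = r["r6"] - 1            # 1003
    return r

def run_speculated(regs):
    """Same sequence, with the load (1002) hoisted above the branch (1001)."""
    r = dict(regs)
    r["r1"] = r["r2"] + r["r3"]      # 1000
    tmp = load(r["r4"])              # 1002, speculated: may trap on an invalid r4
    if r["r4"] in MEMORY:            # 1001
        r["r5"] = tmp                # commit the speculated value
    r["r6"] = r["r6"] - 1            # 1003
    return r

regs = {"r1": 0, "r2": 2, "r3": 3, "r4": 0xBAD, "r5": 0, "r6": 10}
print(run_in_order(regs))            # completes: the branch guards the load
try:
    run_speculated(regs)
except Trap as exc:
    print("false exception:", exc)   # the trap exists only because of reordering
```

With a valid pointer in r4, both functions produce the same register state; the two orderings differ only when r4 is invalid, which is exactly the case in which the original program would never have executed the load.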
Thus, the prior art translators' use of strict sequencing of instructions results in poor performance of translated programs on the native system.
Therefore, there is a need in the art for a translator that allows the translated instructions to be reordered in a different and more efficient way on the target architecture, and thus yields better performance, and yet can manage exceptions or traps.
These and other objects, features and technical advantages are achieved by an emulation system and method which allows translation with optimistically speculated operations. However, if an exception occurs, then an interpreter is invoked to return the system state to a point prior to the trap. The interpreter then executes the program sequentially without reordering.
At run-time, the inventive emulator analyzes branch behavior, selects code paths which are likely to be taken, and selects suitable legacy code for optimization. A translator then dynamically compiles those regions into native code and optimizes them using a fast ILP scheduling algorithm. The inventive emulator allows potentially trapping operations to be speculated above branches while preserving the precise exceptions property of the legacy, or original, ISA. This property requires that if a legacy ISA instruction causes an exception, the interrupted architectural state must appear as though all preceding instructions in the program have completed and all following instructions have never executed. The inventive translator uses a system of checkpoints and the ability to retranslate a block of code in order to recover from exceptions and restart the legacy application in a known state. A block is also retranslated with speculation disabled if it contains a speculative operation which traps too often. The code is scheduled in such a way that if an operation traps, the translator's exception handler can always revert the legacy ISA state back to a safe checkpoint corresponding to a point in the legacy program's execution history prior to the exception. The interpreter is then used to execute instructions between the checkpoint and the trapping operation. The second time the trapping operation raises the same exception, the legacy ISA state appears to have been sequentially updated and can safely be delivered to the application's own handler.
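The checkpoint-and-reinterpret recovery described above might be sketched as follows. This is a toy model under stated assumptions, not the inventive emulator itself: the tuple instruction format, the `execute` and `run_block` functions, and the `schedule` list of block indices are all hypothetical, and the retranslation of blocks that trap too often is not modeled.

```python
import copy

class Trap(Exception):
    """Hypothetical trap carrying the legacy pc of the faulting instruction."""
    def __init__(self, pc):
        super().__init__(f"trap at legacy pc {pc}")
        self.pc = pc

def execute(instr, state):
    """Apply one legacy instruction (pc, op, dst, a, b) to the legacy state."""
    pc, op, dst, a, b = instr
    regs, mem = state["regs"], state["mem"]
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    elif op == "sub":
        regs[dst] = regs[a] - regs[b]
    elif op == "load":
        if regs[a] not in mem:
            raise Trap(pc)
        regs[dst] = mem[regs[a]]
    state["pc"] = pc + 1

def run_block(block, schedule, state):
    """Run a block in the reordered schedule; on a trap, roll back and interpret.

    Returns None if the block completes, or the legacy pc of a true exception.
    """
    checkpoint = copy.deepcopy(state)        # safe point before the block
    try:
        for index in schedule:               # translated code: reordered
            execute(block[index], state)
        state["pc"] = block[-1][0] + 1
        return None
    except Trap:
        state.clear()
        state.update(checkpoint)             # revert to the checkpoint
    try:
        for instr in block:                  # interpreter: original order
            execute(instr, state)
        return None                          # the trap did not recur
    except Trap as trap:
        return trap.pc                       # same trap again: state is precise

block = [
    (1000, "add", "r1", "r2", "r3"),
    (1001, "load", "r5", "r4", None),
    (1002, "sub", "r6", "r6", "r7"),
]
state = {
    "pc": 1000,
    "regs": {"r1": 0, "r2": 2, "r3": 3, "r4": 0xBAD, "r5": 0, "r6": 10, "r7": 1},
    "mem": {0x100: 42},
}
# Hypothetical schedule that runs the sub first, then the load, then the add.
print(run_block(block, [2, 1, 0], state), state["pc"], state["regs"])
```

In this run the reordered code has already updated r6 when the load traps, so the state is not precise. After the rollback and sequential reinterpretation, r1 reflects the completed add, r6 is untouched, and the pc points at the load, which is the state the precise exceptions property requires before the exception is delivered.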
Optimizing or reordering of instructions is very important to system performance. For example, suppose there are two instructions A and B, and B is a load from memory. Load instructions can require many clock cycles to complete, thus it is advantageous to perform the B load instruction prior to the A instruction to hide some, if not all, of the latency of the load instruction. This optimistic or speculative performance of the load instruction, and of other types of instructions, such as floating-point operations, will result in higher performance.
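The latency-hiding argument can be made concrete with a small cycle-count model. The sketch below assumes a simplified in-order machine that issues one instruction per cycle and stalls an instruction until its operands are ready; the latencies and the consumer instruction C (which uses the result of load B) are assumptions added for illustration and do not appear in the example above.

```python
def total_cycles(schedule, latency, deps):
    """Cycle at which the last result is available on a toy in-order machine."""
    ready = {}   # instruction name -> cycle at which its result is available
    cycle = 0    # earliest cycle at which the next instruction may issue
    for name in schedule:
        operands_ready = [ready[d] for d in deps.get(name, ())]
        start = max([cycle] + operands_ready)   # stall until operands arrive
        ready[name] = start + latency[name]
        cycle = start + 1                       # one issue slot per cycle
    return max(ready.values())

# Hypothetical latencies: the load B takes 3 cycles, A and C take 1 cycle each.
latency = {"A": 1, "B": 3, "C": 1}
deps = {"C": ("B",)}   # C consumes the value loaded by B

print(total_cycles(["A", "B", "C"], latency, deps))   # original order
print(total_cycles(["B", "A", "C"], latency, deps))   # load hoisted above A
```

Under these assumptions, hoisting B lets A execute while the load is outstanding, so the reordered schedule finishes one cycle earlier than the original order. Longer load latencies or more independent work ahead of the consumer widen the gap.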
Note that the inventive emulator allows such optimization both where the architecture has changed and where the old, legacy code was not written well. More specifically, the legacy program could be an older program in which instruction B is ordered after instruction A, but is not required to be so. Also, the native architecture may have more resources, such as registers, functional units, processors, etc., which allow operations to be executed in parallel. Thus, the inventive translator allows the use of features of the new, native architecture that the old, legacy system did not have.