Recently, a new microprocessor was developed which combines a simple but fast host processor (called a “morph host”) and software (referred to as “code morphing software”) to execute application programs designed for a processor (the “target processor”) having an instruction set different from that of the morph host processor. The morph host processor executes the code morphing software, which translates the target programs dynamically into morph host processor instructions which are able to accomplish the purpose of the original target software. As the target instructions are translated, the new host instructions are both executed and stored in a translation buffer where they may be accessed without further translation. Although the initial translation of a program is slow, once translated, many of the steps normally required by prior art hardware to execute a program are eliminated. The new microprocessor has demonstrated that a simple, fast, low-powered processor is able to execute translated “target” instructions at a rate equivalent to that of the “target” processor for which the programs were designed.
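The role of the translation buffer may be illustrated by a minimal sketch. It assumes, purely for illustration, that translations are keyed by target address; the names `TranslationBuffer`, `lookup`, and `toy_translate` are hypothetical and do not appear in the description above.

```python
from typing import Callable, Dict, List


class TranslationBuffer:
    """Caches host instruction sequences keyed by target address (an assumed keying)."""

    def __init__(self, translate: Callable[[int], List[str]]) -> None:
        self._translate = translate           # slow path: the code morphing step
        self._cache: Dict[int, List[str]] = {}
        self.translations_performed = 0       # counts how often the slow path ran

    def lookup(self, target_pc: int) -> List[str]:
        host_code = self._cache.get(target_pc)
        if host_code is None:
            # First encounter: translate once and keep the result.
            host_code = self._translate(target_pc)
            self._cache[target_pc] = host_code
            self.translations_performed += 1
        return host_code


def toy_translate(target_pc: int) -> List[str]:
    """Hypothetical stand-in for a real translator."""
    return [f"host_op_for_{target_pc:#x}"]


buf = TranslationBuffer(toy_translate)
first = buf.lookup(0x1000)   # translated and stored
second = buf.lookup(0x1000)  # served from the buffer without further translation
```

In this sketch the second lookup returns the stored sequence, so the slow translation step runs only once per target address.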
In order to be able to execute programs designed for other processors at a rapid rate, the morph host processor includes a number of hardware enhancements. One of these enhancements is a gated store buffer which holds memory stores generated as sequences of morph host instructions are executed. A second enhancement is a set of host registers which store state of the target processor at the beginning of any sequence of target instructions being translated. If the translated morph host instructions execute without raising an exception, the target state at the beginning of the sequence of instructions is updated to the target state at the point at which the sequence completed and the memory stores are committed to memory.
If an exception occurs during the execution of the sequence of host instructions which have been translated, processing stops; and the entire operation may be returned or rolled back to the beginning of the sequence of target instructions at which known state of the target processor exists. This allows very rapid and accurate handling of exceptions while dynamically translating and executing instructions.
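The commit and rollback behavior described above may be sketched as follows. This is a simplified software model and not the hardware itself: the class `SpeculativeMachine`, the exception `TargetException`, the register name `eax`, and the helper functions are all assumptions made for illustration.

```python
class TargetException(Exception):
    """Hypothetical stand-in for an exception raised during a translated sequence."""


class SpeculativeMachine:
    def __init__(self) -> None:
        self.memory = {}                        # committed memory
        self.committed_regs = {"eax": 0}        # official target state
        self.working_regs = dict(self.committed_regs)
        self.store_buffer = []                  # gated stores: (address, value)

    def store(self, addr: int, value: int) -> None:
        # Stores are held in the gated buffer, not written to memory.
        self.store_buffer.append((addr, value))

    def commit(self) -> None:
        # The sequence completed: drain stores to memory, update target state.
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()
        self.committed_regs = dict(self.working_regs)

    def rollback(self) -> None:
        # Discard uncommitted stores and restore the last known target state.
        self.store_buffer.clear()
        self.working_regs = dict(self.committed_regs)

    def run(self, translation) -> bool:
        try:
            translation(self)
        except TargetException:
            self.rollback()
            return False
        self.commit()
        return True


def good_sequence(m: SpeculativeMachine) -> None:
    m.working_regs["eax"] = 7
    m.store(0x10, 7)


def faulting_sequence(m: SpeculativeMachine) -> None:
    m.working_regs["eax"] = 99
    m.store(0x20, 99)
    raise TargetException("fault partway through the sequence")


machine = SpeculativeMachine()
ok = machine.run(good_sequence)          # commits eax = 7 and the store to 0x10
failed = machine.run(faulting_sequence)  # rolled back; 0x20 is never written
```

After the faulting sequence, the model holds exactly the state committed by the first sequence, which is the property that allows exceptions to be handled from a known target state.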
It will be noted that the method by which the new microprocessor handles the execution of translations by placing the effects in temporary storage until execution has been completed successfully is effectively a rapid method of speculating. The new microprocessor, in fact, uses the same circuitry for speculating on the outcome of other operations. For example, by temporarily holding the results of execution of sequences of instructions reordered by a software scheduler from naively translated sequences of instructions, more aggressive reordering may be accomplished than has been attempted by the prior art. When such a reordered sequence of instructions executes to produce a correct result, the memory stores resulting from execution of the reordered sequence may be committed to memory and target state may be updated. If the reordered sequence generates an exception while executing, then the state of the processor may be rolled back to target state at the beginning of the sequence and a more conservative approach taken in translating the sequence.
One of the most advantageous features of the new microprocessor is its ability to link together short sequences of target instructions which have been translated and found to execute without exception to form longer sequences of instructions. This allows a translated program to be executed at great speed because the microprocessor need not look up each of the shorter translated sequences or go through all of the steps normally taken by hardware processors to execute instructions. Even more speed may be attained than might be expected because, once long sequences are linked, it is often possible for an optimizer to eliminate many of the steps without changing the results produced. Hardware optimizers have never been able to optimize sequences of instructions long enough for the patterns which allow significant optimization (such as loops) to become apparent.
The original method of speculation used by the new processor always updates the state of the target processor by committing stores to memory from the gated store buffer and transferring new state to target registers at the end of a sequence of instructions which had executed correctly and before any next sequence of target instructions was translated. This method of updating state is effective for many situations.
However, there are certain characteristics of the method which are less useful in certain circumstances. First, it is usually desirable to execute sequences which are as long as possible. In order to obtain long sequences of instructions between commits, it is often necessary to include one or more branch instructions which are not immediately followed by a commit instruction. This may occur when the code morphing software sees that a branch is taken almost all of the time and decides to treat the branch usually taken as the normal path for execution. The code morphing software speculates that this is the path which will be taken and omits a commit instruction after the branch instruction in order to provide longer sequences which may be further optimized. However, since there are no commit instructions immediately following each internal branch instruction at which the state of the target processor is committed before a branch is taken, if an exception occurs at some point during the execution of the sequence after an internal branch is taken, the operation of the machine must be rolled back to the beginning of the initial sequence preceding the branch instruction which is the last point at which correct state of the target processor exists. This may be quite time consuming.
Second, the original method of committing stores to memory while speculating on sequences of host instructions is useful because it is desirable to create translations which are as long as possible in order to accelerate execution. However, the taking of branches within sequences not followed by commit instructions can produce sequences so long that the number of memory stores is too great for the finite length of the gated store buffer used to accomplish speculation. This causes execution to halt, a rollback to occur, and shorter sequences of instructions to be generated, a process which slows execution.
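The overflow condition may be illustrated with a small sketch. The buffer capacity, the exception `StoreBufferOverflow`, and the policy of regenerating the work as sequences no longer than the buffer are assumptions chosen for illustration; the description above states only that shorter sequences are generated.

```python
from typing import Dict, List, Tuple

Store = Tuple[int, int]  # (address, value)


class StoreBufferOverflow(Exception):
    """Raised when a sequence issues more stores than the gated buffer can hold."""


def run_sequence(stores: List[Store], capacity: int, memory: Dict[int, int]) -> None:
    buffer: List[Store] = []
    for addr, value in stores:
        if len(buffer) == capacity:
            # Nothing has reached memory yet, so abandoning here is a rollback.
            raise StoreBufferOverflow
        buffer.append((addr, value))
    memory.update(buffer)  # commit at the end of the sequence


def run_with_fallback(stores: List[Store], capacity: int, memory: Dict[int, int]) -> int:
    """Return the number of rollbacks incurred while executing all stores."""
    try:
        run_sequence(stores, capacity, memory)
        return 0
    except StoreBufferOverflow:
        # Assumed recovery policy: redo the work as sequences that each fit.
        for start in range(0, len(stores), capacity):
            run_sequence(stores[start:start + capacity], capacity, memory)
        return 1


stores = [(addr, addr * 10) for addr in range(6)]

small_memory: Dict[int, int] = {}
rollbacks_small = run_with_fallback(stores, capacity=4, memory=small_memory)

large_memory: Dict[int, int] = {}
rollbacks_large = run_with_fallback(stores, capacity=8, memory=large_memory)
```

Both runs leave memory in the same final state, but the run with the smaller buffer first executes and discards part of the long sequence, which is the wasted work referred to above.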
Another problem caused by this original method of committing stores to memory at the end of translated sequences occurs in cases in which all of the steps of some portion of the sequence must be completed sequentially without interruption in order that the desired result be produced. Input/output operations are often an example of such sequences. In such cases, it is typical to lock out interrupts until the sequence is finished. The lockout must be released at the end of the sequence in order to realize the full benefits of optimization, but an unlock cannot occur without a commit because any asynchronous interrupt which was attempted during the locked stages of the sequence but was delayed by the lock would then be generated and cause execution to be rolled back to the last known correct state. This could cause operations such as input/output to be repeated, which could violate the semantics of the system.
It is desirable to provide a new method of translating sequences of instructions by which the speed of the new microprocessor is maintained or increased.