1. Field of the Invention
This invention relates to computer systems and, more particularly, to methods for increasing the efficiency of operation of a microprocessor which dynamically translates instructions from a target instruction set to a host instruction set and speculates on translated operations.
2. History of the Prior Art
Recently, a new microprocessor was developed which combines a simple but very fast host processor (called a "morph host") and software (referred to as "code morphing software") to execute application programs designed for a "target" processor having an instruction set different than the instruction set of the morph host processor. The morph host processor executes the code morphing software to translate the application programs dynamically into morph host processor instructions able to accomplish the purpose of the original target software. As the target instructions are translated, the new host instructions are both executed and stored in a translation buffer where they may be accessed without further translation. Although the initial translation of a program is slow, once translated, many of the steps normally required for hardware to execute a program are eliminated. The new microprocessor has demonstrated that a simple fast processor designed to expend little power is able to execute translated "target" instructions at a rate equivalent to that of the "target" processor for which the programs were designed.
In order to be able to run programs designed for other processors at a rapid rate, the morph host processor includes a number of hardware enhancements. One of these enhancements is a gated store buffer which resides between the host processor and the translation buffer. A second enhancement is a set of host registers (in addition to normal working registers) which store known state of the target processor at the beginning of any sequence of target instructions being translated. Memory stores generated as sequences of morph host instructions are executed are placed in the gated store buffer. If the morph host instructions execute without raising an exception, the target state at the beginning of the sequence of instructions is updated to the target state at the point at which the sequence of translated instructions completed and the memory stores are committed to memory.
If an exception occurs during the execution of a sequence of host instructions, processing stops; and the entire operation may be returned to the beginning of the sequence of instructions at which known state of the target processor exists. This allows very rapid and accurate handling of exceptions, a result which has never been accomplished by the prior art.
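The commit and rollback behavior described above can be pictured with a minimal sketch. It is an illustrative software model only, not the hardware design: the names GatedStoreBuffer and run_translation, the use of dictionaries for memory and register state, and the modeling of host instructions as Python callables are all assumptions made for the example.

```python
class GatedStoreBuffer:
    """Toy model: memory stores are held until committed or discarded."""

    def __init__(self, memory):
        self.memory = memory   # committed memory, address -> value
        self.pending = []      # uncommitted stores, in program order

    def store(self, address, value):
        # Stores generated during a translation go to the buffer, not memory.
        self.pending.append((address, value))

    def commit(self):
        # Release all pending stores to memory in order.
        for address, value in self.pending:
            self.memory[address] = value
        self.pending.clear()

    def rollback(self):
        # Discard all pending stores; memory is untouched.
        self.pending.clear()


def run_translation(host_ops, working_regs, committed_regs, buffer):
    """Run a sequence of host operations speculatively.

    Returns True if the sequence completed and was committed, False if an
    exception occurred and state was rolled back to the last known target state.
    """
    try:
        for op in host_ops:
            op(working_regs, buffer)
    except Exception:
        buffer.rollback()
        working_regs.clear()
        working_regs.update(committed_regs)   # restore known target state
        return False
    buffer.commit()
    committed_regs.clear()
    committed_regs.update(working_regs)       # advance known target state
    return True
```

In this sketch a faulting sequence leaves both memory and the committed register copy exactly as they were before the sequence began, which is the property the text relies on for exception handling.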
It will be noted that the method by which the new microprocessor handles the execution of translations by placing the effects generated by execution in temporary storage until execution of the translation has been completed is effectively a very rapid method of speculation. The new microprocessor, in fact, uses the same circuitry for speculating on the outcome of other operations. For example, by temporarily holding the results of execution of instructions reordered by a software scheduler from naively translated instructions, more aggressive reordering may be accomplished than has been attempted by the prior art. When such a reordered sequence of instructions executes to produce a correct result, the memory stores resulting from execution of the reordered sequence may be committed to memory and target state may be updated. If the reordered sequence generates an exception while executing, then the state of the processor may be rolled back to target state at the beginning of the sequence and a more conservative approach taken in translating the sequence.
One of the most advantageous features of the new microprocessor is its ability to link together long sequences of translated instructions. Once short sequences of target instructions have been translated and found to execute without exception, it is possible to link large numbers of these short sequences together to form long sequences of instructions. This allows a translated program to be executed at great speed because the microprocessor need not go through all of the steps (such as looking up each of the shorter translated sequences) normally taken by hardware processors to execute instructions. Even more speed may be attained than might be expected because, once long sequences are linked, it is often possible for an optimizer to eliminate many of the steps from the long sequences without changing the results produced. Hardware optimizers have never been able to handle sequences of instructions long enough to allow the patterns which allow significant optimization to become apparent.
A problem which has occurred with the new processor relates to sequences of instructions which are executed only an insignificant number of times. For example, instructions required to initiate operation of a particular application program are often executed only when the application is first called; and instructions required to terminate operation of an application are often executed only when the program is actually terminated. The original embodiment of the new processor typically treated all instructions in the same manner. It would decode a target instruction, generate the primitive host instructions which carry out the function for which the target instruction is designed, optimize the sequence of host instructions, and then store the translated and optimized instructions in the translation buffer. As the operation of the new processor proceeded, the sequences of translated instructions would be linked to one another and further optimized; and the longer sequences of linked instructions would be stored in the translation buffer. Ultimately, large blocks of translated instructions were stored as super-blocks of host instructions. When an exception occurred during execution of a particular host instruction or linked set of instructions, the new processor would go through the process of rolling back to the last correct state of the target processor and then provide single-step translations of the target instructions from the point of the last correct state to the point at which the exception again occurs. These translations would also be stored in the translation buffer. This embodiment of the new processor is described in detail in U.S. Pat. No. 5,832,205, Kelly et al., issued Nov. 3, 1998, and assigned to the assignee of the present invention.
Although this process creates code which executes rapidly, the process has a number of effects which limit the overall speed attainable and may cause other undesirable effects. First, the process requires a substantial amount of storage capacity for translated instructions. Many times a number of different translations exist for the same set of target instructions because the sequences were entered from different branches. Once stored, the translated instructions occupy the translation buffer until removed for some affirmative reason. Second, if a sequence of instructions is to be executed only a few times, the time required for translating and optimizing may be significantly greater than that needed to execute a step-by-step translation of the initial target instructions. The optimization of little used sequences of translated instructions tends to lower the average speed of the new processor.
For these reasons, the described embodiment of the new processor was modified to include as a part of the code morphing software, an interpreter which accomplishes step-by-step translation of each of the target instructions. An interpreter essentially fetches a target instruction, decodes the instruction, provides a host process to accomplish the purpose of the target instruction, and executes the host process. When it finishes interpreting and executing one target instruction, the state of the target processor is brought up to date; and the interpreter proceeds to the next target instruction. This process essentially single steps through the interpretation and execution of target instructions. The host instructions produced by the interpreter are not typically stored in the translation buffer so optimizing, linking, and the further optimizations available after linking are not carried out. The interpreter continues this process for the remainder of the sequence of target instructions.
It was determined that, in general, not until some number of executions of any sequence of instructions have occurred does the time required for all of the previous interpretations and executions become equal to the time required to translate and optimize the sequence. Consequently, a sequence of instructions which is little used during the execution of an application often executes more rapidly when it is simply interpreted rather than translated.
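The break-even point described above can be illustrated with simple arithmetic. The sketch below assumes a fixed cost per interpretation and a fixed one-time cost to translate and optimize; the function name and the cost figures are hypothetical and are not measurements from the processor.

```python
import math


def break_even_runs(translate_cost, interpret_cost):
    """Smallest number of interpreted runs whose total time reaches the
    one-time cost of translating and optimizing the same sequence.

    Both costs are in the same (arbitrary) time unit; values are illustrative.
    """
    if interpret_cost <= 0:
        raise ValueError("interpret_cost must be positive")
    return math.ceil(translate_cost / interpret_cost)


# Hypothetical costs: 2000 units to translate and optimize,
# 50 units for each interpreted run of the sequence.
runs = break_even_runs(2000, 50)
```

A sequence executed fewer times than this count spends less total time under interpretation than it would have spent being translated and optimized.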
In order to make use of this advantage, the improved processor was modified to utilize the interpreter whenever a sequence of target instructions is first encountered. The interpreter software is associated with a counter which keeps track of the number of times sequences of target instructions are executed. The interpreter may be run each time the sequence is encountered until it has been executed some number of times without generating an exception. When sequences of target instructions have been interpreted and executed some selected number of times, the code morphing software switches from the interpreter to the translator and its attendant optimization and storage processes. When this occurs, a sufficient number of executions will have occurred that it is probable that further execution of the sequence of instructions will occur; and a stored optimized translation will provide significantly faster execution of the application as a whole.
When the code morphing software switches to the normal translation process, the translation is optimized and stored in the translation buffer. Thereafter, that translation may be further optimized and linked to other translations so that the very high speeds of execution realized from such processes may be obtained.
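The switch from interpretation to translation after a selected number of executions may be sketched as follows. This is a simplified model under stated assumptions: the class name CodeMorpher, the threshold value, and the interpret and translate callables are hypothetical stand-ins for the interpreter and translator of the code morphing software.

```python
class CodeMorpher:
    """Toy dispatcher: interpret a sequence until it has run `threshold`
    times without exception, then translate it and store the translation."""

    def __init__(self, interpret, translate, threshold=3):
        self.interpret = interpret        # callable(address) -> result
        self.translate = translate        # callable(address) -> host callable
        self.threshold = threshold
        self.run_counts = {}              # address -> successful interpretations
        self.translation_buffer = {}      # address -> stored host callable

    def execute(self, target_address):
        cached = self.translation_buffer.get(target_address)
        if cached is not None:
            return cached()               # reuse the stored translation

        count = self.run_counts.get(target_address, 0)
        if count < self.threshold:
            # An exception here propagates before the count is incremented,
            # so only exception-free runs are counted.
            result = self.interpret(target_address)
            self.run_counts[target_address] = count + 1
            return result

        translated = self.translate(target_address)
        self.translation_buffer[target_address] = translated
        return translated()
```

Keeping the counter keyed by target address means that sequences executed only during startup or shutdown never reach the threshold and never occupy the translation buffer.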
If the interpreter is utilized to collect statistics in addition to the number of times a particular sequence of instructions has been executed, additional significant advantages may be obtained. For example, if a sequence includes a branch, the address of the instruction to which it branches may be recorded along with the number of times the branch has been executed. Then, when a number of sequential instructions are executed by the interpreter, a history of branching and branch addresses will have been established. These statistics may be utilized to speculate whether a particular sequence of instructions is probably going to become a super-block of translated instructions. After being interpreted for a selected number of times, the sequence may be translated, optimized, linked through the various branches without the necessity to go through a separate linking operation, and stored as such in the translation buffer. If the speculation turns out to be true, then significant time is saved in processing the instructions. If not, the operation simply causes an exception which returns the code morphing software to the interpreter.
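The branch statistics described above may be pictured as a table keyed by branch address that counts how often each destination is reached. The sketch below is one possible bookkeeping scheme; the class name BranchProfile, its methods, and the min_samples parameter are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class BranchProfile:
    """Records, for each branch address, how often each target was taken."""

    def __init__(self):
        self.targets = defaultdict(Counter)   # branch addr -> Counter of targets

    def record(self, branch_address, target_address):
        # Called by the interpreter each time it executes a branch.
        self.targets[branch_address][target_address] += 1

    def likely_target(self, branch_address, min_samples=1):
        """Most frequently taken target, or None if too little history exists."""
        counts = self.targets.get(branch_address)
        if not counts or sum(counts.values()) < min_samples:
            return None
        return counts.most_common(1)[0][0]
```

A translator could consult likely_target when deciding which path through a branch to link into a longer translated sequence, falling back to the interpreter when the speculation proves wrong.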
Not only is the interpreter useful for generating host code for sequences which are used infrequently, it is also utilized in handling exceptions. Whenever the modified processor encounters a target exception while executing any translated target application, the code morphing software causes a rollback to occur to the last known correct state of the target processor. Then the code morphing software utilizes the interpreter rather than the translator to provide a new set of host instructions. The interpreter single steps through the generation and execution of target instructions, bringing target state up to date as each instruction is interpreted.
The interpreter continues this process through the sequence of target instructions until the exception again occurs. At this point, the state of the target processor is correct for the state of the interpretation so that the exception can be handled correctly and expeditiously. Because the interpretation process is so simple, the process of determining the point of occurrence of a target exception is significantly faster than the determination of such a point when carried out by the translation process which goes through the above-described translation and optimization process and then is stored in the translation buffer. Moreover, interpretation does not generate additional sequences of host instructions which are stored in the translation buffer and help to overfill that buffer.
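The use of the interpreter to locate a target exception after a rollback may be sketched as follows. In this illustrative model each target instruction is a function that takes the target state and returns the updated state; the names TargetException and locate_exception are hypothetical.

```python
class TargetException(Exception):
    """Stands in for an exception raised by a target instruction."""


def locate_exception(instructions, checkpoint_state):
    """Single-step from the rolled-back state until the exception recurs.

    Returns (index, state, exception), where `state` is the target state
    just before the faulting instruction. If no instruction faults, index
    and exception are None and `state` is the final state.
    """
    state = dict(checkpoint_state)        # begin from last known correct state
    for index, instruction in enumerate(instructions):
        try:
            # Each instruction returns a new state, so a fault leaves
            # `state` exactly as it was before the faulting instruction.
            state = instruction(state)
        except TargetException as exc:
            return index, state, exc
    return None, state, None
```

Because target state is brought up to date after every interpreted instruction, the state returned at the point of the fault is the state the target exception handler expects to see.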
By combining the interpreter with the optimizing translator which functions as a dynamic compiler of sequences of translated instructions, the code morphing software removes many of the limits to the upper speed of execution of target applications by the new processor. The use of the interpreter to handle early executions of sequences of instructions eliminates the need to optimize sequences of instructions which are little used during execution of the application and thereby increases the speed of operation. The elimination of the need to store these little used sequences of instructions in the translation buffer reduces the need for storage and eliminates the need for discarding many translated instructions. The use of the interpreter to handle exceptions produces the same useful effects as using the translator yet speeds operations and reduces storage requirements.
The improved embodiment of the new processor is described in detail in U.S. patent application Ser. No. 09/417,332, entitled Method For Integration Of Interpretation And Translation In A Microprocessor, R. Bedichek et al., filed on even date herewith, and assigned to the assignee of the present invention.
Recently, the processor has been further modified to enhance the utilization of the interpreter and translator. This has been accomplished by providing more than two modes of operation. Because it is often true that a translation once completed and optimized is not used extensively even though the sequence has been used frequently before translation, additional modes of operation are provided between the simple interpretation and the optimized translation. For example, a sequence of target instructions may be first executed by an interpreter for a number of times, then translated with minimal optimization and stored in the translation buffer. The translated sequence is operative and executes more rapidly than does interpretation; however, if sufficient time were allotted, it might be optimized to a much greater extent. To decide whether to further optimize the sequence by, among other things, linking it to other sequences, a second test may be run. This test essentially reviews the amount of time being spent in the interpretation or translation processes in order to determine whether the system is running efficiently.
This provision of multiple levels of translation with different tests for moving from one level to another significantly enhances the operation of the improved processor. This improved embodiment of the new processor is described in detail in U.S. patent application Ser. No. 09/417,979, entitled Method Of Changing Modes Of Code Generation, Torvalds et al., filed on even date herewith, and assigned to the assignee of the present invention.
Even though the various combinations of an interpreter and a translator greatly improve the operation of the unique microprocessor, some problems in operation remain. These problems may be generally described as an inability to utilize the various available functions optimally. One example of this occurs when a sequence of instructions which includes an internal branch is translated based on a presumption that a particular branch will be taken. When the sequence is translated, that branch may be consistently taken much more often than the other possible branch. However, as circumstances change, the other branch may be taken more often than the one for which the translation was optimized. This may occur because data controlling a branch may change or because a process used during startup or shutdown functions differently when used during normal operation of a program. In such a case, the original translation is still perfectly operative; but it is optimized to favor the wrong branch. Taking the branch other than the branch for which the sequence was optimized causes the translation to roll back and utilize the interpreter to provide a sequential set of host instructions. When this occurs more often than the translation is executed, overall processor speed is significantly reduced.
However, the translation remains in the translation buffer. The translation is still perfectly operative and may be used later in running the application. The translation buffer continues to store each new translation which executes correctly. After some number of new translations have been provided, the translation buffer tends to fill, limiting the new translations which may be stored. This then slows the operation of the improved processor.
It is desirable to improve the operational speed of the improved microprocessor so that it executes more rapidly by modifying the processes for controlling the use of the interpreter and translator software of the code morphing software to make those processes responsive to changing conditions experienced during operation of the improved processor.
It is, therefore, an object of the present invention to provide a faster microprocessor compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate.
This and other objects of the present invention are realized by a method for modifying operating conditions within a computer which translates instructions from a target instruction set to a host instruction set including the steps of monitoring an event occurring within a component of the computer, counting events occurring within a selected interval, generating an exception if a total of events within the selected interval exceeds a prescribed limit, and responding to the exception by changing a translated sequence of host instructions.
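The steps of the method recited above (monitoring an event, counting events within a selected interval, generating an exception when a limit is exceeded, and responding by changing a translation) may be sketched in software as follows. This is an illustrative model only. The names IntervalEventCounter, EventLimitExceeded, and note_rollback are hypothetical, the fixed-window counting scheme is an assumption, and the choice of rollbacks as the monitored event and of discarding the stored translation as the response are examples rather than the claimed implementation.

```python
class EventLimitExceeded(Exception):
    """Raised when too many events occur within one interval."""


class IntervalEventCounter:
    """Counts events in fixed windows of `interval` time units."""

    def __init__(self, limit, interval, start=0):
        self.limit = limit
        self.interval = interval
        self.window_start = start
        self.count = 0

    def record(self, now):
        # Begin a new window once the selected interval has elapsed.
        if now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.limit:
            # Reset so that the next window starts fresh after the exception.
            self.window_start = now
            self.count = 0
            raise EventLimitExceeded(f"more than {self.limit} events in interval")


def note_rollback(counter, translation_buffer, address, now):
    """Record one rollback of the translation at `address`.

    If rollbacks exceed the limit within the interval, respond by discarding
    the stored translation so that the sequence can be generated again.
    Returns True if the translation was discarded.
    """
    try:
        counter.record(now)
    except EventLimitExceeded:
        translation_buffer.pop(address, None)
        return True
    return False
```

In this sketch a translation that rolls back only occasionally is left in the translation buffer, while one that rolls back repeatedly within a single interval is removed.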