1. Field of the Invention
This invention relates to an improvement in digital processors (the “host processors”) that dynamically translate instructions of a computer application program (the “target application”) designed for processing by a digital processor (the “target processor”) that functions with a different instruction set than the instruction set of the host processor, executing the translated instructions in real time to carry out the purpose of the target application, and, more particularly, relates to a new method and apparatus for processing of indirect branch instructions of the target application to reduce latency in processing by the host processor.
2. Related Art
A unique digital processing system is described in U.S. Pat. No. 6,031,992, granted Feb. 29, 2000, entitled Combining Hardware and Software to provide an Improved Microprocessor, assigned to Transmeta Corporation (referred to as the '992 Transmeta patent), the content of which is incorporated by reference herein in its entirety. The Transmeta processor serves as the host processor capable of executing software programs (the target application) designed with an instruction set intended to run on a processor of different design (the “target” processor), whose instruction set differs from that of the host processor. The present invention improves upon the host processor and, hence, the host processing system.
The microprocessor of the '992 Transmeta patent is formed by a combination of a hardware processing portion (sometimes called a “morph host”), and a software portion, referred to as “code morphing software.” Among other things, the code morphing software carries out a significant portion of the functions of digital processors in software, reducing the hardware required for processing, and, hence, reducing power consumption. The morph host processor executes the code morphing software which translates the target application programs dynamically into host processor instructions that are able to accomplish the purpose of the original software. As the instructions are translated, they are stored in a translation buffer where they may be subsequently accessed and executed, as needed, during continued program execution without further translation.
A set of host registers (in addition to normal working registers) is included in the Transmeta processor. The host registers store “state” (also referred to as “context”) of the target processor which exists at the beginning of any sequence of target instructions being translated. In one embodiment, the results of translations are held in a gated store buffer until the translations execute. If the sequence of translated instructions executes without raising an exception, the results are stored in memory by a commit instruction. Further, the registers holding the target state are updated to the target state at the point at which the results from the sequence of translated instructions were committed. The information contained in those registers is used to advantage in the present invention as a “tag” for a translation.
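By way of illustration only, the commit behavior described above may be sketched as follows. The class and function names are hypothetical, and the discarding of pending stores when an exception is raised is an assumption of this sketch rather than a detail stated in this section.

```python
class GatedStoreBuffer:
    """Hypothetical model of a store buffer that holds results until commit."""

    def __init__(self, memory):
        self.memory = memory   # committed memory, modeled as {address: value}
        self.pending = []      # uncommitted (address, value) stores, in order

    def store(self, address, value):
        # Results are held in the buffer, not written to memory yet.
        self.pending.append((address, value))

    def commit(self):
        # Write the held results to memory, oldest first.
        for address, value in self.pending:
            self.memory[address] = value
        self.pending.clear()

    def discard(self):
        # Assumption of this sketch: pending stores are dropped on an exception.
        self.pending.clear()


def run_translation(steps, memory, target_state):
    """Run a translated sequence; commit only if no step raises an exception.

    steps is a list of callables taking (buffer, working_state).
    Returns True when the sequence committed, False otherwise.
    """
    buffer = GatedStoreBuffer(memory)
    working = dict(target_state)       # working copy of the target registers
    try:
        for step in steps:
            step(buffer, working)
    except Exception:
        buffer.discard()               # memory and target_state stay unchanged
        return False
    buffer.commit()
    target_state.clear()
    target_state.update(working)       # target state now reflects the commit point
    return True
```

In this sketch the saved target state changes only together with memory, so the state held in the registers always corresponds to a point at which results were committed.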
The '992 Transmeta processor is capable of processing target application programs designed for other processors. Application programs contain branch instructions, the execution of which requires the processor to “branch” to a specified address (in memory) and execute the instruction found at that address before returning to process the next instruction of the application program. When that branch address is not known in advance, that is, is not included in the branch instruction itself, the branch instruction is referred to as “indirect”. The latter is the type of instruction with which the present invention is principally concerned. Thus, any reference herein to a branch instruction should be understood to refer to an indirect branch instruction, unless the text expressly states to the contrary.
Given that the branch address is not initially known, to complete execution of the branch instruction, the processor must first calculate or otherwise determine the unknown branch target address. The processor makes the calculation, determines the branch address, jumps to that address and executes the instruction found at that address.
In processors that include a memory “stack” and “call” and “return” instructions, the return instruction constitutes one important class of indirect branch instruction. The call instruction constitutes a kind of branch. To transfer the flow of the application program to the procedure, such as a subroutine, to which a jump is made, the target processor employs the CALL instruction. Then, to return to the program following execution of that procedure (and of any intervening instructions, which may include additional call and return instructions, referred to as nested branches), the target processor employs the RETURN instruction.
When a CALL is made, the return address, that is, the address of the next instruction of the application program, is saved in a memory stack (i.e., is “pushed” onto the stack) so that the flow of the program may continue later, when a RETURN instruction is executed. The RETURN instruction in turn “pops” the next instruction address of the target application off of the stack, and that succeeding instruction is then executed by the target processor (i.e., the processor jumps to that address and executes the instruction). That combination of software and hardware of the target processor reduces the latency in obtaining the next instruction of the program for execution.
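The push and pop behavior of CALL and RETURN described above can be illustrated with a minimal sketch. The class name, the method names and the instruction_length parameter are hypothetical and are not taken from any particular target processor.

```python
class TargetMachine:
    """Hypothetical model of a target processor with a return-address stack."""

    def __init__(self, ip=0):
        self.ip = ip       # address of the instruction being executed
        self.stack = []    # memory stack holding return addresses

    def call(self, procedure_address, instruction_length):
        # Push the address of the instruction that follows the CALL ...
        self.stack.append(self.ip + instruction_length)
        # ... then transfer control to the procedure.
        self.ip = procedure_address

    def ret(self):
        # The branch target is not encoded in the RETURN instruction itself;
        # it is popped off the stack at run time, which makes RETURN indirect.
        self.ip = self.stack.pop()
```

Because the stack is last-in, first-out, nested calls return in the reverse of the order in which they were made.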
When an indirect branch instruction of an application program intended for operation in a target type system is to be executed by the host Transmeta processor, in order to correctly translate that branch instruction into instructions of the host processor, the host must not only generate code to perform the effect of the branch instruction, but must also generate code to determine the address of the translation of the target of the branch. Thus in order for the host processor to execute the target branch instruction, the target program address and other target processor state information that was earlier saved by the host processor must be converted into the address of a corresponding translation followed by a transfer of control of the host processor to that translation.
A translation corresponds to a target address if the execution of the (machine language) code in the translation has the same effect on the state of the target processor stored in the context registers of the host processor as would be caused by a target processor executing that same target processor code. The host processor also associates additional information, called “tags”, with each translation. For example, one tag may contain information about the state of the target processor at the time the translation was made, and other tags will contain other information, as later herein described. Those tags may be used to enable the processor to later identify (and, as appropriate, retrieve) the particular translation when again needed.
To find a pre-existing translation (i.e., host instructions) corresponding to an instruction address of the target processor, the host processor first searches (i.e., “looks”) through the translation memory, the library of translations stored in the translation buffer referred to earlier, to find a translation whose tags match the current target state. For example, that memory may contain tens of thousands of translations. A conventional approach to efficient searching of the translation buffer is to establish an index of the stored information, known as a hash table, to make the search easier to accomplish. In “hashing”, the index to the table content is derived from a transformation of the information stored. As example, see Schildt, “C: The Complete Reference”, third edition, Osborne-McGraw-Hill Ch 21 p 587 (1995). In practice one finds that searching a physical memory of the processing system in that way, or in any other way that requires searching through all translations, is slower than desired because of the great number of system clock cycles required to accomplish the search and the volume of translations that is stored. Those familiar with the Transmeta processor refer to such a search as a slow look-up.
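As a rough illustration of a hash-indexed search of the kind described above, the following sketch derives a bucket index from the target address and then compares tags within that bucket. The table size, the modulo hash and all names are assumptions made for illustration; they do not describe the actual look-up performed by the Transmeta processor.

```python
NUM_BUCKETS = 1024  # assumed table size, chosen only for illustration


def bucket_index(target_address):
    # Hypothetical hash: a simple transformation of the stored information.
    return target_address % NUM_BUCKETS


class TranslationIndex:
    """Hypothetical hash table mapping (target address, tags) to a translation."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, target_address, tags, host_address):
        entry = (target_address, tags, host_address)
        self.buckets[bucket_index(target_address)].append(entry)

    def lookup(self, target_address, current_tags):
        # Only one bucket is scanned, but every entry in it must be compared.
        for address, tags, host_address in self.buckets[bucket_index(target_address)]:
            if address == target_address and tags == current_tags:
                return host_address
        return None  # no matching translation; one would have to be made
```

Even with such an index, each look-up costs a hash computation, at least one memory access and one or more tag comparisons, which is consistent with the observation that the search is slower than desired.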
In other processing systems of the prior art a cache is used to hold data and/or instructions that are used frequently during the processing of an application program. By first looking for required data or instructions being sought in the cache, processing of the program being run proceeds more quickly should that information be found in the cache than when access must be made to the main memory for that information. Those prior caches may be software caches, hardware caches or combinations of the two types of caches. The present invention also takes advantage of a cache for translations of target application instructions, or more precisely, the address of such translations. The adaptation of a cache to the translation process of the host computer involves the application and caching of the translation “tags” required by the host processor, as becomes apparent from the detailed description of the invention which follows.
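The general idea of consulting a small cache before falling back to a slower search can be sketched as follows. This is a generic direct-mapped cache written under stated assumptions (the names, the slot computation and the use of a callable for the slow search are all hypothetical); it is not presented as the cache organization of the invention.

```python
class TranslationAddressCache:
    """Hypothetical direct-mapped cache of translation addresses keyed by tags."""

    def __init__(self, size, slow_lookup):
        self.size = size
        self.entries = [None] * size     # each slot holds (tags, address) or None
        self.slow_lookup = slow_lookup   # callable: tags -> address or None
        self.hits = 0
        self.misses = 0

    def lookup(self, tags):
        slot = hash(tags) % self.size
        entry = self.entries[slot]
        # A hit requires the cached tags to equal the tags being sought.
        if entry is not None and entry[0] == tags:
            self.hits += 1
            return entry[1]
        # Miss: fall back to the slow search and remember the answer.
        self.misses += 1
        address = self.slow_lookup(tags)
        if address is not None:
            self.entries[slot] = (tags, address)
        return address
```

The tags are stored alongside the cached address so that an entry left in the same slot by a different translation is never returned by mistake.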
On inspection of the operation of the Transmeta processor, the skilled person finds that each translation of a target instruction is accompanied by four different pieces of information, referred to as tags. One tag is the extended instruction pointer (the “eip”) of the target application, which is the logical address of the target instruction contained in the target application. Another tag is the physical instruction pointer of the target application instruction (the “Phys-ip”), which is the physical address of such instruction (in a memory of a target processor). The Phys-ip value is derived from the logical address by a simple calculation made by the target processor and is the means of equating an address used by the software programmer with an actual physical location in memory of the target system.
A third tag is the “state” or “context” of the target processor being emulated by the host processor. As earlier noted, a number of working registers of the Transmeta (e.g., host) processor contain data indicative of the condition of the target processor, called state or context. That data provides a snapshot of the condition of the target processor. A more detailed description of context may be found in the co-pending application of D. Keppel, Ser. No. 09/417,981, filed Oct. 13, 1999, entitled Method and Apparatus for Maintaining Context While Executing Translated Instructions.
Prior to translation of a target instruction, the data in the foregoing working registers reflects the context of the target processor, as maintained by the host processor. When a target instruction is successfully translated and executed by the host processor, the data in those registers is updated as a side effect of the successful instruction execution. The data stored in the registers hence depicts the new context of the target processor (e.g., an X86 processor). Among other things, that context information may be used by the host processor as a verification of the correctness of a translation during subsequent processing.
When a target instruction is successfully translated by the host processor during the processing of a target application, the translation is saved (stored) in a translation memory for re-use later during further processing of that application program. At the time the translation is made, the working registers of the host processor store the assumed “state” or “context” of the target processor that is being emulated by the host processor. That context information is saved along with the translation so that, if the translation is later accessed for use in processing, the host processor can confirm that the circumstances are the same as before and that the translation will correctly execute.
A fourth tag is the code segment limit of the target instruction (the “CS-limit”). The CS-limit is an appendage to instructions found in the target application. The value specifies a maximum size of memory that the target instruction should not exceed and serves as a check on the integrity of the target instruction. Should an instruction exceed that size, an error condition results.
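The four tags described above can be pictured as a single record attached to each translation, with a stored translation being reusable only when every tag equals the corresponding current value. The following is a minimal sketch; the field names, the integer encoding of the context and the helper functions are assumptions made for illustration.

```python
from typing import NamedTuple


class TranslationTags(NamedTuple):
    """Hypothetical record of the four tags that accompany a translation."""
    eip: int        # logical address of the target instruction
    phys_ip: int    # physical address of the target instruction
    context: int    # target processor state, encoded here as an integer
    cs_limit: int   # code segment limit of the target instruction


def mismatched_tags(stored, current):
    # Names of the tags that differ between a stored translation and the
    # current target state; an empty list means the translation matches.
    return [name for name in stored._fields
            if getattr(stored, name) != getattr(current, name)]


def tags_match(stored, current):
    return not mismatched_tags(stored, current)
```

For example, a translation made under one context does not match a request that presents the same eip, Phys-ip and CS-limit but a different context.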
Accordingly, an object of the invention is to reduce latency in the dynamic translation by the host system of indirect branch instructions of a target application.
A further object of the invention is to permit existing translations of the instructions of a target application to be located as needed for the execution of a branch instruction more rapidly than before.