Every processor implements a native instruction set architecture (ISA). An ISA is defined by features such as the semantics of operations, operation codes, instruction formats, endian-ness, memory architecture, and others. These features of ISAs vary from processor to processor. For instance, some ISAs may include operations that others do not. Some ISAs may have different word sizes. Some may lack certain registers that others have. There are many types of ISAs and their differences are well known.
There are problems that stem from the way that ISAs vary among processors. Most notably, compiled executable code is ISA-specific. That is, any given software is compiled only for a specific ISA. Any given compiled executable binary code will be unable to directly execute on chips whose ISA implementations are not compatible with the ISA for which the given code was compiled.
This mobility restriction on compiled code has long been known and several approaches have been used to enable cross-ISA execution. One approach has been the use of fat executable files. A fat executable file contains the binary code for two or more different ISAs. A fat executable contains two independent sets of executable code. When execution of a fat executable is requested, the loader loads the set of executable code that matches the ISA implementation of the machine on which it will execute. This approach requires additional storage. Moreover, a fat executable will still have considerably limited portability.
Another approach has been to use binary translation. Binary translation involves reading binary machine code compiled for one ISA (“source” ISA) and applying what are usually complex translation rules to the binary code to generate executable binary code for another ISA (“target” ISA). The translated binary code is executed on a processor that implements the target ISA. Binary translation can be performed in many ways. Translation can take place at runtime; source ISA code is translated (by translation instructions of the target ISA) to target ISA code when loaded. Such all-at-once translation can also be performed in advance and stored for use when needed, or it can be performed at load time (and perhaps cached for later re-use). In either case, the translated target ISA code is executed after translation is complete. In other cases, an emulator provides a virtual implementation of the source ISA (e.g., a virtual processor) and as the emulator receives source ISA code the emulator issues target ISA instructions that correspond to the source ISA instructions.
Depending on the particular instructions that predominate in source ISA code that is to be translated, most corresponding translated target ISA binary code will be less efficient than the equivalent source ISA code. In other words, given a source processor that implements a source ISA, and given a target processor of comparable performance that implements a target ISA, translated target ISA code will tend to take longer to execute on the second processor than the semantically equivalent source ISA code executing on the first processor. Although this is not always true, there is usually at least some variety of source ISA code that will execute significantly faster than its translated equivalent target ISA code.
There can be many reasons for translation performance degradation. Setting aside the processing needed to actually perform translation, translated code is translated from compiled binary executable code. Consider that a given unit of source code might be compiled for a first ISA and compiled for a second ISA, and the respective executables might be equally fast on their respective processors. However, a translation of the first executable to the second ISA may be much less efficient. Among other reasons, binary executable code lacks much of the information found in corresponding source code that can be used to optimize the code. Furthermore, code optimized for a source ISA may have been constructed in ways that work well with the source ISA but are ill-suited for the target ISA. Architectural differences may also result in some code for particular types of workloads being inefficient when translated. For instance, a source ISA might have an arithmetic logic unit (ALU) that implements a complex arithmetic operation in hardware. If a target ISA lacks that operation, translation to the target ISA will require inserting instructions to do the work of the single instruction of the source ISA. In general, any given compiled code will be likely to execute slower when binary-translated to another ISA.
Although translated binary code is known to be generally less efficient, only the inventors have appreciated that this can translate into user experience problems that can be solved. Most machines that execute translated code also executed non-translated code. If interactive applications of translated code and interactive applications of non-translated code are executed by a same user in a same session, the comparative speed differences of the translated and non-translated may be noticeable in respective graphical user interfaces (GUIs). Although a translated GUI might have acceptable responsiveness and be objectively useable, when experienced relative to a non-translated GUI, the slower speed of the translated GUI may stand out. Translated interactive applications may be perceived as slow when experienced in contrast to non-translated interactive applications. Described herein are techniques that can potentially smooth out the differences in speed and responsiveness of interactive translated code.
Techniques related to differential power/performance treatment for translated code are discussed below.