Many processors provide backward compatibility with legacy software. This is especially true of processors used as the central processing unit of computing devices (e.g., a processor belonging to the family of INTEL ARCHITECTURE (IA) processors). Backward compatibility may be provided via hardware support, or via instruction emulation. Hardware support has been accomplished via full hardware implementation of all instructions of a legacy instruction set architecture (ISA). Hardware support might make sense when the performance overhead of instruction emulation is relatively high compared with the performance of hardware support, and when the hardware platform uses a relatively small number of cores (e.g., 1 or 2).
However, as the number of cores in a processor increases, the investment in terms of semiconductor real estate can be high compared to the performance savings gained by hardware support for legacy ISA instructions. An example is the implementation of a full-fledged x87 floating point instruction unit in an INTEL processor (of Intel Corporation of Santa Clara, Calif.) that uses the higher performance floating point instructions of SSE or SSE2. Implementing a full x87 unit requires a significant amount of die space, mostly for a “just-in-case” backward compatibility scenario.
As the number of cores increases, designers who choose to implement hardware support for legacy instructions must decide whether to keep seldom-used hardware resources in each and every core, or to share legacy hardware resources among the cores. Either choice carries risk: keeping hardware resources on every core generally appears to be wasteful, but a potential “glass jaw” scenario could result from having the cores share resources. That is, if a significant workload requiring legacy instruction support simultaneously loaded multiple cores, contention for the shared hardware resources could result in a significant performance hit.
Software-based emulation of instructions can be implemented on each core rather than providing or sharing hardware support units. Two well-known emulation techniques are interpretation and dynamic binary translation (DBT). Both techniques rely on “scan-before-execution” so that they can find unsupported instructions before those instructions are executed. If such instructions are found, the emulation system either changes the program control flow to an existing, appropriate emulation routine (interpretation) or dynamically generates equivalent code and transfers program control to the newly created code (dynamic binary translation). However, interpretation has a relatively high performance overhead (several orders of magnitude) as compared to native implementation of an instruction, which has limited its practical application. DBT provides much less overall performance overhead (a slowdown of tens to hundreds of percent) as compared to interpretation when translated code is executed frequently (e.g., “hot code”), which effectively amortizes the translation overhead. However, the less frequently executed “cold code” still poses a serious challenge for the DBT technique in reaching a near-native performance level.
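The difference between the two techniques can be illustrated with a minimal sketch. It assumes a toy instruction format (opcode and operand pairs acting on a single accumulator) and a hypothetical legacy opcode named `fscale`; none of these names come from any real ISA or emulator. The interpreter scans and dispatches every instruction on every run, whereas the translator scans a block once, generates an equivalent callable, and caches it so that repeated ("hot") executions amortize the translation cost.

```python
# Toy model, for illustration only. Opcode names and tables are hypothetical.
NATIVE = {
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,
}
LEGACY = {
    # Hypothetical legacy opcode that the core does not implement natively.
    "fscale": lambda acc, x: acc * (2 ** x),
}

def interpret(program, acc=0):
    """Interpretation: scan each instruction before executing it, on every run."""
    for op, x in program:
        if op in NATIVE:
            acc = NATIVE[op](acc, x)   # supported: execute directly
        else:
            acc = LEGACY[op](acc, x)   # unsupported: divert to emulation routine
    return acc

translation_cache = {}
stats = {"translations": 0}

def translate(program):
    """DBT: scan the block once and generate equivalent, pre-resolved code."""
    stats["translations"] += 1
    steps = [(NATIVE.get(op) or LEGACY[op], x) for op, x in program]

    def translated(acc=0):
        for fn, x in steps:            # no per-instruction scan or lookup
            acc = fn(acc, x)
        return acc

    return translated

def run_translated(program, acc=0):
    """Execute a block through the translation cache."""
    key = tuple(program)
    if key not in translation_cache:   # cold code: pay the translation cost
        translation_cache[key] = translate(program)
    return translation_cache[key](acc) # hot code: reuse the cached translation

hot_block = [("add", 3), ("fscale", 2), ("mul", 5)]
results = [run_translated(hot_block) for _ in range(1000)]
```

In this sketch the block is translated once and reused for all 1000 executions, which is the amortization effect described above. A block that runs only once would pay the full translation cost for a single use, which corresponds to the cold-code problem.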
Another software-based emulation technique is on-demand emulation driven by the illegal opcode exception, which can dramatically reduce the frequency of interpretation/translation. The on-demand technique is an exception-based interpretation, which configures an exception handler to redirect the exception of interest, on demand, to appropriate interpretation code. However, the exception technique suffers from the very high latency of the kernel-based exception handling mechanism, which requires saving and restoring the state of the executing program to implement the exception handler. That is, exception handling has required multiple processor mode changes and use of the kernel-level exception handler, which are generally too slow for frequent use.
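The control flow of the on-demand technique can be sketched as follows. This is only an analogy under stated assumptions: a Python exception class (`IllegalOpcode`, a hypothetical name) stands in for the hardware illegal opcode exception, and the opcode tables are invented for illustration. The sketch does not model the kernel transitions or state save/restore that make the real mechanism slow; it only shows that no instruction is scanned ahead of time and that emulation code runs only when an unsupported instruction actually faults.

```python
# Illustrative analogy only. All names below are hypothetical.
class IllegalOpcode(Exception):
    """Stand-in for a hardware illegal opcode exception."""
    def __init__(self, op, operand):
        super().__init__(op)
        self.op = op
        self.operand = operand

NATIVE_OPS = {
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,
}
EMULATION_ROUTINES = {
    # Hypothetical legacy opcode handled by interpretation code.
    "fscale": lambda acc, x: acc * (2 ** x),
}

def execute_native(op, operand, acc):
    """Model of the core: unsupported opcodes raise instead of executing."""
    if op not in NATIVE_OPS:
        raise IllegalOpcode(op, operand)
    return NATIVE_OPS[op](acc, operand)

def run_on_demand(program, acc=0):
    """Run without pre-scanning; emulate only when a fault occurs."""
    faults = 0
    for op, operand in program:
        try:
            acc = execute_native(op, operand, acc)
        except IllegalOpcode as fault:
            # Handler redirects the exception of interest to interpretation code.
            faults += 1
            acc = EMULATION_ROUTINES[fault.op](acc, fault.operand)
    return acc, faults

value, fault_count = run_on_demand([("add", 3), ("fscale", 2), ("mul", 5)])
```

Here only the single `fscale` instruction takes the exception path. In a real system each such fault would incur the mode changes and kernel-level handling described above, so the approach is attractive only when unsupported instructions are rare.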
Thus, hardware-based mechanisms suffer from cost and performance tradeoffs, and software-based mechanisms fall well short of the performance of native implementation by the hardware.
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as discussing other potential embodiments or implementations of the inventive concepts presented herein. An overview of embodiments of the invention is provided below, followed by a more detailed description with reference to the drawings.