Modern microprocessors are at the heart of most computer systems. In general, these processors operate by receiving instructions and performing operations responsive to the instructions. For application programs and operating system (OS) activities, instructions may be received in a processor, which then decodes these instructions into one or more smaller operations, often termed micro-instructions (uops), that are suitable for execution on the processor hardware. Because a processor's hardware may lack features to directly perform certain instruction set architecture (ISA) instructions, those instructions are decoded into uops, which the hardware can execute directly.
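As a rough illustration of such decoding, the sketch splits a toy memory-destination add into separate load, add, and store uops, while a register-to-register add maps to a single uop. The instruction syntax, the `Uop` type, and the `decode` function are hypothetical; this is not the decoding scheme of x86 or of any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str      # hypothetical uop kind: "load", "add", or "store"
    dst: str     # destination register or memory operand
    srcs: tuple  # source operands


def decode(instr: str) -> list[Uop]:
    """Decode a toy ISA instruction string into a list of uops.

    'add rX, rY'   -> one register-register add uop.
    'add [rA], rY' -> read-modify-write split into load, add, and store uops.
    """
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if mnemonic != "add":
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        return [
            Uop("load", "tmp0", (f"[{addr}]",)),    # read memory into a temporary
            Uop("add", "tmp0", ("tmp0", src)),      # compute in the temporary
            Uop("store", f"[{addr}]", ("tmp0",)),   # write the result back
        ]
    return [Uop("add", dst, (dst, src))]


print([u.op for u in decode("add [r3], r2")])  # → ['load', 'add', 'store']
```

The single instruction the program sees becomes three simpler operations that the hardware can execute without a native memory-destination add.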
An alternative implementation is to use a co-designed virtual machine (VM), in which a layer of emulation software is designed in conjunction with the processor hardware. A co-designed VM implements a standard ISA, referred to as the source ISA, for example the x86 ISA. Conventional software, including both the OS and application programs, is compiled to the source ISA. The hardware of a co-designed VM, by contrast, implements a target ISA designed specifically for that hardware implementation, with special performance and/or energy-efficiency features. The target ISA is at the same level as uops and may be identical to the set of uops.
The emulation software belonging to the co-designed VM directs the execution of application/OS source ISA software either by interpreting it or by directly translating it into optimized sequences of target instructions. Such translation promises performance gains and/or improved energy efficiency.
The emulation process typically proceeds as follows. Interpretation is used for code (source ISA instructions) when it is first encountered. Then, as frequently executed code regions (hotspots) are discovered through dynamic profiling or some other means, they are translated to the target ISA. Optimization is often done as part of the translation process; code that is very heavily used may later be optimized even further. The translated regions of code are held in a translation cache so they can be re-used. The translation cache is managed by emulation software and is held in a section of memory that is concealed from all application/OS software. The application/OS software is held in conventional (visible) memory.
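The interpret-then-translate flow described above can be sketched as follows. The sketch assumes a toy source ISA with only `add` and `mul` on named registers, a hypothetical `HOT_THRESHOLD` execution count as the hotspot criterion, and Python closures standing in for translated target-ISA code. Merging consecutive `add`s to the same register is an illustrative stand-in for optimization during translation, not the technique of any particular co-designed VM.

```python
from collections import defaultdict
from typing import Callable

# Toy source-ISA instruction: (opcode, register, immediate). Hypothetical format.
Instr = tuple[str, str, int]
Regs = dict[str, int]

HOT_THRESHOLD = 3  # hypothetical: interpreted runs before a region counts as hot


def make_step(op: str, reg: str, imm: int) -> Callable[[Regs], None]:
    """Build one closure standing in for a translated target instruction."""
    if op == "add":
        def step(regs: Regs) -> None:
            regs[reg] += imm
    elif op == "mul":
        def step(regs: Regs) -> None:
            regs[reg] *= imm
    else:
        raise ValueError(f"unknown opcode: {op}")
    return step


class Emulator:
    def __init__(self, regions: dict[int, list[Instr]]):
        self.regions = regions                                # source code (visible memory)
        self.exec_counts: dict[int, int] = defaultdict(int)   # simple profiling counters
        self.translation_cache: dict[int, Callable[[Regs], None]] = {}  # concealed memory
        self.stats = {"interpreted": 0, "translated_runs": 0}

    def interpret(self, code: list[Instr], regs: Regs) -> None:
        # Decode and execute one source instruction at a time.
        for op, reg, imm in code:
            if op == "add":
                regs[reg] += imm
            elif op == "mul":
                regs[reg] *= imm
            else:
                raise ValueError(f"unknown opcode: {op}")

    def translate(self, code: list[Instr]) -> Callable[[Regs], None]:
        # Toy optimization: merge consecutive adds to the same register.
        optimized: list[Instr] = []
        for op, reg, imm in code:
            prev = optimized[-1] if optimized else None
            if prev is not None and op == "add" and prev[0] == "add" and prev[1] == reg:
                optimized[-1] = ("add", reg, prev[2] + imm)
            else:
                optimized.append((op, reg, imm))
        steps = [make_step(*instr) for instr in optimized]

        def translated(regs: Regs) -> None:
            for step in steps:
                step(regs)

        return translated

    def run_region(self, rid: int, regs: Regs) -> None:
        translated = self.translation_cache.get(rid)
        if translated is not None:
            # Hot region already translated: reuse it from the cache.
            translated(regs)
            self.stats["translated_runs"] += 1
            return
        # Not yet translated: interpret and update the profile.
        code = self.regions[rid]
        self.interpret(code, regs)
        self.stats["interpreted"] += 1
        self.exec_counts[rid] += 1
        if self.exec_counts[rid] >= HOT_THRESHOLD:
            self.translation_cache[rid] = self.translate(code)


regions = {0: [("add", "r0", 1), ("add", "r0", 2), ("mul", "r0", 2)]}
emu = Emulator(regions)
regs = {"r0": 0}
for _ in range(5):
    emu.run_region(0, regs)
print(regs, emu.stats)  # → {'r0': 186} {'interpreted': 3, 'translated_runs': 2}
```

After three interpreted executions the region is translated and placed in the translation cache; the last two executions run the cached translation and produce the same register values that interpretation would have.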
Previous processor implementations that use co-designed VMs employ full emulation, in which the emulation software emulates all application/OS software. One disadvantage of full emulation is that all code must first be interpreted and/or translated before it can be executed, which may lead to low performance when a region of software is first encountered.