Users of obsolete mainframe computers running a proprietary operating system may have a very large investment in proprietary application software and, further, may be comfortable with using the application software because it has been developed and improved over a period of years, even decades, to achieve a very high degree of reliability and efficiency.
As manufacturers of very fast and powerful “commodity” processors continue to improve the capabilities of their products, it has become practical to emulate the proprietary hardware and operating systems of powerful older computers on platforms built using commodity processors such that the manufacturers of the older computers can provide new systems which allow their customers to continue to use their highly-regarded proprietary software on state-of-the-art new computer systems by emulating the older computer in software that runs on the new systems.
Accordingly, computer system manufacturers are developing such emulator systems for the users of their older systems, and the emulation process used by a given system manufacturer is itself subject to ongoing refinement and increases in efficiency and reliability.
Some historic computer systems now being emulated by software running on commodity processors have achieved performance which approximates or may even exceed that provided by legacy hardware system designs. An example of such hardware emulation is the Bull HN Information Systems (descended from General Electric Computer Department and Honeywell Information Systems) DPS 9000 system which is being emulated by a software package running on a Bull NovaScale system which is based upon an Intel Itanium 2 Central Processor Unit (CPU). The 64-bit Itanium processor is used to emulate the Bull DPS 9000 36-bit memory space and the GCOS 8 instruction set of the DPS 9000. Within the memory space of the emulator, the 36-bit word of the “target” DPS 9000 is stored right justified in the least significant 36 bits of the “host” (Itanium) 64-bit word. The upper 28 bits of the 64-bit word are typically zero for “legacy” code. Sometimes, certain specific bits in the upper 28 bits of the containing word are used as flags or for other temporary purposes, but in normal operation these bits are usually zero and in any case are always viewed by older programs in the “emulated” view of the world as being non-existent. That is, only the emulation program itself uses these bits.
In the development of the emulator system, careful attention is typically devoted to ensuring exact duplication of the legacy hardware behavior so that legacy application programs will run without change and even without recompilation. Exact duplication of legacy operation is highly desirable to accordingly achieve exactly equivalent results during execution.
In order to achieve performance in an emulated system that at least approximates that achieved by the legacy system hardware, or in more general terms, in order to maximize overall performance, it is necessary that the code that performs the emulation be very carefully designed and very “tightly” coded in order to minimize breaks and maximize performance. These considerations require careful attention to the lowest level design details of the host system hardware, that is, the hardware running the software that performs the emulation. It also requires employing as much parallelization of operations as possible.
An Intel Itanium series 64-bit CPU is an excellent exemplary platform for building a software emulator of a legacy instruction set because it offers hardware resources that enable a high degree of potential parallelism in the hardware pipeline of the Itanium CPU. The Itanium CPU also provides instructions that allow for fast decision making and guidance by the software as to the most likely path of program flow for a reduction in instruction fetch breaks and overall improved performance. In particular, the Itanium architecture provides instructions that allow preloading of a “branch register” which informs the hardware of the likely new path of the instructions to be executed, with the “branch” instruction itself actually happening later. This minimizes the CPU pipeline breaks that are characteristically caused by branch instructions, and allows for typically well predicted branch instructions to be processed efficiently without CPU pipeline breaks wasting cycles. The branch look-ahead hardware of the Itanium CPU, and in particular a specific mechanism for loading and then using a branch register, allows for the emulation software to achieve a higher degree of overlap and, as a result, higher performance in emulated legacy system instruction processing.
Reference may be taken to co-pending U.S. application Ser. No. 11/174,866 entitled “Lookahead Instruction Fetch Process for Improved Emulated Instruction Performance” by Russell W. Guenthner et al, filed Jun. 6, 2005, and assigned to the same Assignee as the present application for a more complete exposition of the advantages of selecting a host processor having the characteristics of the Intel Itanium series processors for emulating legacy software.
The development of software which provides for emulation of the legacy software instruction set on the host machine is complicated, and the requirements on performance are extreme. An approach which allows for ease of development and also provides the ultimate performance is to develop the code first in a high-level language, and then once the functionality and approach are precisely defined, to develop analogous code in assembly language. On the Itanium platform, writing a program in assembly language to achieve the highest level of performance is extremely complex and requires tremendous attention to detail. Accordingly the assembly code is prone to mistakes and subtle problems which are difficult to debug. It is thus advantageous during both the development, and during initial deployment of the emulation software to customers to provide a set of emulation software which is implemented in a high-level language such as C with such a version achieving a high level of reliability. It is also a requirement in order to achieve high performance that a version written in machine assembly language be provided. It may also be a requirement that different versions of the emulation machine be provided as the software is updated, debugged or enhanced. It may also be a requirement that different versions of the software emulation code provide different features to the customer which may be separately priced.
Providing multiple versions of a software emulator for a hardware system is not easily achieved. The supporting memory and I/O system requirements are large and expensive, and it may also not be possible to replicate an entire computer system for ease of testing simply because of the cost involved.
Performance requirements also place a restriction on the methodology for providing optional features or versions of the software. In a very tightly coded burst of assembly language there quite possibly will be no time or space for any sort of programmatic check for feature or version control. As an example, a tightly coded burst of code for emulating a specific emulated machine instruction may take as little as 9 cycles on the Itanium 2 Central Processor Unit, and in those nine cycles there are no slots or instructions available for checking for debug display options, or version control.