Users of obsolete mainframe computers running a proprietary operating system may have a very large investment in proprietary application software and, further, may be comfortable with using the application software because it has been developed and improved over a period of years, even decades, to achieve a very high degree of reliability and efficiency.
As manufacturers of very fast and powerful “commodity” processors continue to improve the capabilities of their products, it has become practical to emulate the proprietary hardware and operating systems of powerful older computers on platforms built using commodity processors such that the manufacturers of the older computers can provide new systems which allow their customers to continue to use their highly-regarded proprietary software on state-of-the-art new computer systems by emulating the older computer in software that runs on the new systems.
Accordingly, computer system manufacturers are developing such emulator systems for the users of their older systems, and the emulation process used by a given system manufacturer is itself subject to ongoing refinement and increases in efficiency and reliability.
Some historic computer systems now being emulated by software running on commodity processors have achieved performance which approximates or may even exceed that provided by legacy hardware system designs. An example of such hardware emulation is the Bull HN Information Systems (descended from General Electric Computer Department and Honeywell Information Systems) DPS 9000 system which is being emulated by a software package running on a Bull NovaScale system which is based upon an Intel Itanium 2 Central Processor Unit (CPU). The 64-bit Itanium processor is used to emulate the Bull DPS 9000 36-bit memory space and the GCOS 8 instruction set of the DPS 9000. Within the memory space of the emulator, the 36-bit word of the “target” DPS 9000 is stored right justified in the least significant 36 bits of the “host” (Itanium) 64-bit word. The upper 28 bits of the 64-bit word are typically zero for “legacy” code. Sometimes, certain specific bits in the upper 28 bits of the containing word are used as flags or for other temporary purposes, but in normal operation these bits are usually zero and in any case are always viewed by older programs in the “emulated” view of the world as being non-existent. That is, only the emulation program itself uses these bits.
In the development of the emulator system, careful attention is typically devoted to ensuring exact duplication of the legacy hardware behavior so that legacy application programs will run without change and even without recompilation. Exact duplication of legacy operation is highly desirable to accordingly achieve exactly equivalent results during execution.
In order to achieve performance in an emulated system that at least approximates that achieved by the legacy system hardware, or in more general terms, in order to maximize overall performance, it is necessary that the code that performs the emulation be very carefully designed and very “tightly” coded in order to minimize breaks and maximize performance. These considerations require careful attention to the lowest level design details of the host system hardware, that is, the hardware running the software that performs the emulation. It also requires employing as much parallelization of operations as possible.
An Intel Itanium series 64-bit CPU is an excellent exemplary platform for building a software emulator of a legacy instruction set because it offers hardware resources that enable a high degree of potential parallelism in the hardware pipeline of the Itanium CPU. The Itanium CPU also provides instructions that allow for fast decision making and guidance by the software as to the most likely path of program flow for a reduction in instruction fetch breaks and overall improved performance. In particular, the Itanium architecture provides instructions that allow preloading of a “branch register” which informs the hardware of the likely new path of the instructions to be executed, with the “branch” instruction itself actually happening later. This minimizes the CPU pipeline breaks that are characteristically caused by branch instructions, and allows for typically well predicted branch instructions to be processed efficiently without CPU pipeline breaks wasting cycles. The branch look-ahead hardware of the Itanium CPU, and in particular a specific mechanism for loading and then using a branch register, allows for the emulation software to achieve a higher degree of overlap and, as a result, higher performance in emulated legacy system instruction processing.
Reference may be taken to U.S. application Ser. No. 11/174,866 entitled “Lookahead Instruction Fetch Process for Improved Emulated Instruction Performance” by Russell W. Guenthner et al, filed Jun. 6, 2005, and assigned to the same Assignee as the present application for a more complete exposition of the advantages of selecting a host processor having the characteristics of the Intel Itanium series processors for emulating legacy software.
The development of software which provides for emulation of the legacy software instruction set on the host machine is complicated, and the requirements on performance are extreme. An approach which allows for ease of development and also provides the ultimate performance is to develop the code first in a high-level language, and then once the functionality and approach are precisely defined, to develop analogous code in assembly language. Because of the complexity it is also probable that in a final product some of the source code will be in assembly and some will be in a more easily maintained and understood higher level language such as “C” or “C++”.
Two major requirements of the emulation software are 1) to achieve precise and exact emulation of the legacy instruction set, and 2) to achieve the highest possible performance. These two requirements are sometimes conflicting.
In any software emulation of hardware there are pieces of code which are concerned with checking for error conditions and exceptions. Since performance is critical the code must be carefully crafted to avoid “wasting” unnecessary time doing all the checks that the legacy system hardware might have done in parallel with other operations. Checking in software for the many exceptions that may have been detected by the legacy hardware takes extra code, is time-consuming and is a potentially significant detriment to performance.
The emulation software runs on a machine called the host system. The host system is itself a computer system which has its own exception and fault checking mechanisms built into the host system hardware and if used, also in the operating system of the host system. The exceptions and checks may be similar or quite different from the legacy system being emulated. These exceptions typically must be avoided by writing the emulation software so that it does not typically fault or do things which would cause system or application program errors.
If an error is detected by the host system hardware and operating system software there are typically two options for “ handling” the error condition. Typically, the application program is aborted. In more advanced systems, a mechanism commonly called a “ signal handler” may be invoked by a coordinated response of host system's operating system and the underlying hardware upon which it is running. In any operating system these pieces of code are typically quite machine dependent. The signal handler is code that is invoked on behalf of the application program when specifically selected hardware or system errors are detected. This gives the application programmer a chance to recover or process the host system detected errors in any desired way and is a much improved alternative to simply aborting the program.