This invention relates to emulated computer systems; and more particularly, it relates to methods of diagnosing faults in such systems.
As background to the invention, FIG. 1 shows an example of an emulated computer system. The hardware in this FIG. 1 system includes an x86 instruction processor 10, a main memory 11, an I/O controller 12, a disk storage unit 13, and an operator console 14. The x86 instruction processor 10 is an Intel 386, Intel 486, Intel Pentium, or Intel Merced processor, or any other processor that has a compatible set of object code instructions. The main memory 11 is any memory made of integrated circuit chips.
The software in the FIG. 1 system includes a native operating system 20 and multiple native user programs 21a-21m. The operating system 20 and all of the programs 21a-21m are native to the x86 instruction processor 10 because they are compilations of object code instructions that the x86 instruction processor executes directly. A primary example of the native operating system 20 is any NT operating system from Microsoft Corporation, such as Windows NT 4.0 Workstation or Windows NT 4.0 Server.
Also included as software in the FIG. 1 system is an A-Series operating system 30 and multiple A-Series user programs 31a-31n. This operating system 30 and all of the programs 31a-31n are foreign to the x86 instruction processor 10 because they are a compilation of A-Series object code instructions which can be executed directly by an A-Series instruction processor. Examples of an A-Series instruction processor include the Unisys A7 processor, the Unisys A11 processor, the Unisys A16 processor, and any other processor which has a compatible set of object code instructions.
All of the NT user programs 21a-21m are executed under control of the NT operating system 20. One of these NT user programs, 21k, is an emulator program that interprets each of the A-Series object code instructions in the A-Series operating system 30 and the A-Series user programs 31a-31n. The A-Series user programs 31a-31n are executed under control of the A-Series operating system 30.
For simplicity, FIG. 1 shows both of the operating systems 20 and 30, and all of the user programs 21a-21m and 31a-31n, as residing in their entirety in the main memory 11. In reality, however, these operating systems and user programs are not all present in the main memory 11 at the same time. Instead, they are stored in their entirety on the disk storage unit 13, and the NT operating system 20 uses paging to retrieve portions of the NT user programs, the A-Series operating system, and the A-Series user programs, and to store those portions in the main memory 11 as they are executed.
One characteristic of the NT operating system 20 that is especially relevant to the present invention is that it includes a subprogram 20-1, called an "NT Trap Handler," which generates a dump of the main memory 11 only when a "kernel stop error" occurs. A kernel stop error is fatal because it causes the NT operating system 20 to stop running. Consequently, all of the NT user programs 21a-21m, which are executed under control of the NT operating system, also stop running; and thus, all of the emulated A-Series programs 30 and 31a-31n stop running as well.
An example of what causes a kernel stop error is as follows. Suppose the NT operating system is attempting to execute an I/O command for an A-Series user program, such as program 31a, which calls for a block of data to be read from the main memory 11 and written onto the disk storage unit 13. If, during the execution of that I/O command, the page tables for the NT operating system become corrupted so that they indicate that a page of the data block does not exist in the main memory 11, then a kernel stop error occurs.
All of the causes of the kernel stop error are pre-defined by the NT operating system. Consequently, the kernel stop error and its resulting main memory dump are too limited to be useful for debugging the A-Series programs 30 and 31a-31n. What is needed for debugging the A-Series programs is the ability to allow an A-Series programmer to request a memory dump on the occurrence of any event which is selectable by the programmer. Such an ability would enable the programmer to obtain a memory dump when events occur that identify software problems in the A-Series programs but are unrelated to kernel stop errors.
Another drawback of the NT operating system 20 is that even when a kernel stop error does occur, the memory dump that the NT Trap Handler 20-1 generates has several deficiencies. First, the NT Trap Handler 20-1 dumps only the main memory 11, and because NT uses paging, that is not a dump of the entire emulated A-Series memory: only those parts of the emulated A-Series memory that were recently used will be present in the main memory 11. Second, the dumped portions of the main memory 11 will also hold NT user programs and the NT operating system, which are irrelevant to detecting errors in the A-Series programs. Third, the main memory 11 is dumped in an NT format, which is totally different from an A-Series format. For example, each word of memory in the NT format has thirty-two bits, whereas each word of memory in the A-Series format has fifty-two bits.
Still another drawback of the NT operating system 20 is that after a kernel stop error occurs, the operating system must be rebooted before it becomes operational again. Rebooting is required because, when the kernel stop error occurs, the NT operating system 20 performs the main memory dump by writing the contents of the main memory 11 into the page file on the disk storage unit 13. Later, when the operating system is rebooted, that memory dump is copied into another file so that the page file can be used again.
Such rebooting is suitable for a stand-alone personal computer, but it is not acceptable when the emulated A-Series programs are for a large-scale computer that provides services to the terminals of hundreds of customers. In that case, what is needed is the ability to take a memory dump quickly and then quickly restore the system to full operation, so that customer downtime and inconvenience are minimized.
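To illustrate why a dump in 32-bit NT format is hard to read as A-Series memory, the following minimal sketch assumes a hypothetical emulator layout in which each fifty-two-bit A-Series word is stored right-justified in a sixty-four-bit slot, that is, as two consecutive little-endian thirty-two-bit NT words. The layout and the function name `nt_dump_to_a_series_words` are assumptions for illustration only, not the actual memory format of emulator program 21k.

```python
import struct

A_SERIES_WORD_BITS = 52
A_SERIES_WORD_MASK = (1 << A_SERIES_WORD_BITS) - 1


def nt_dump_to_a_series_words(dump: bytes) -> list[int]:
    """Reassemble 52-bit A-Series words from a raw dump of 32-bit NT words.

    Hypothetical layout (assumption): each A-Series word occupies one
    64-bit slot, stored as a low 32-bit NT word followed by a high
    32-bit NT word (little-endian); the top 12 bits of the slot are unused.
    """
    if len(dump) % 8:
        raise ValueError("dump length must be a multiple of 8 bytes")
    # View the dump the way an NT-format dump presents it: 32-bit words.
    nt_words = struct.unpack(f"<{len(dump) // 4}I", dump)
    words = []
    for low, high in zip(nt_words[0::2], nt_words[1::2]):
        # Join the two NT halves and keep only the 52 A-Series bits.
        words.append(((high << 32) | low) & A_SERIES_WORD_MASK)
    return words
```

In the raw thirty-two-bit view, every emulated word is split across two NT words, so a programmer reading an NT-format dump must reassemble each A-Series word by hand before it can be interpreted.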
Accordingly, a primary object of the present invention is to provide an improved method of diagnosing faults in an emulated computer system by which the above problems are overcome.