This invention relates to monitoring control of a virtual machine system and, particularly, to a method and apparatus for monitoring a virtual machine system, suitable for automatically detecting abnormalities in the operating systems (OSs) which run in the virtual machine system.
The structure of the time-of-day (TOD) clock and the clock comparator, and the operation of the set clock comparator (SCKC) and diagnose (DIAG) instructions in a bare machine, are described in the publication IBM System/370 Principles of Operation, GA22-7000. The structure and operation of the virtual machine are described in JP-A Nos. 56-33736 and 55-42326. A virtual computer system is used on a time-sharing basis: user-defined virtual hardware information relating to control registers, arithmetic registers and PSWs is set on the actual hardware system, so that the service of an actual computer is available to the user.
In a virtual computer system as shown in FIG. 1, a host or bare machine 11 provides software with a bare machine interface 12 capable of handling process requests in privileged or non-privileged modes. A VM monitor 13 operates by using the bare machine interface 12 provided by the bare machine 11, and implements the simulation of privileged instructions for the virtual machines and the scheduling of the virtual machines. The VM monitor 13 provides VM interfaces 18 and 19, which allow operating systems 16 and 17 to run in their respective virtual machines. Although only two virtual machines are shown in FIG. 1, practical systems generally contain more. The operating systems 16 and 17 run by using the VM interfaces 18 and 19 provided by virtual machines 14 and 15 and, when seen from user programs 22-25, appear to operate on the bare machine 11. The operating systems 16 and 17 are capable of providing extended machine interfaces (EM interfaces) 18-21 for the user programs 22-25. Extended machines 26-29 have two functions: executing, within operating system 16 or 17, specific interrupt processes such as a supervisor call from a user program running under that operating system, and processing machine instructions in the non-privileged mode.
Consequently, a plurality of operating systems can appear to run concurrently on a single bare machine, so that the hardware resources can be used more efficiently and newly developed systems can be tested without suspending other services in operation.
For a virtual computer system in which a plurality of virtual machines are running in an online mode, it is necessary to enhance the reliability of operation by detecting abnormalities of the operating systems immediately so as to minimize their influence. A bare machine is equipped with a control monitor (CM), which expects the operating system to issue a diagnose instruction or command at a constant interval. If the monitor does not receive the command for a certain length of time, it raises a machine check interrupt, thereby detecting an abnormality such as a hardware failure or a "bug" in the operating system. However, the bare machine is equipped with only a single control monitor, which therefore cannot monitor a plurality of operating systems running in the virtual machine system. Although it would be possible to monitor a specific operating system by providing the system with a control monitor dedicated to that operating system, such a control monitor fails to perform its inherent role of monitoring the bare machine. As a result, system abnormalities caused by bugs in the operating systems and failures of related hardware devices appear to the user as defective service.
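The limitation described above can be illustrated with a small simulation. The following Python sketch is only an illustrative model under stated assumptions, not the mechanism of any actual machine: the class ControlMonitor, the function simulate, the integer tick clock, and the period and timeout values are all hypothetical. It models one watchdog that raises a machine check when no diagnose command arrives within the timeout, and shows that when two operating systems share that single watchdog, the commands from the healthy one hide the hang of the other.

```python
class ControlMonitor:
    """Hypothetical model of the single control monitor of a bare machine."""

    def __init__(self, timeout):
        self.timeout = timeout      # longest tolerated gap between commands
        self.last_diag = 0          # tick of the most recent diagnose command

    def diagnose(self, now):
        # Called when any operating system issues the diagnose command.
        self.last_diag = now

    def machine_check_pending(self, now):
        # True once the gap since the last command exceeds the timeout.
        return now - self.last_diag > self.timeout


def simulate(os_hang_times, period, timeout, end_time):
    """Return the first tick at which a machine check is raised, or None.

    os_hang_times maps an OS name to the tick at which it hangs
    (None means the OS never hangs). All values are assumptions.
    """
    cm = ControlMonitor(timeout)
    for now in range(end_time + 1):
        for hang_at in os_hang_times.values():
            alive = hang_at is None or now < hang_at
            if alive and now % period == 0:
                cm.diagnose(now)    # every OS reports to the same monitor
        if cm.machine_check_pending(now):
            return now
    return None


# One OS alone: its hang at tick 10 is detected by the monitor.
single = simulate({"OS16": 10}, period=5, timeout=8, end_time=60)

# Two OSs sharing the monitor: OS17 keeps reporting, so the hang of
# OS16 never produces a machine check.
shared = simulate({"OS16": 10, "OS17": None}, period=5, timeout=8, end_time=60)

print(single, shared)
```

In this model the hang is detected only when the failing operating system is the sole source of diagnose commands, which is the reason a single control monitor cannot supervise a plurality of operating systems.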