1. Field of the Invention
The present invention relates to computer systems and more particularly to virtual machine systems which provide for the survival of a guest virtual machine when certain hardware and software failures occur.
2. Prior Art
The architecture of the IBM System/370-XA (370-XA mode) is described in publication SA22-7085. The IBM System/370-XA has evolved from the IBM System/370 Architecture (370 mode) which is described in publication GA22-7000. The System/370-XA can be implemented for a virtual machine (guest machine) by a virtual machine operating system (host system) implemented on a System/370-XA real machine (host machine). The System/370 architecture can also be implemented for a virtual machine (guest machine) of the host system.
Amdahl, Inc. was an early developer of a system recovery program, namely the VM/PE performance enhancement. VM/PE is an interface to the operating system and is not capable of handling virtual machines.
IBM Technical Disclosure Bulletin, Vol. 25, § 113, April 1983, pp. 6278-6279 discloses a V=R virtual machine recovery program that is used with an IBM System/370 to preserve the status of a V=R guest across an abend. When a control program software failure occurs, this recovery program restores the V=R virtual machine environment when System/370 is re-initial program loaded (bootstrap). I/O interrupt data is saved in a data buffer rather than reflected to the virtual machine. The V=R user is again logged on such that its virtual machine environment is restored.
For a host operating system to perform functions on behalf of a guest operating system, a start interpretive execution (SIE) instruction is implemented in the hardware. (The SIE instruction is described in U.S. Pat. No. 4,456,954, which issued on June 26, 1984 to the assignee of the present application, and is incorporated herein by reference.) The SIE instruction invokes interpretive execution hardware in the host machine so that the host machine enters the interpretive execution mode for the purpose of executing a program in a guest machine.
The host machine, while in an interpretive execution mode, performs the functions of a guest (an interpreted machine). The interpretive execution of one of several guest machines begins when the host system executes a start interpretive execution (SIE) instruction. The operand of the SIE instruction is referred to as the state descriptor. The state descriptor, which is located in real storage, includes parameters that describe the logical condition of the guest whose instructions are to be executed (interpreted). In particular, fields in the state descriptor specify the architecture of the guest, the contents of some of the program-addressable guest registers, the addresses of related control tables, the initial state of the guest CPU, and information about other aspects of the operation, including how host storage is to be used to represent guest main storage (guest storage mode).
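By way of illustration only, the categories of information carried by the state descriptor may be sketched as a simple data structure. The sketch below is not the architected layout; the class names, field names, and the `start_interpretive_execution` function are hypothetical and merely mirror the fields enumerated above.

```python
from dataclasses import dataclass, field
from enum import Enum


class GuestArchitecture(Enum):
    S370 = "370"
    S370_XA = "370-XA"


class StorageMode(Enum):
    PAGEABLE = "pageable"
    PREFERRED = "preferred"  # the V=R guest


@dataclass
class StateDescriptor:
    """Hypothetical, simplified model of the SIE operand (not the real layout)."""
    architecture: GuestArchitecture
    storage_mode: StorageMode
    # Contents of some program-addressable guest registers, keyed by name.
    guest_registers: dict = field(default_factory=dict)
    # Addresses of related control tables, keyed by an illustrative name.
    control_table_addresses: dict = field(default_factory=dict)
    # Initial state of the guest CPU (illustrative string value).
    initial_cpu_state: str = "stopped"


def start_interpretive_execution(sd: StateDescriptor) -> str:
    """Stand-in for SIE: report which kind of guest would be interpreted."""
    return (f"interpreting {sd.architecture.value} guest "
            f"in {sd.storage_mode.value} storage mode")
```

In this sketch, a host would build one descriptor per guest and pass it to the stand-in function, much as the host system supplies the state descriptor as the operand of SIE.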
Storage modes are specified in a field of the state descriptor. Guest main (absolute) storage is represented by host storage in either pageable storage mode or preferred storage mode. In the preferred storage mode, the preferred guest is assigned the lower portion (V=R region) of the host's absolute main storage beginning at host absolute address zero. In other words, a guest absolute storage address is treated as the corresponding host absolute storage address when preferred storage mode is specified in the guest state descriptor. This means that instruction and operand addresses in the preferred guest are treated directly as host absolute main storage addresses.
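The identity mapping used in preferred storage mode may be sketched as follows. The preferred branch reflects the description above; the pageable branch is an assumption made only for contrast, using a hypothetical `page_map` from guest page numbers to host frame numbers and an assumed 4096-byte page size.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def guest_to_host_absolute(guest_addr, preferred, page_map=None):
    """Map a guest absolute address to a host absolute address.

    preferred=True models preferred (V=R) storage mode: the guest address
    is used directly as the host absolute address.
    preferred=False is an illustrative stand-in for pageable storage mode,
    where a hypothetical page_map supplies the host frame for each guest page.
    """
    if preferred:
        return guest_addr
    page, offset = divmod(guest_addr, PAGE_SIZE)
    frame = page_map[page]  # raises KeyError if the page is not resident
    return frame * PAGE_SIZE + offset
```

For example, under these assumptions `guest_to_host_absolute(0x1234, preferred=True)` returns the same address, 0x1234, whereas the pageable branch relocates the address according to the supplied map.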
Reference and change preservation (RCP) is the SIE facility that manages the reference and change bits of a storage key byte associated with a page frame in host real storage. The SIE instruction, through the interpretive execution hardware, maintains storage key bytes in response to storage key manipulating instructions issued by a guest. SIE causes the interpretive execution hardware to interpret these storage key manipulating instructions thereby using the RCP facility in the interpretation.
In the case of a preferred (V=R) guest, interpretive execution hardware provides for the preferred guest to manipulate the actual storage key bytes. The RCP facility is not used since the storage key bytes are owned by the preferred guest, i.e. the page frames are used solely by the preferred guest and are not paged by the host.
When a host machine supports a virtual machine environment, the host machine's system control program is capable of clearing channel paths on reset. Clearing channel paths in this way would prevent the recovery or survival of a guest following a system incident. (System incidents are described below.) For example, a problem arises when the host machine shares a control unit with a virtual I/O device, there is no other channel path to this device, and a contingent connection (uncleared control unit check) occurs at a system incident or failure. When all three conditions occur simultaneously, the operator must perform an IPL or a SYSTEM RESET operation, either of which will destroy the preferred guest. (Of course, if any one of the above conditions does not occur, the host control program is able to retain guest I/O status, including interrupts and reserves.)
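The combination of conditions described above may be expressed as a simple predicate. This is a minimal sketch; the function and parameter names are hypothetical and do not correspond to any actual control program interface.

```python
def reset_destroys_preferred_guest(shares_control_unit,
                                   alternate_path_available,
                                   contingent_connection):
    """Return True only when all three conditions hold at once:
    the host shares a control unit with the virtual I/O device,
    no other channel path to the device exists, and a contingent
    connection (uncleared control unit check) is present."""
    return (shares_control_unit
            and not alternate_path_available
            and contingent_connection)
```

If any one condition is absent, the predicate is false, corresponding to the case in which the host control program is able to retain guest I/O status.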
An object of this invention is to assign a reserved area of host main storage to a virtual machine to contain its status and in-progress work, which area will not be cleared due to a system failure and which can be used for the recovery of the virtual machine, i.e. restoration of the virtual machine operating system, once the host operating system has been refreshed.
Another object of this invention is to include control blocks within the assigned area of host main storage to be used for preserving the status of a virtual machine and for returning the virtual machine to its operational state, i.e. maintaining guest operating system integrity, following a system failure.
An object of this invention is for all (free) storage requests, made on behalf of the virtual machine while the system is running, to be obtained from a reserved area of storage.
An object of this invention is to restore the I/O to substantially the same configuration that the virtual machine was using prior to a system incident once the host operating system has been refreshed (bounce completed) and the virtual machine has survived.
It is also an object of this invention to identify the real I/O devices to the recoverable virtual machine following a system failure.
Another object of the invention is to provide for automatic recovery of a virtual machine following a system failure.
A further object of this invention is to provide virtual machine recovery for substantially any operating system that will support recovery on the architected hardware.
The following publications contain the background information for the invention disclosed and claimed herein and are incorporated herein by reference.
1. IBM publication SA22-7095 (file number S370-01) entitled "IBM System/370 Extended Architecture --Interpretive Execution".
2. Gum, P. H., "System/370 Extended Architecture: Facilities for Virtual Machines", IBM Journal of Research and Development, Vol. 27, No. 6, November 1983.