The present invention relates to execution of a memory dump.
In an ordinary computer system, an Operating System (OS) controls hardware such as an instruction execution unit (a Central Processing Unit (CPU), a Micro Processing Unit (MPU), or the like), main storage (hereinafter referred to as a memory), secondary storage, input-output units, a file device, and a communication unit, and schedules the use of that hardware. Further, the OS provides a software interface that allows a user to use the computer easily. For example, an application program such as a spreadsheet program or a word processing program uses the computer hardware under the control of the OS.
Sometimes, the OS hangs up or malfunctions owing to a fault caused by a hardware failure or a program defect (which may lie in the OS itself). However, where a high degree of reliability and availability is required of a computer system, as in a computer system used as a backbone system for example, faults that may cause a hang-up or malfunction need to be avoided or quickly resolved.
As a technique for meeting the above demand, a memory dump may be performed when a fault in a computer system makes it difficult to continue running the OS. Here, “memory dump” means that information stored in an instruction execution unit, a memory, and the like of the computer system at the time the fault occurs is saved in secondary storage as fault information. An administrator or the like analyzes the contents of the memory dump to identify and correct the cause of the fault and to restart the computer system.
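As a rough illustration of the memory dump described above, the following Python sketch saves a set of register values and a raw memory image to a file and reads them back for analysis. It is a minimal sketch under stated assumptions, not the method of any cited document: the file layout, the `MAGIC` tag, and the names `write_dump`, `read_dump`, and `demo` are all hypothetical and chosen only for illustration.

```python
import json
import struct
import tempfile
from pathlib import Path

MAGIC = b"DUMP"  # hypothetical 4-byte tag identifying this illustrative format


def write_dump(path, registers, memory):
    """Save register values and a raw memory image as fault information."""
    header = json.dumps(registers, sort_keys=True).encode("utf-8")
    with open(path, "wb") as f:
        f.write(MAGIC)
        # Two unsigned 64-bit little-endian lengths: header size, memory size.
        f.write(struct.pack("<QQ", len(header), len(memory)))
        f.write(header)
        f.write(memory)


def read_dump(path):
    """Load a dump written by write_dump for later analysis."""
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            raise ValueError("not a dump file")
        header_len, memory_len = struct.unpack("<QQ", f.read(16))
        registers = json.loads(f.read(header_len).decode("utf-8"))
        memory = f.read(memory_len)
    return registers, memory


def demo():
    """Write a small simulated dump to temporary storage and read it back."""
    registers = {"pc": 0x4000, "sp": 0x7FF0}  # simulated register state
    memory = bytes(range(16))                 # simulated memory contents
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "fault.dump"
        write_dump(path, registers, memory)
        return read_dump(path)
```

Storing the lengths ahead of the data lets an analysis tool separate the register state from the memory image without scanning the file; a real dump facility would capture the state from the hardware rather than from in-process values.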
On the other hand, there are cases where a plurality of OSs run on one computer system. Technologies for running a plurality of OSs on one computer system include, for example, a virtual computer system and Logical Partitioning (LPAR). In these technologies, an instruction execution unit is time-shared among the OSs so that the processes of the plurality of OSs are performed in parallel on one computer.
Patent Document 1 describes a technique of performing a memory dump with respect to a faulty OS when a fault occurs in a system that performs processes of a plurality of OSs on one computer. According to the technique described in that document, information in a data area used by a hypervisor (i.e., software for providing a logical partitioning function of a computer in system partitioning) is acquired. The information acquired at the time the fault occurs is used for analyzing the fault.
Further, Patent Document 2 describes a technique in which, when a fault occurs in an OS on a virtual computer, a memory dump of the faulty OS is performed using, for example, another OS having a dump function, and the memory dump is stored in secondary storage in a form that can be analyzed by a system administrator or a debugger.
Patent Document 1: U.S. Pat. No. 6,892,383
Patent Document 2: Japanese Non-examined Patent Laid-Open No. 2005-122334
The techniques disclosed in Patent Documents 1 and 2 perform a memory dump only of the OS in which a fault occurs. However, in the case of a system that provides a shared file system or a cluster system for a plurality of OSs, the fundamental cause of a system fault does not necessarily lie in the OS in which the fault occurs.
The present invention has been made in view of the above-described situation, and an object of the invention is to provide a technique that can identify the fundamental cause of a system fault in a computer system performing processes of a plurality of OSs.