The present invention relates to a method of acquiring the dump information of a computer system that is needed to investigate the cause of a fault in the computer system.
With recent improvements in processor performance, both the amount of processing performed by a program executed on one computer and the amount of data to be processed have steadily increased. For one computer system to process data efficiently, a large amount of data must be stored directly in a main memory or loaded into a virtual memory. This, in turn, requires expanding the address spaces of the main memory and the virtual memory.
Such expansion of the address spaces of the main and virtual memories increases the amount of information that must be acquired to investigate the cause of a system fault (hereinafter referred to as dump information).
As the dump information grows, the time taken to acquire it increases. This delays the restart of the faulty system and, consequently, the resumption of processing.
Techniques for avoiding a delay in restarting a faulty system are disclosed, for example, in JP-A-7-234808 and JP-A-10-333944.
JP-A-7-234808 discloses a method of acquiring the dump information of a computer system having duplicated main memories. According to this method, when a fault occurs in the system, the data held in one of the duplicated main memories is used as the dump information, and the system is restarted using the other main memory, thereby avoiding a delay in restarting the system.
According to the technique disclosed in JP-A-10-333944, before the system is restarted, the dump information in the memory area where the operating system kernel is loaded is acquired. Thereafter, a program for restarting the system and a program for sequentially acquiring dump information from the remaining memory areas, whose dump information has not yet been acquired, are executed in parallel. If the restart program needs a memory area whose dump information has not yet been acquired, it dumps the contents of that area before using it. In this manner, the technique disclosed in JP-A-10-333944 shortens the time taken to restart the system.
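The dump-before-use scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of JP-A-10-333944 itself: memory areas are modeled as a hypothetical dictionary of area names to byte strings, the class name `DumpBeforeUse` and its methods are invented for this example, and a single lock guarantees that each area is dumped exactly once, either by the background dumper or by the restart path just before the area is reused.

```python
import threading


class DumpBeforeUse:
    """Hypothetical model: every memory area is dumped exactly once, either by
    the background dumper or by the restart path before the area is reused."""

    def __init__(self, memory, kernel_area):
        self.memory = memory          # area name -> bytes (assumed layout)
        self.dump_file = {}           # area name -> dumped contents
        self.lock = threading.Lock()
        self._dump(kernel_area)       # the kernel area is dumped before restart

    def _dump(self, area):
        # Copy the area to the dump file unless it has already been dumped.
        with self.lock:
            if area not in self.dump_file:
                self.dump_file[area] = bytes(self.memory[area])

    def background_dumper(self):
        # Sequentially dump every area whose contents are not yet acquired.
        for area in list(self.memory):
            self._dump(area)

    def use_for_restart(self, area, new_contents):
        # Restart path: dump the old contents first, then overwrite the area.
        # Holding the lock across both steps keeps the dumper from recording
        # the new contents instead of the pre-fault ones.
        with self.lock:
            if area not in self.dump_file:
                self.dump_file[area] = bytes(self.memory[area])
            self.memory[area] = new_contents


# Usage: restart and dumping run in parallel.
memory = {"kernel": b"K0", "heap": b"H0", "buffers": b"B0"}
d = DumpBeforeUse(memory, "kernel")
t = threading.Thread(target=d.background_dumper)
t.start()
d.use_for_restart("buffers", b"B1")   # the restart needs this area immediately
t.join()
```

Whichever thread reaches the `buffers` area first, the dump file ends up holding its pre-fault contents, which is the property the parallel scheme relies on.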
With the above-described techniques, the data stored in the main memory is used as the dump information when a fault occurs. However, when a fault occurs in a computer system that adopts virtual memory management, it is sometimes also necessary to acquire dump information that has been paged out to a paging device unit.
In a computer system adopting virtual memory management, the data in some virtual memory areas may be output (paged out) to an external storage unit; when data in such a virtual memory area is referenced or updated, the area is accessed after the data has been input (paged in) to the main memory. The above-described techniques do not consider acquiring dump information that has been paged out and is not currently held in the main memory.
In other words, with the above-described techniques, when a fault occurs in a computer system, the system cannot be restarted until all of the dump information paged out to the external storage unit has been acquired.
It is an object of the present invention to shorten the time taken to restart a computer system and resume processing jobs when the data in the entire virtual memory area must be acquired upon the occurrence of a fault in the system.
To achieve the above object, the invention is applied to a computer system that has a processing unit, a computer with a main memory connected to the processing unit, and external storage units connected to the computer, and that utilizes virtual memory management realized by mapping an area of a virtual memory used by the processing unit onto the main memory. In response to the occurrence of a fault in the computer system, the data held in the main memory is output to a dump file. Thereafter, the external storage unit that holds the contents of virtual memory pages paged out from the main memory is changed from a first external storage unit, used before the fault occurred, to a second external storage unit. The computer system is then restarted using the second external storage unit as the paging device unit.
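The three steps of the method (dump the main memory, switch the paging device unit, restart) can be illustrated with a small sketch. It is a simplified model, not the claimed implementation: pages are hypothetical dictionary entries, the names `StorageUnit`, `System`, `page_out` and `handle_fault` are invented for this example, restart is modeled as simply emptying the main memory, and it is assumed that the pages left on the first external storage unit are collected as dump information after the system is running again.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageUnit:
    name: str
    pages: dict = field(default_factory=dict)   # page number -> contents


@dataclass
class System:
    main_memory: dict                       # page number -> resident contents
    paging_device: StorageUnit              # unit holding paged-out pages
    spare_device: Optional[StorageUnit]     # unit not used before the fault
    dump_file: dict = field(default_factory=dict)

    def page_out(self, page):
        # Move a resident page from main memory to the current paging device.
        self.paging_device.pages[page] = self.main_memory.pop(page)

    def handle_fault(self):
        # Step 1: output the data held in main memory to the dump file.
        self.dump_file.update(self.main_memory)
        # Step 2: switch the paging device from the first unit to the second.
        # The first unit is left untouched, so its paged-out pages survive as
        # dump information (assumed to be collected after restart).
        preserved = self.paging_device
        self.paging_device, self.spare_device = self.spare_device, None
        # Step 3: restart (modeled as clearing main memory) on the second unit.
        self.main_memory = {}
        return preserved


# Usage: page 2 is on disk1 when the fault occurs.
disk1, disk2 = StorageUnit("disk1"), StorageUnit("disk2")
s = System(main_memory={0: "A", 1: "B", 2: "C"},
           paging_device=disk1, spare_device=disk2)
s.page_out(2)
preserved = s.handle_fault()
```

The restart does not wait for the paged-out page on `disk1` to be copied; it only waits for the main-memory dump, which is the delay the method is intended to avoid.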
According to another aspect of the present invention, there is provided a computer system comprising: a processing unit; a main memory; first and second external storage units that hold an area of a virtual memory used by a program executed by the processing unit, the first and second external storage units serving as a paging device unit for holding the contents of the virtual memory area paged out from the main memory; and a switching unit interposed between the processing unit and the first and second external storage units. In response to an access request for the paging device unit, issued when the program executed by the processing unit pages the virtual memory, the switching unit accesses one of the first and second external storage units as the paging device unit; and in response to the occurrence of a fault in a process being executed by the processing unit, the switching unit switches the paging device unit from that one of the first and second external storage units to the other.
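The role of the switching unit can be sketched as a small router that forwards page-out and page-in requests to whichever external storage unit is currently active. This is a hypothetical model for illustration only: the class `SwitchingUnit`, its methods, and the representation of each storage unit as a dictionary of page numbers to contents are assumptions, and it is further assumed that the unit switched away from is later read as dump information.

```python
class SwitchingUnit:
    """Hypothetical switching unit: forwards paging requests to the active
    external storage unit and switches units when a fault occurs."""

    def __init__(self, first, second):
        self.units = [first, second]   # each unit: dict of page -> contents
        self.active = 0                # index of the unit used as paging device

    def write_page(self, page, contents):
        # Page-out request from the program: store on the active unit.
        self.units[self.active][page] = contents

    def read_page(self, page):
        # Page-in request from the program: read from the active unit.
        return self.units[self.active][page]

    def on_fault(self):
        # Switch the paging device to the other unit. The previous unit is
        # not written again, so it keeps the pre-fault paged-out contents.
        previous = self.units[self.active]
        self.active = 1 - self.active
        return previous


# Usage: the same page number maps to different units before and after a fault.
first, second = {}, {}
sw = SwitchingUnit(first, second)
sw.write_page(7, b"old")
prior = sw.on_fault()
sw.write_page(7, b"new")
```

Because the switching is done below the program's paging requests, the program issues the same `write_page`/`read_page` calls before and after the fault and does not need to know which unit is active.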