Conventionally, a computer system that implements a mission-critical system of high social importance is provided, in its maximum hardware configuration, with, for example, 128 CPUs, a maximum memory capacity of 512 GB, 128 hard disk drives of 73 GB each, 320 PCI slots, and a maximum of 15 partitions, so as to maximize operating time and throughput and thereby achieve extremely high processing performance, reliability, stability, and flexibility. To maximize operating time, for example, the interior of the chassis is constantly monitored by many checkers, detected errors are automatically corrected by data protection functions such as ECC, a dynamic degradation function and redundancy mechanisms keep the system from going down even if a fault does occur, and, since main components support active replacement, parts can be replaced without stopping the system. To maximize throughput, hardware resources are flexibly allocated by combining a partition function with a dynamic reconfiguration function, so that the system can follow changes in transaction volume or in the scale of operations, for example, loads that differ between daytime and nighttime or between the end and the beginning of a month. The partition function uses a system board, on which CPUs and memories are mounted, as its unit: a plurality of partitions are set up by combining one or more system boards, and the interior of each system board can be further divided, for example, into two-CPU units, thereby realizing a flexible partition configuration and resource placement that is not bound by physical limits.
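The partition function described above can be illustrated with a minimal Python sketch. It is not taken from any actual product: the class and function names are hypothetical, and the figure of four CPUs per system board is an assumption made only for the example. The limit of 15 partitions and the two-CPU division unit come from the description.

```python
from dataclasses import dataclass

MAX_PARTITIONS = 15   # maximum partition count given in the description
CPUS_PER_UNIT = 2     # a board is divided "into two-CPU units"


@dataclass(frozen=True)
class CpuUnit:
    board: int  # system board index (hypothetical numbering)
    unit: int   # index of the two-CPU unit within that board


def board_units(board, cpus_per_board=4):
    """Return every two-CPU unit on one board (4 CPUs per board is assumed)."""
    return [CpuUnit(board, u) for u in range(cpus_per_board // CPUS_PER_UNIT)]


class PartitionTable:
    """Tracks which two-CPU units belong to which partition."""

    def __init__(self):
        self.partitions = {}  # partition id -> set of CpuUnit
        self.owner = {}       # CpuUnit -> partition id

    def assign(self, pid, units):
        units = list(units)
        if pid not in self.partitions and len(self.partitions) >= MAX_PARTITIONS:
            raise ValueError("partition limit reached")
        # Validate everything first so a failed call changes nothing.
        for u in units:
            if self.owner.get(u, pid) != pid:
                raise ValueError(f"{u} already belongs to partition {self.owner[u]}")
        members = self.partitions.setdefault(pid, set())
        for u in units:
            members.add(u)
            self.owner[u] = pid

    def cpu_count(self, pid):
        return len(self.partitions[pid]) * CPUS_PER_UNIT


table = PartitionTable()
table.assign(0, board_units(0) + board_units(1))  # partition built from two whole boards
table.assign(1, [CpuUnit(2, 0)])                  # board 2 split between two partitions
table.assign(2, [CpuUnit(2, 1)])
```

In this sketch a partition may span several boards or occupy only part of one, which is the flexibility the partition function is said to provide.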
The dynamic reconfiguration function enables CPUs, memories, and I/O devices to be added and removed without stopping the system, which makes it possible to add resources, replace parts, and place resources flexibly in response to changes in the amount of data or the volume of operations. A computer system that realizes such high reliability, stability, and flexibility is provided with a system monitoring device (System Control Facility) that monitors and controls the entire system. The system monitoring device is mounted on a dedicated board, retains the user setting information, hardware state information, and OS software state information of the computer system in order to monitor and control the entire system, and notifies the outside when a malfunction occurs. When the system monitoring device of such a computer system fails, the system has to be stopped (powered off) so that the board can be serviced and replaced; depending on the operating mode of the system, however, active maintenance, in which maintenance and replacement are performed without stopping the system, has to be possible.
Patent Document 1: Japanese Patent Application Laid-Open (kokai) No. H4-326843
Patent Document 2: Japanese Patent Application Laid-Open (kokai) No. H4-084230
However, when active replacement is performed without stopping the system after the system monitoring device has failed, there is a problem in that the system state information retained only in the system monitoring device is lost through the replacement, the system state information from before the replacement cannot be carried over, and monitoring of the entire system is impaired. The conventional system monitoring device retains user setting information, hardware state information, and software state information as the system state information used to control the entire system. Of these, the user setting information is stored in a dedicated non-volatile memory (EEPROM) provided outside the board of the system monitoring device and can therefore be restored even if the information on the board is lost through active replacement. Likewise, since the system is not stopped, the hardware state information can be restored by reading the state information retained on the hardware side once the active maintenance is complete. The OS software state information, however, which represents hardware control instructions issued by the OS software, is not retained on the OS software side; therefore, there is a problem in that, once it is lost through active replacement of the system monitoring device, it cannot be restored after the replacement and cannot be carried over. In addition, depending on the installation site, a considerable time may elapse between the failure of the system monitoring device and the execution of active maintenance. Unless the system state information generated while the system monitoring device is stopped, that is, from the failure until the active replacement is finished, is also restored, the continuity of the system state information cannot be ensured.
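The difference between the three categories of state information can be illustrated with a short Python simulation. All names and sample values here are hypothetical and serve only to model the paragraph above: the user setting information is reloaded from the external EEPROM, the hardware state information is re-read from the running hardware, and the OS software state information has no second copy from which it could be reloaded.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorBoard:
    """State retained on the board of the system monitoring device."""
    user_settings: dict = field(default_factory=dict)
    hardware_state: dict = field(default_factory=dict)
    os_software_state: dict = field(default_factory=dict)


def replace_board(old, eeprom, read_hardware):
    """Simulate active replacement and report which categories were lost.

    `old` is used only to check the outcome of the simulation; a real
    failed board could not be read back like this.
    """
    new = MonitorBoard()
    new.user_settings = dict(eeprom)      # restored from the external EEPROM
    new.hardware_state = read_hardware()  # re-read from the still-running hardware
    # No source exists for os_software_state, so it stays empty.
    categories = ("user_settings", "hardware_state", "os_software_state")
    lost = [c for c in categories if getattr(new, c) != getattr(old, c)]
    return new, lost


old = MonitorBoard(
    user_settings={"alarm_mail": "on"},
    hardware_state={"fan0": "ok"},
    os_software_state={"partition0": "power-on requested"},
)
new, lost = replace_board(old, {"alarm_mail": "on"}, lambda: {"fan0": "ok"})
```

In this simulation `lost` contains only `"os_software_state"`, which corresponds to the problem stated above.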
To solve the problem caused by failure of such a system monitoring device, there is a system that has duplex system monitoring devices and is operated with the system state information always synchronized between the two. In such a system, even when the board of one system monitoring device is replaced because of a failure, operation can be continued by using the state information stored in the other system monitoring device. However, even in a computer system having duplex system monitoring devices, there is a problem in that, when both system monitoring devices fail, the system has to be stopped in order to service and replace them, just as in a computer system in which only one system monitoring device is mounted, and, in addition, the state information may be lost.
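The behavior of the duplex configuration, including the case in which it breaks down, can be sketched as follows. This is a simplified Python model under stated assumptions, not the implementation of any actual system: the class names, the key/value form of the state information, and the synchronous mirroring of every update are all hypothetical.

```python
class Monitor:
    """One system monitoring device board."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = {}


class DuplexMonitors:
    """Two boards whose state information is kept synchronized."""

    def __init__(self):
        self.units = [Monitor("A"), Monitor("B")]

    def update(self, key, value):
        # Every update is applied to each live board so the copies match.
        for unit in self.units:
            if unit.alive:
                unit.state[key] = value

    def fail(self, index):
        # A failed board no longer holds usable state information.
        self.units[index].alive = False
        self.units[index].state = {}

    def replace(self, index):
        """Install a fresh board; return True if state was copied from the peer."""
        fresh = Monitor(self.units[index].name)
        peer = self.units[1 - index]
        recovered = peer.alive
        if recovered:
            fresh.state = dict(peer.state)
        self.units[index] = fresh
        return recovered


duplex = DuplexMonitors()
duplex.update("os_request", "reboot partition 1")

duplex.fail(0)
single_failure_ok = duplex.replace(0)  # peer B is alive, so state is copied

duplex.fail(0)
duplex.fail(1)
double_failure_ok = duplex.replace(0)  # no live peer, so the new board starts empty
```

With a single failure the replacement board inherits the state from its peer, whereas after a double failure no copy remains to inherit, which is the limitation described above.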