A computer system that requires high reliability avoids system downtime by making its Central Processing Units (CPUs), memories, and disks redundant. When a failure occurs in such a system, the system isolates the failed device and restarts to resume operation (this isolation is called “fallback” or “degrading”). If a failure occurs in a CPU or a memory, the failed CPU or memory is degraded.
In particular, disks and memories, which fail frequently, need to be redundant.
Memory mirroring is one known method of providing memory redundancy. A memory capable of memory mirroring provides twice the usable memory size when mirroring is not applied.
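The capacity trade-off of memory mirroring can be illustrated with a minimal sketch. The function name `usable_memory` and the 64 GiB figure are hypothetical and chosen only for illustration; the sketch assumes simple two-way mirroring in which every piece of data is held in two copies.

```python
def usable_memory(installed_gib: int, mirrored: bool) -> int:
    """Return the capacity visible to software, in GiB (illustrative model)."""
    # Assumption: two-way mirroring keeps two copies of all data,
    # so only half of the installed capacity is usable.
    return installed_gib // 2 if mirrored else installed_gib


mirrored_size = usable_memory(64, mirrored=True)
plain_size = usable_memory(64, mirrored=False)
print(mirrored_size, plain_size)  # 32 64
```

Under this assumption, the same installed memory offers twice the usable size once mirroring is turned off, at the cost of losing the redundant copy.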
A method of limiting the amount of memory that is lost when a failure occurs in a memory is also known.
In addition, a technique of incorporating a memory controller into a CPU has recently been proposed, and CPUs having built-in memory controllers have become widespread.
Advantageously, such a CPU avoids the performance degradation caused by a bus bottleneck, in contrast to a conventional structure in which a memory controller is connected to the CPU via a common bus. In addition, such a system can provide, at a lower cost, memory performance comparable to that of a scheme that connects memories and CPUs via an expensive crossbar switch, as has conventionally been applied to mainframes and UNIX (registered trademark) computers. For this reason, such CPUs have been adopted mainly in inexpensive PC servers.
A memory controller incorporated in such a CPU is equipped with the memory mirroring function described above. Thus, the mirroring function of a memory controller incorporated in a CPU has come to be used in systems that need high reliability.
Patent Literature 1: Japanese Laid-open Patent Publication No. SHO 57-074898
Patent Literature 2: Japanese Laid-open Patent Publication No. HEI 11-312120
In a multi-CPU system in which each CPU incorporates a memory controller, the memory connected to a CPU can be read and written only through that CPU. Consequently, when a CPU falls back, the remaining CPUs become unable to access the memories subordinate to the fallback CPU, so that the subordinate memories are also degraded.
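The dependency between a CPU and its subordinate memory can be sketched as follows. This is only an illustrative model, not an actual system interface: the `Cpu` class, the `available_memory` function, and the 32 GiB per-CPU figure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    local_memory_gib: int  # memory reachable only through this CPU's controller
    failed: bool = False


def available_memory(cpus: list[Cpu]) -> int:
    """Sum the memory behind the CPUs that are still operating."""
    # Memory subordinate to a fallback CPU is unreachable, so it is excluded.
    return sum(c.local_memory_gib for c in cpus if not c.failed)


cpus = [Cpu("cpu0", 32), Cpu("cpu1", 32)]
before = available_memory(cpus)
cpus[1].failed = True  # cpu1 falls back
after = available_memory(cpus)
print(before, after)  # 64 32
```

In this model, isolating one of two CPUs halves the available memory even though no memory device itself has failed.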
Generally, an application such as a database loads part of its data into memory and builds indexes there to speed up processing. Therefore, when a predetermined amount of memory cannot be ensured because of memory fallback, the throughput of the application drops significantly. In the worst case, the application fails to start.
In virtualization techniques, which have recently become widespread, each virtualized guest operating system (OS) needs a predetermined amount of memory. Memory fallback therefore reduces the number of guest OSs that can operate. For example, when a three-layer system consisting of Web, application (AP), and database (DB) layers is formed of multiple guest OSs, memory fallback prevents guest OSs from starting, so that the system does not operate. Likewise, memory fallback may prevent a virtual PC environment, which virtualizes client machines, from running the required number of PCs, which may interfere with business operations.
For example, the memory mirroring described above does not determine whether the memory is duplexed when a failure occurs in a CPU, and is therefore incapable of compensating for the reduction in memory.
A typical computer system has functions of memory redundancy, CPU isolation, and automatic restart to improve the reliability of the system. However, when a CPU is simply isolated, the memories subordinate to the isolated CPU are also isolated, so that the available memory size is reduced.
Simply canceling memory mirroring in order to compensate for memory fallback when a failure occurs in a CPU may excessively increase the available memory, which bloats the memory management table and similar structures. In some OSs and applications, the bloated settings remain after the system recovers, so that the available memory region is reduced, which may make it impossible to start the application. In the worst case, the OS and the application need to be reinstalled.
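The excess can be seen in a short worked example. The configuration below (four CPUs, each with 16 GiB of installed memory, all mirrored) and the function name `usable_gib` are assumptions made purely for illustration.

```python
def usable_gib(installed_per_cpu: int, healthy_cpus: int, mirrored: bool) -> int:
    """Usable memory in GiB for a hypothetical uniform multi-CPU system."""
    installed = installed_per_cpu * healthy_cpus
    # Assumption: two-way mirroring halves the usable capacity.
    return installed // 2 if mirrored else installed


normal = usable_gib(16, healthy_cpus=4, mirrored=True)       # before the failure
degraded = usable_gib(16, healthy_cpus=3, mirrored=True)     # one CPU isolated
unmirrored = usable_gib(16, healthy_cpus=3, mirrored=False)  # mirroring canceled everywhere
excess = unmirrored - normal
print(normal, degraded, unmirrored, excess)  # 32 24 48 16
```

In this hypothetical configuration, isolating one CPU lowers the usable memory from 32 GiB to 24 GiB, whereas canceling mirroring on all remaining memory raises it to 48 GiB, 16 GiB more than the system had before the failure.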
For example, although the known method described above can limit the amount of memory lost when a failure occurs in a memory, it offers no solution to the increase in memory size caused by canceling memory mirroring.