Memory degeneracy (also called memory retirement), as used herein, means that when a fault is detected (that is, when a memory error has occurred) in part of a main memory (hereinafter simply referred to as a “memory”), the faulty part is excluded from use (removed from the parts of the memory that may be used). Degenerating the memory avoids repeated accesses to the faulty position, and therefore keeps the computer operating stably.
Conventionally, when a memory error occurs in the kernel space, the memory is degenerated, for example, by firmware that serves as a monitoring mechanism and runs on a processor other than the CPU (hereinafter, “system firmware”), at the time the system firmware activates the system (an OS (Operating System) and the group of software that runs on the OS).
FIG. 1 illustrates an overview of an example process performed when a memory error occurs in the kernel space.
In FIG. 1, (1) illustrates a state where a memory error has occurred in the kernel space. In this case, the system firmware records memory fault information. The memory fault information is a bitmap in which one bit is assigned to each division unit obtained by dividing the memory area into units of a predetermined size. That is to say, each bit of the bitmap is a flag indicating whether the corresponding division unit may be used.
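The memory fault information described above can be modeled as a packed bitmap with one flag per division unit. The following is a minimal sketch, not the firmware's actual format: the class name `MemoryFaultBitmap`, the 1 MiB division unit, and the convention that a set bit means "faulty, not usable" are all assumptions made for illustration.

```python
class MemoryFaultBitmap:
    """Hypothetical model of memory fault information: one bit per division unit.

    Assumption: a set bit (1) marks a faulty unit that must not be used.
    """

    def __init__(self, memory_size: int, unit_size: int) -> None:
        if memory_size <= 0 or unit_size <= 0 or memory_size % unit_size != 0:
            raise ValueError("memory_size must be a positive multiple of unit_size")
        self.unit_size = unit_size
        self.num_units = memory_size // unit_size
        # One bit per division unit, packed eight units per byte.
        self.bits = bytearray((self.num_units + 7) // 8)

    def unit_of(self, address: int) -> int:
        """Return the index of the division unit containing a physical address."""
        index = address // self.unit_size
        if not 0 <= index < self.num_units:
            raise ValueError("address outside memory")
        return index

    def record_fault(self, address: int) -> None:
        """Set the flag of the division unit that contains the faulty address."""
        index = self.unit_of(address)
        self.bits[index // 8] |= 1 << (index % 8)

    def is_usable(self, index: int) -> bool:
        """Return True if no fault has been recorded for the division unit."""
        return not (self.bits[index // 8] >> (index % 8)) & 1


# Hypothetical example: 64 MiB of memory divided into 1 MiB units.
fault_info = MemoryFaultBitmap(memory_size=64 * 2**20, unit_size=2**20)
fault_info.record_fault(0x0250_0000)  # this address falls in unit 37
print(fault_info.is_usable(37), fault_info.is_usable(36))
```

A bitmap keeps the per-unit cost at a single bit, which is why its total size is governed only by the installed memory and the division unit size.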
When the OS panics due to the memory error and the system begins to reboot, the state shifts to that illustrated in (2). In (2), the system firmware degenerates the division unit in which the memory error occurred, based on the memory fault information.
Next, in (3), the rebooted OS operates without using the degenerated division unit. This avoids another panic caused by accessing the faulty position again.
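One way to picture steps (2) and (3) is the system firmware handing the OS a map of usable memory that simply omits degenerated division units. The sketch below assumes exactly that; the source does not say how the firmware conveys this to the OS, and the function `usable_ranges` and its half-open `(start, end)` byte ranges are hypothetical.

```python
def usable_ranges(num_units: int, unit_size: int,
                  faulty_units: set[int]) -> list[tuple[int, int]]:
    """Return half-open [start, end) byte ranges the OS may use.

    Hypothetical: adjacent fault-free division units are merged into one
    range, and every degenerated (faulty) unit is left out of the map.
    """
    ranges: list[tuple[int, int]] = []
    start = None  # first unit index of the range currently being built
    for index in range(num_units):
        if index in faulty_units:
            # A degenerated unit closes any range that is in progress.
            if start is not None:
                ranges.append((start * unit_size, index * unit_size))
                start = None
        elif start is None:
            start = index
    if start is not None:
        ranges.append((start * unit_size, num_units * unit_size))
    return ranges


# Hypothetical example: 8 units of 4 KiB, with unit 3 degenerated.
print(usable_ranges(8, 4096, {3}))
```

Because the rebooted OS only ever sees the ranges in this map, it never touches the faulty unit again.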
In another example, when a memory error occurs in the user space, the OS itself may degenerate the memory.
FIG. 2 illustrates an overview of an example process performed when a memory error occurs in the user space.
In FIG. 2, (1) illustrates a state where a memory error has occurred in the user space. In this case, the OS stores the memory fault information.
Next, as illustrated in (2), the OS degenerates the division unit in which the memory error occurred, based on the memory fault information. In this case, the system does not need to be rebooted.
Meanwhile, virtualization technology makes it possible to run a plurality of virtual machines on a single computer. In such a virtualization environment, the system firmware is not involved in activating or rebooting the virtual machines. Therefore, applying the above mechanism to degenerate the memory causes the problem illustrated in FIG. 3.
FIG. 3 illustrates the problem that arises when a memory error occurs in a virtualization environment, using an example in which n virtual machines (VMs) are operating:
(1) illustrates a state where a memory error has occurred in the kernel space on VM#2. In this case, as described with reference to FIG. 1, the system firmware stores the memory fault information;
(2) illustrates a state where the OS of VM#2 is panicking due to the memory error. Meanwhile, the VMs other than VM#2 may continue operating;
(3) illustrates a state where the OS of VM#2 has started rebooting in response to the panic. The system firmware is not involved in this reboot, because VMs are activated by a hypervisor. Therefore, the memory cannot be degenerated based on the memory fault information stored by the system firmware. As a result, the OS of VM#2 panics again, and states (2) and (3) repeat.
Note that in FIG. 3, the memory is degenerated based on the memory fault information only when the entire system (that is to say, all of the virtual machines and the hypervisor) is rebooted.
For example, Patent Document 1 discloses a method of degenerating a memory in a virtualization environment.
Patent Document 1: Japanese Laid-Open Patent Publication No. 2009-245216
Patent Document 2: Japanese Laid-Open Patent Publication No. 2009-230596
Patent Document 3: Japanese Laid-Open Patent Publication No. 2009-59121
However, when the memory is degenerated in the unit in which memory is assigned to a virtual machine, as in the technology described in Patent Document 1, the unit (size) of degeneration depends on that assignment unit. Here, the unit of degeneration means the size of the memory area that is degenerated because of a fault in the memory.
FIG. 4 illustrates how the unit of degeneration depends on the unit in which memory is assigned to a virtual machine:
(1) illustrates a state where a memory error has occurred in the kernel space on VM#2;
(2) illustrates a state where the hypervisor is degenerating the memory. The unit of degeneration is the unit in which memory is assigned to a virtual machine, and therefore, in the example of FIG. 4, the whole area assigned to VM#2 is degenerated.
Consequently, when the unit in which memory is assigned to a virtual machine is large, the unit of degeneration is also large, and normal (fault-free) areas of the memory are wasted.
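The waste follows from simple arithmetic: each fault retires the whole unit that contains it, so the memory lost per fault equals the unit of degeneration. The sketch below is a hypothetical illustration; the function name, the 4 GiB VM assignment unit, the 1 MiB division unit, and the fault address are all assumed values, not figures from the source.

```python
def retired_bytes(fault_addresses: list[int], unit_size: int) -> int:
    """Bytes removed from use when each faulty address retires its whole unit.

    Faults that fall in the same unit retire that unit only once.
    """
    faulty_units = {address // unit_size for address in fault_addresses}
    return len(faulty_units) * unit_size


GiB, MiB = 2**30, 2**20
faults = [5 * GiB + 123]  # a single hypothetical fault address

# Assumed granularities: degenerating per 4 GiB VM assignment unit
# versus per 1 MiB division unit.
per_vm_unit = retired_bytes(faults, unit_size=4 * GiB)
per_division_unit = retired_bytes(faults, unit_size=1 * MiB)
print(per_vm_unit, per_division_unit)
```

With these assumed sizes, one fault costs 4096 times more memory when the degeneration unit equals the VM assignment unit, and almost all of that memory is fault-free.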
Conversely, if the unit in which memory is assigned to a virtual machine is made smaller to avoid this waste, the number of division units described above grows in proportion to the amount of installed memory, which increases the amount of memory fault information that the hypervisor must manage.
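The growth of the memory fault information can be estimated directly, assuming a one-bit-per-unit bitmap like the one described with reference to FIG. 1. In this hypothetical sketch, the function name and the sample sizes (1 TiB and 4 TiB of installed memory; 256 MiB and 4 KiB units) are assumptions chosen only to show the scaling.

```python
def fault_bitmap_bytes(installed_memory: int, unit_size: int) -> int:
    """Size of a one-bit-per-unit fault bitmap, rounded up to whole bytes."""
    num_units = -(-installed_memory // unit_size)  # ceiling division
    return (num_units + 7) // 8


KiB, MiB, TiB = 2**10, 2**20, 2**40
sizes = {
    (memory, unit): fault_bitmap_bytes(memory, unit)
    for memory in (1 * TiB, 4 * TiB)
    for unit in (256 * MiB, 4 * KiB)
}
for (memory, unit), size in sizes.items():
    print(f"{memory // TiB} TiB memory, {unit // KiB} KiB unit: {size} bytes")
```

Shrinking the unit from 256 MiB to 4 KiB multiplies the bitmap size by 65536, and quadrupling the installed memory quadruples it again.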
Furthermore, if the unit in which memory is assigned to a virtual machine is made variable in length, the control implemented by the hypervisor becomes complex.
As described above, it is difficult for the conventional technology to handle a large-scale virtualization environment.