Currently, as computer and network applications become increasingly widespread and the types of business in various fields become increasingly diverse, high-availability technology for virtual machines in a distributed environment is becoming more and more important. Here, a virtual machine is a computer system that runs on a physical machine by means of software simulation, has complete hardware system functionality, and operates in a completely isolated environment. High availability refers to a technology whereby, when a physical machine A suffers a problem such as a breakdown, a virtual machine running on physical machine A can be started on a physical machine B without human intervention, so that the virtual machine keeps running.
In existing technical solutions, high availability of virtual machines in a distributed environment is typically realized as follows: a logic group composed of a plurality of physical machines is defined as a high-availability unit, so that when any physical machine in the logic group suffers a breakdown or another problem, all of the virtual machines running on that physical machine are started on other physical machines in the same logic group. In addition, a control node detects the states of the physical machines by means of heartbeats or by periodically pinging them; that is, when the control node cannot detect a certain physical machine, that physical machine is considered to have a problem.
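The prior-art scheme described above can be sketched roughly as follows. This is a minimal illustration under assumptions that the text does not state: the names `PhysicalMachine`, `ControlNode`, `record_heartbeat`, `detect_failed` and `fail_over` are hypothetical, a fixed heartbeat timeout stands in for the heartbeat or ping check, and restarting each virtual machine on the least-loaded surviving machine is an arbitrary placement choice. The sketch deliberately keeps the weaknesses of the scheme: only the control node's view is used to judge a machine, and every virtual machine is relocated regardless of the importance of its business.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalMachine:
    """Hypothetical physical machine in one high-availability logic group."""
    name: str
    vms: list[str] = field(default_factory=list)
    last_heartbeat: float = 0.0


class ControlNode:
    """Hypothetical control node that monitors one logic group."""

    def __init__(self, machines, timeout=10.0):
        self.machines = {m.name: m for m in machines}
        self.timeout = timeout  # assumed heartbeat timeout, in seconds

    def record_heartbeat(self, name, now):
        # Called whenever the control node hears from (or successfully pings) a machine.
        self.machines[name].last_heartbeat = now

    def detect_failed(self, now):
        # A machine is judged failed solely because the control node has not heard
        # from it within the timeout; a dead machine and one that merely stopped
        # answering (e.g. ping disabled) look the same here.
        return [m for m in self.machines.values()
                if now - m.last_heartbeat > self.timeout]

    def fail_over(self, now):
        failed_names = {m.name for m in self.detect_failed(now)}
        healthy = [m for m in self.machines.values() if m.name not in failed_names]
        if not healthy:
            return []  # nowhere in the group to restart the virtual machines
        moves = []
        for name in failed_names and [n for n in self.machines if n in failed_names]:
            pm = self.machines[name]
            for vm in pm.vms:  # every VM is treated as highly available
                target = min(healthy, key=lambda m: len(m.vms))
                target.vms.append(vm)
                moves.append((vm, pm.name, target.name))
            pm.vms = []
        return moves


group = ControlNode([
    PhysicalMachine("A", ["vm1", "vm2"]),
    PhysicalMachine("B", ["vm3"]),
    PhysicalMachine("C"),
], timeout=10.0)
for n in ("A", "B", "C"):
    group.record_heartbeat(n, now=0.0)
group.record_heartbeat("B", now=15.0)
group.record_heartbeat("C", now=15.0)
print(group.fail_over(now=20.0))  # [('vm1', 'A', 'C'), ('vm2', 'A', 'B')]
```

Machine A has been silent for longer than the timeout, so both of its virtual machines are restarted on B and C, even if A was in fact running normally and only its heartbeat or ping response was missing.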
However, the existing technical solutions have the following problems: (1) once a virtual machine is allocated to a high-availability group, it is by default considered highly available regardless of whether the business running on it is important. Such a design therefore cannot ensure that a virtual machine running important business is started preferentially, and it also wastes resources and introduces redundancy; (2) since only the state of the physical machine is detected, the detection method is single and one-sided and may lead to erroneous determinations (e.g., if the ping function is disabled on a certain physical machine, the virtual machines on that physical machine may be transferred to another physical machine even though the physical machine is running normally); (3) since detection of the state of a physical machine is initiated only from the control node, the determination of that state is not sufficiently complete or accurate.
Therefore, there is a need for a virtual machine abnormality recovery method that can accurately determine and efficiently handle faults of physical machines in a distributed environment.