In a high availability (HA) cluster of virtual machine hosts, when a given host loses its network connection to the other hosts in the cluster, an HA agent running on one of those hosts needs to determine whether the given host has crashed or is merely isolated. If the host has crashed, the virtual machines that were running on it are restarted on another host. This process is time-consuming and does not preserve the state of the virtual machines. However, if the host has merely lost its network connection, the virtual machines can continue to run on the isolated host until the network connection is re-established.
It is difficult to determine whether a host is still running after it loses network connectivity with the other hosts. An HA agent may check a shared data store for the given host's heartbeat after the loss of network connectivity. If the data store heartbeat is inactive, the host is presumed to have crashed. However, a host that is network isolated may also become disconnected from the shared data store. In such cases, the host may be considered failed even though it is still operational and only temporarily isolated. The result is that virtual machines still running on their original host are restarted elsewhere, which needlessly consumes cluster resources.
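The heartbeat check described above can be sketched as a small decision function. This is an illustrative sketch only, not an implementation from any particular HA product: the names `HostState`, `classify_host`, and `should_restart_vms` are hypothetical, and the two boolean inputs stand in for whatever network and data store probes an HA agent would actually perform.

```python
from enum import Enum


class HostState(Enum):
    """Hypothetical states an HA agent might assign to a peer host."""
    HEALTHY = "healthy"
    ISOLATED = "isolated"
    PRESUMED_CRASHED = "presumed_crashed"


def classify_host(network_reachable: bool, datastore_heartbeat_active: bool) -> HostState:
    """Classify a host using only network reachability and its data store heartbeat."""
    if network_reachable:
        return HostState.HEALTHY
    if datastore_heartbeat_active:
        # Network is down but the host still writes heartbeats: merely isolated.
        return HostState.ISOLATED
    # No network and no data store heartbeat: the check presumes a crash,
    # even though the host may only have lost both connections.
    return HostState.PRESUMED_CRASHED


def should_restart_vms(state: HostState) -> bool:
    """Virtual machines are restarted elsewhere only for a presumed crash."""
    return state is HostState.PRESUMED_CRASHED


# A host that is still running but has lost both its network connection and
# its data store connection is indistinguishable from a crashed host here.
still_running_host = classify_host(network_reachable=False,
                                   datastore_heartbeat_active=False)
needless_restart = should_restart_vms(still_running_host)
```

In this sketch, the last case produces `needless_restart == True` for a host that is in fact operational, which is the misclassification the paragraph above describes.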