FIG. 1A shows a number of elements comprising a fault tolerant computer system 100. The system 100 has two host computational devices (i.e., servers), Host.1 and Host.2, which together can support one or more pairs of virtual machines (not shown). Each pair is configured to perform the same set of operations (an application) at substantially the same time, and the virtual machines of a pair are configured to operate together such that no single point of error (a communication link fault, for instance) in the system can result in one or the other of the virtual machines becoming unavailable to run the application. Applications running in systems such as system 100 are typically referred to as fault tolerant applications. The host computers Host.1 and Host.2 are connected to each other over a plurality of dedicated network links or paths, each of the host computers is connected over virtual network paths or links to two different logical storage containers, 110 and 120 respectively, and both of the host computers are in communication with a Quorum computer Q over virtual private network (VPN) or public network links.
During normal fault tolerant operation of the system 100, a pair of virtual machines (VMs) running on Host.1 and Host.2 can both operate in an active, on-line mode or state, with one VM operating as an active master and the other VM operating as an active slave. In the event it is determined that the operational health of one or the other of the VMs is poor, such that the VM is no longer able to operate in a fault tolerant mode (active mode) or is at risk of not being able to operate in a fault tolerant mode, the operational state of the VM that is identified to be in poor health (faulty) can be downgraded to be either offline or online. Although the application running on the host devices is then still available to a user, it is no longer fault tolerant. Functionality running in association with each of the virtual machines can monitor and quantify the health of a plurality of substantially immediately detectable operating characteristics associated with each VM. From one perspective, and in the context of a fault tolerant computer system, an immediately detectable operating characteristic can be characterised as one that is detected in less than the time it takes to detect that communication is lost between a pair of virtual machines, or in less than the time it takes to detect that the virtual machine has lost communication with functionality (internal or external to the computer system) that the fault tolerant computer system relies upon to provide fault tolerant services.
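The timing criterion described above can be illustrated with a minimal sketch. The function name, parameter names, and the use of seconds as the unit are assumptions made for illustration only; the description does not specify how detection times are measured or compared.

```python
def is_immediately_detectable(detection_time_s: float,
                              peer_loss_detection_s: float,
                              dependency_loss_detection_s: float) -> bool:
    """Return True if a characteristic counts as immediately detectable.

    Hypothetical helper: a characteristic qualifies when it is detected in
    less than the time needed to detect loss of communication between the
    paired VMs, or in less than the time needed to detect loss of
    communication with functionality the system relies upon.
    """
    return (detection_time_s < peer_loss_detection_s
            or detection_time_s < dependency_loss_detection_s)
```

For example, under these assumptions a fault detected in 0.05 seconds would qualify if loss of communication between the paired VMs takes 2 seconds to detect.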
This functionality can monitor the health of one or more virtual machines running on each host, the health of the network to which each VM is connected, and the connectivity between each VM and a Quorum server. It can also monitor the health of operational characteristics associated with I/O devices (physical or virtual containers) that are connected to the network with which the VMs communicate and that are considered to be essential to the fault tolerant operation of the system 100. All of these operational characteristics can be considered individually or as a whole in order to determine the state of the health of each virtual machine running on both of the host devices. Depending upon the state of the health (normal or poor) of each virtual machine, a decision can be made to downgrade the operational state of any one or more of the virtual machines.
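As an illustration of how the monitored characteristics might be combined, the following minimal sketch treats each monitored area as a single pass or fail flag and rates a VM as being in poor health if any flag fails. All class, field, and function names are hypothetical, and the "any failure means poor health" policy is an assumption; the description leaves open whether characteristics are weighed individually or as a whole.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    NORMAL = "normal"
    POOR = "poor"


@dataclass
class VMHealthReport:
    """Hypothetical snapshot of the characteristics monitored for one VM."""
    name: str
    vm_ok: bool             # health of the virtual machine itself
    network_ok: bool        # health of the network the VM is connected to
    quorum_reachable: bool  # connectivity between the VM and the Quorum server
    io_ok: bool             # health of essential I/O devices or containers

    def overall(self) -> Health:
        # Assumed policy: a single failing characteristic means poor health.
        checks = (self.vm_ok, self.network_ok, self.quorum_reachable, self.io_ok)
        return Health.NORMAL if all(checks) else Health.POOR


def vms_to_downgrade(reports: list[VMHealthReport]) -> list[str]:
    """Return the names of the VMs whose overall health is poor."""
    return [r.name for r in reports if r.overall() is Health.POOR]
```

A caller could build one report per VM in a pair and pass both to `vms_to_downgrade`; a VM with a failed network check, for instance, would be returned as a candidate for downgrade while its healthy partner would not.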