Virtualization management software enables multiple virtual machines to be executed on a single hardware computing platform and manages the allocation of computing resources to each virtual machine. A set of hardware computing platforms can be organized as a server cluster to provide computing resources for a data center. In addition, the virtualization management software can be configured to move virtual machines between servers (also referred to herein as “host systems” or “host computers”) in the cluster. An example of this supporting technology is sold as VMware VMotion® by VMware, Inc. of Palo Alto, Calif. An example of the virtualization management software is sold as VMware Distributed Resource Scheduler™ by VMware, Inc. of Palo Alto, Calif.
A cluster resource management service for a virtualized computing environment handles the placement and scheduling of a set of virtual machines (VMs) on a set of hosts that each belong to a cluster, in accordance with a set of constraints and objectives. To address constraint violations and achieve objectives, the cluster resource management service generates and can automatically execute migrations of VMs between hosts and can recommend powering hosts on or off. For a VM to be powered-on on a host within a cluster, the cluster needs to have sufficient computing resources compatible with the VM's execution constraints to meet the VM's admission control requirements, and those resources must be available in unfragmented form, i.e., all on a single host in the cluster.
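The unfragmented-resource requirement can be illustrated with a minimal sketch. The `Host` and `VM` types, their field names, and the two-resource (CPU and memory) model below are hypothetical simplifications chosen for illustration; they are not the data model of any particular cluster resource management service. The sketch shows a cluster whose aggregate free capacity exceeds a VM's requirements while no single host can admit the VM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical host record: free (unreserved) capacity only.
    name: str
    free_cpu_mhz: int
    free_mem_mb: int


@dataclass
class VM:
    # Hypothetical VM record: resources that must be available at power-on.
    name: str
    cpu_mhz: int
    mem_mb: int


def hosts_that_fit(vm: VM, hosts: List[Host]) -> List[Host]:
    """Return hosts that can individually satisfy every requirement of the VM."""
    return [
        h for h in hosts
        if h.free_cpu_mhz >= vm.cpu_mhz and h.free_mem_mb >= vm.mem_mb
    ]


def cluster_has_aggregate_capacity(vm: VM, hosts: List[Host]) -> bool:
    """Check whether free capacity summed across hosts would cover the VM."""
    total_cpu = sum(h.free_cpu_mhz for h in hosts)
    total_mem = sum(h.free_mem_mb for h in hosts)
    return total_cpu >= vm.cpu_mhz and total_mem >= vm.mem_mb


# Example values are made up for illustration.
hosts = [Host("host-a", 1000, 2048), Host("host-b", 1500, 1024)]
vm = VM("vm-1", 1200, 2000)

aggregate_ok = cluster_has_aggregate_capacity(vm, hosts)
fitting = hosts_that_fit(vm, hosts)
```

In this example the cluster has enough CPU and memory in total, but host-a lacks CPU and host-b lacks memory, so the resources are fragmented and the VM cannot be powered on without first migrating other VMs or powering on another host.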
Additionally, virtualized computing environments can implement a wide variety of redundancy techniques to establish a high availability system, or “HAS.” Such techniques set aside resources, such as hardware, systems, components, subsystems or even software, so that in the event of a failure, relevant functionality may be maintained or quickly recovered. Redundancy may be provided at any of a variety of levels. For example, conventional techniques for managing redundant information storage or transmission can use error correcting codes, cyclic redundancy checks, and/or storage array technologies such as RAID (“Redundant Array of Inexpensive Disks”). Also, redundant subsystems such as power supplies or storage controllers can be employed to improve system availability.
However, in most prior art systems, the resource scheduler operates independently of the HAS. Thus, in the event that a VM fails and is restarted by the HAS, the placement of the VM onto an alternate host is often not optimal because the HAS has no global knowledge of the load of the cluster. Subsequently, the resource scheduler needs to rebalance the load with the VM running on the alternate host. This two-step process is inefficient and can significantly degrade the efficacy of the resource scheduler and/or the HAS.
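The placement problem described above can be sketched as follows. Both functions, the single scalar load metric, and the host figures are hypothetical and serve only to contrast a restart decision made without cluster-wide load knowledge against one made with it; neither is presented as the behavior of any actual HAS or resource scheduler.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cluster state: host name -> (capacity, current load), same units.
Hosts = Dict[str, Tuple[int, int]]


def place_without_load_knowledge(vm_demand: int, hosts: Hosts) -> Optional[str]:
    """Pick the first host with enough free capacity, ignoring relative load."""
    for name, (capacity, used) in hosts.items():
        if capacity - used >= vm_demand:
            return name
    return None


def place_with_load_knowledge(vm_demand: int, hosts: Hosts) -> Optional[str]:
    """Pick the feasible host whose utilization after placement is lowest."""
    utilization_after = {
        name: (used + vm_demand) / capacity
        for name, (capacity, used) in hosts.items()
        if capacity - used >= vm_demand
    }
    if not utilization_after:
        return None
    return min(utilization_after, key=utilization_after.get)


# Made-up example: three surviving hosts after a failure, one VM to restart.
hosts = {"host-a": (100, 70), "host-b": (100, 20), "host-c": (100, 40)}
demand = 25

unaware_choice = place_without_load_knowledge(demand, hosts)
aware_choice = place_with_load_knowledge(demand, hosts)
```

Under these assumptions the load-unaware restart lands on host-a and drives it to 95% utilization, while a load-aware choice would select host-b at 45%. The gap between the two choices corresponds to the rebalancing migration that the resource scheduler must subsequently perform.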