Centralized management software for a virtualized computer system is used to monitor and balance loads across hardware resources, such as host systems and storage arrays. FIG. 1 depicts a conventional virtualized computer system 100 that includes a centralized virtual machine (VM) manager 102, host systems 122-124, and storage arrays 106-1, 106-N. One example of VM manager 102 is VMware vSphere® by VMware, Inc. of Palo Alto, Calif. As shown, host systems 122-124 are connected with storage arrays 106-1, 106-N via a storage network 104, and are also connected to VM manager 102 via network 120. VM manager 102 manages virtualized computer system 100 and is in communication with at least host systems 122-124 and storage arrays 106-1, 106-N. There may be any number of host systems included in virtualized computer system 100. Each host system may comprise a general purpose computer system having one or more applications, virtual machines, or other entities that access data stored in storage arrays 106-1, 106-N. For example, host systems 122-124 include VMs 125-127 that access VM data in the storage arrays 106-1, 106-N, respectively. Storage arrays 106-1, 106-N each include storage devices 112.
VM manager 102 is able to effectively manage the host systems and storage arrays when the number of host systems and/or storage arrays included in the virtualized computer system 100 is relatively small, on the order of dozens. However, when the number of host systems and/or storage arrays included in the virtualized computer system 100 becomes very large, the management of these hardware resources becomes quite inefficient. For example, a cloud-based computing system may include thousands of hardware resources that provide the physical infrastructure for a large number of different computing operations. In such cloud-based computing systems, proper load balancing across the hardware resources is critical to avoid computing bottlenecks that can result in serious problems, including a reduction in speed of VMs executing on a host system that is overloaded, potential data loss when no more free space is available in a storage array, and the like. Accordingly, because of its complexity and inefficiency, a centralized approach to load balancing does not perform well when a large number of hardware resources are being managed.
One approach to minimizing the problems associated with a centralized management approach involves increasing the hardware capabilities of VM manager 102, e.g., by executing VM manager 102 on more powerful hardware. However, even when VM manager 102 executes on very powerful hardware, the communication delays and execution costs of hardware resources sending a large volume of statistics to VM manager 102, combined with the load balancing computations that VM manager 102 must then perform, still result in serious performance problems. In virtualized computing, a load balancing operation has a time complexity of roughly O(number of VMs × number of host systems). In a typical example involving approximately 64 host systems and 1,000 VMs, VM manager 102 can take up to several minutes to perform load balancing operations. Moreover, user-initiated operations such as VM power-ons are queued by VM manager 102 if they arrive during an in-progress load balancing operation, which presents unacceptable delays to the user when the load balancing operation takes a long time to complete (e.g., on the order of minutes). Furthermore, under the centralized management approach, VM manager 102 remains a single point of failure.
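To make the stated time complexity concrete, the following minimal Python sketch counts the work performed by a naive centralized balancing pass. It is an illustration only and does not reproduce the algorithm actually used by VM manager 102: the function name `balance_pass`, the greedy least-loaded placement rule, and the uniform per-VM load are all assumptions introduced here. Only the figures of 64 host systems and 1,000 VMs come from the example above.

```python
from typing import List, Tuple


def balance_pass(vm_loads: List[float], num_hosts: int) -> Tuple[List[float], int]:
    """Place each VM on the currently least-loaded host (hypothetical greedy rule).

    Returns the resulting per-host loads and the number of VM/host
    comparisons performed, which grows as num_vms * num_hosts.
    """
    host_loads = [0.0] * num_hosts
    comparisons = 0
    for load in vm_loads:
        best = 0
        # Every VM is checked against every host: the inner loop is the
        # source of the O(VMs x hosts) cost.
        for host in range(num_hosts):
            comparisons += 1
            if host_loads[host] < host_loads[best]:
                best = host
        host_loads[best] += load
    return host_loads, comparisons


if __name__ == "__main__":
    # 1,000 VMs with an assumed uniform load of 1.0 each, across 64 hosts.
    loads, comparisons = balance_pass([1.0] * 1000, 64)
    print(comparisons)  # 64000
```

Under these assumptions a single pass already requires 64,000 comparisons, and the count doubles whenever either the number of VMs or the number of host systems doubles. The sketch also omits the cost of gathering statistics from every hardware resource before the pass can start, which the centralized approach incurs in addition.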