A networked virtualization environment includes several nodes (e.g., servers or data centers) that are in communication with each other, each node hosting several user virtual machines. The networked virtualization environment may also be referred to as a cluster of nodes. In order to maintain functionality of the networked virtualization environment/cluster of nodes, user virtual machines residing within the networked virtualization environment must be managed. Management of user virtual machines within the cluster includes tasks such as tracking and updating the state of the cluster, the user virtual machine (VM) inventory, the storage configuration of the cluster, and the network parameters for the user virtual machines.
Conventionally, management of virtual machines within the cluster is performed by a central management virtual machine or physical machine that resides at a node of the cluster. Each time a user virtual machine in the cluster issues a request or performs an action that requires access to virtual machine management data, the request or action must be handled by the central management virtual machine or physical machine. Although a shared/central database may be accessible to user virtual machines within the cluster for certain operations, the portion of the shared/central database corresponding to virtual machine management data is accessible only to the central management virtual machine.
Because all access to virtual machine management data for a cluster of nodes is provided by the central management virtual machine or physical machine, the central management virtual machine or physical machine acts as a central point of failure for all VM management-related operations. Thus, whenever the central management virtual machine or physical machine fails, or the node at which it resides fails, there is a period of time in which access to virtual machine management data is unavailable. Moreover, whenever the central management virtual machine or physical machine fails, there exists the possibility that some or all of the virtual machine management data may be lost or corrupted, requiring time and manual intervention to repair. During this downtime, virtual machine management-related operations are not adequately processed, and errors and unintended behavior may arise in the networked virtualization environment.
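The conventional arrangement described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions, not an implementation of any particular product: the names `CentralManager`, `UserVM`, `ManagementUnavailable`, and `demo` are hypothetical, and the management data is reduced to an in-memory dictionary. The sketch shows that every request for management data passes through one object, so a failure of that object makes the data unavailable to every user virtual machine at once.

```python
class ManagementUnavailable(Exception):
    """Raised when the central management machine cannot serve a request."""


class CentralManager:
    """Hypothetical stand-in for the central management VM or physical machine."""

    def __init__(self):
        # Management data is held only here; user VMs cannot read it directly.
        self._management_data = {}
        self.alive = True

    def handle(self, vm_id, key, value=None):
        """Read (value is None) or write one piece of VM management data."""
        if not self.alive:
            raise ManagementUnavailable("central management machine is down")
        if value is not None:
            self._management_data[(vm_id, key)] = value
        return self._management_data.get((vm_id, key))


class UserVM:
    """Hypothetical user VM that must route management requests centrally."""

    def __init__(self, vm_id, manager):
        self.vm_id = vm_id
        self._manager = manager

    def request(self, key, value=None):
        return self._manager.handle(self.vm_id, key, value)


def demo():
    manager = CentralManager()
    vms = [UserVM(f"vm-{i}", manager) for i in range(3)]

    # While the manager is up, requests succeed.
    vms[0].request("network", "10.0.0.5")
    before = vms[0].request("network")

    # Simulate failure of the central management machine (or its node).
    manager.alive = False
    failed = 0
    for vm in vms:
        try:
            vm.request("network")
        except ManagementUnavailable:
            failed += 1
    return before, failed
```

In this sketch, calling `demo()` returns the value read before the failure together with the number of user virtual machines whose requests failed afterward. Because all three share the same manager, all three fail.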
Additionally, the central management virtual machine or physical machine may also act as a central bottleneck. As the cluster of nodes grows and the number of user virtual machines within the cluster grows, the central management virtual machine or physical machine may run out of capacity for handling the task of managing virtual machines.
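The capacity concern can be made concrete with a rough back-of-the-envelope sketch. The model is an assumption introduced here for illustration: the central manager is taken to have a fixed request-handling capacity, and every user virtual machine is taken to generate management requests at the same steady rate. The function names and all numbers are hypothetical.

```python
def manager_utilization(nodes, vms_per_node, requests_per_vm_per_sec,
                        manager_capacity_per_sec):
    """Fraction of the central manager's capacity consumed by the cluster.

    A value above 1.0 means the offered load exceeds what the single
    manager can handle (assumed fixed capacity, uniform request rate).
    """
    offered_load = nodes * vms_per_node * requests_per_vm_per_sec
    return offered_load / manager_capacity_per_sec


def max_nodes(vms_per_node, requests_per_vm_per_sec, manager_capacity_per_sec):
    """Largest node count the single manager can serve under this model."""
    load_per_node = vms_per_node * requests_per_vm_per_sec
    return manager_capacity_per_sec // load_per_node


# Hypothetical figures: 10 user VMs per node, 2 management requests per
# second per VM, and a manager able to handle 1000 requests per second.
small_cluster = manager_utilization(25, 10, 2, 1000)   # 0.5
large_cluster = manager_utilization(100, 10, 2, 1000)  # 2.0
node_limit = max_nodes(10, 2, 1000)                    # 50
```

Under these assumed figures the manager is half utilized at 25 nodes, saturates at 50 nodes, and is offered twice its capacity at 100 nodes. The offered load grows with cluster size while the capacity of the single manager does not.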