Hosting services provide a means whereby multiple users can implement custom cloud resource configurations (e.g., cloud servers, cloud storage shares, load balancers, etc.) without the overhead costs associated with purchasing, upgrading, and maintaining the equipment needed to implement the configuration. In some cases, a hosting service provider maintains and provisions a grid of hardware nodes that are shared amongst the multiple users. More specifically, resources of a single node can be partitioned and each of these partitions can be allocated to host a cloud resource configuration of a different user.
Virtualization provides the means for partitioning the hardware resources amongst the multiple cloud resource configurations. Virtualization creates the façade that each cloud resource configuration is individually hosted on dedicated equipment with a particular set of resources. Two or more cloud resource configurations are provided non-conflicting sets of resources of the same hardware node such that a guaranteed amount of processing resources is available to each such configuration. In other words, a single physical resource is partitioned to operate as multiple logical resources.
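The partitioning described above can be pictured with a minimal sketch. The `Node` class, its fields, and the units (CPU cores, megabytes of RAM) are hypothetical names chosen for illustration and do not come from any particular hypervisor or hosting system. The sketch only shows the bookkeeping idea: each configuration receives a non-conflicting share of one node, and a request that would cut into an existing guarantee is refused.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hardware node with a fixed resource pool (hypothetical model)."""
    cpu_cores: int
    ram_mb: int
    # Maps a cloud resource configuration id to its reserved (cores, ram_mb).
    allocations: dict = field(default_factory=dict)

    def free(self):
        """Return the (cores, ram_mb) not yet reserved by any configuration."""
        used_cores = sum(cores for cores, _ in self.allocations.values())
        used_ram = sum(ram for _, ram in self.allocations.values())
        return self.cpu_cores - used_cores, self.ram_mb - used_ram

    def allocate(self, config_id, cores, ram_mb):
        """Reserve a partition for config_id.

        Returns False, leaving the node unchanged, if the configuration
        already holds a partition or if the request exceeds what is free.
        """
        if config_id in self.allocations:
            return False
        free_cores, free_ram = self.free()
        if cores > free_cores or ram_mb > free_ram:
            return False
        self.allocations[config_id] = (cores, ram_mb)
        return True


node = Node(cpu_cores=8, ram_mb=16384)
first = node.allocate("user-a", 4, 8192)
second = node.allocate("user-b", 4, 8192)
third = node.allocate("user-c", 1, 1024)  # nothing left to guarantee
```

Because every reservation is checked against the remaining pool, the sum of all guarantees never exceeds the physical capacity of the node, which is what allows each configuration to behave as if it ran on dedicated equipment.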
The hosting service must continuously manage each node in the grid of hardware nodes (and any specialized virtual machines used for certain types of cloud resources) to verify that each node has been configured according to the users' intended cloud resource configurations. Each time a user modifies or updates a cloud resource configuration, the hosting service needs to implement the same modifications or updates on the particular hardware node that is hosting that cloud resource configuration.
Certain hosting services implement a centralized management paradigm for managing the grid of hardware nodes. That is, the hosting service includes a single centralized module that is responsible for managing the entire grid. Using a centralized management paradigm to manage all of the cloud resources presents various problems. The centralized management paradigm is unable to operate during various common system failures (e.g., network failures, hardware node failures, etc.). For example, when deploying a cloud resource on a particular node, a network failure may cause the centralized module to deploy several instantiations of the same cloud resource on the node. Furthermore, these failures may leave artifacts of partially configured cloud resources on the node, which interfere with the complete deployment of the cloud resource on the node. These failure scenarios can result in a mismatch between the user's intended configuration or “administrative state” (i.e., what the world should be) and the target resource's actual configuration or “operational state” (i.e., what the world is).
Thus, there is a need in the art for a method of managing a grid of hardware nodes of a hosting system that consistently reflects the users' intended cloud resource configurations and operates successfully even during a system failure.
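The mismatch between administrative state and operational state can be made concrete with a short sketch. The function `diff_states`, the status labels `"running"` and `"partial"`, and the resource identifiers are all hypothetical and chosen for illustration; they are not taken from the hosting system described here. The sketch compares what should exist on a node with what is observed on it, and reports the failure symptoms named above: missing resources, duplicate instantiations, and leftover partial artifacts.

```python
from collections import Counter


def diff_states(administrative, operational):
    """Compare intended resources with those observed on a node.

    administrative: set of resource ids the user intends to exist.
    operational: list of (resource_id, status) tuples observed on the node,
                 where status is "running" or "partial" (assumed labels).
    """
    # Count fully deployed instances per resource id.
    running = Counter(rid for rid, status in operational if status == "running")
    # Ids with at least one partially configured leftover.
    partial = {rid for rid, status in operational if status == "partial"}
    return {
        # Intended, but no complete instance is present.
        "missing": sorted(rid for rid in administrative if running[rid] == 0),
        # Intended, but deployed more than once.
        "duplicated": sorted(
            rid for rid, count in running.items()
            if count > 1 and rid in administrative
        ),
        # Present on the node, but not intended at all.
        "unexpected": sorted(rid for rid in running if rid not in administrative),
        # Leftovers of interrupted deployments.
        "partial_artifacts": sorted(partial),
    }


administrative = {"web-1", "lb-1"}
operational = [
    ("web-1", "running"),
    ("web-1", "running"),   # duplicate caused by a retried deployment
    ("db-9", "running"),    # not part of the intended configuration
    ("lb-1", "partial"),    # deployment interrupted midway
]
report = diff_states(administrative, operational)
```

In this example the report lists `lb-1` as missing and as a partial artifact, `web-1` as duplicated, and `db-9` as unexpected. The states match only when every list in the report is empty.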