Distributed computing systems include multiple components, also called “nodes,” each of which typically maintains information about its own state, including information for configuring that node. For example, each node typically includes information for initializing that node. When a distributed computing system is configured, each node that requires configuration information is involved in the configuration process.
A first problem in the known art can occur when one of those multiple nodes is non-responsive. For example, a node in a computer network might have crashed, might be powered down, might be suspended (by its operator or otherwise), might be too busy to respond in a timely manner, or might be unable to achieve connectivity with the rest of the distributed system. When this occurs, configuration of the distributed system waits for the non-responsive node to become responsive again. This can present at least two problems. First, any configuration change to the distributed system must wait for that non-responsive node to become responsive again, which can take a very long time. Second, if a configuration change is needed to bring the non-responsive node back into the system, manual intervention might be needed, such as by an operator of the distributed system.
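The stall described above can be illustrated with a minimal sketch. It is not drawn from any particular known system: the names `Node`, `apply`, `configure_all`, and `max_attempts` are hypothetical, and the attempt limit is an assumption added only so the example terminates, whereas the configuration process described above would keep waiting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding its own configuration information."""
    name: str
    responsive: bool = True
    config: dict = field(default_factory=dict)

    def apply(self, change: dict) -> bool:
        """Apply a change; return False if the node cannot respond."""
        if not self.responsive:
            return False
        self.config.update(change)
        return True


def configure_all(nodes, change, max_attempts=3):
    """Require every node to acknowledge the change.

    Returns the names of nodes that never acknowledged. The attempt
    limit is an assumption that keeps this sketch from looping forever.
    """
    pending = list(nodes)
    for _ in range(max_attempts):
        # Keep only the nodes that still have not acknowledged.
        pending = [n for n in pending if not n.apply(change)]
        if not pending:
            break
    return [n.name for n in pending]


nodes = [Node("a"), Node("b", responsive=False), Node("c")]
stalled = configure_all(nodes, {"timeout_s": 30})
```

In this sketch, nodes "a" and "c" accept the change while "b" never acknowledges it, so a process that requires every node cannot report the change as complete until "b" becomes responsive again.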
Known methods include making such configuration changes manually, that is, by having an operator change the configuration information in use by various nodes in the distributed system, either by editing the configuration information at each node or by substituting new configuration information at each node. While these known methods might be able to make configuration changes adequately, they are subject to at least the following drawbacks. First, making configuration changes manually is relatively slow, at least in the sense that changing the configuration information at each node requires the operator to perform relatively many operations. Second, making configuration changes manually is relatively error-prone, at least in the sense that a human operator is relatively likely to make incorrect changes, or to make changes that are inconsistent across the distributed system.
A second problem in the known art can occur when attempting to make more than one change to the configuration of the distributed system. For just one example, after making a first configuration change, an operator for the distributed system might desire to make a second configuration change. If each of the nodes requiring configuration information is involved in each configuration change, that operator must wait for the first configuration change to finish before starting the second configuration change.
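The cost of this serialization can be shown with a small worked sketch. It assumes, purely for illustration, that a configuration change completes only when its slowest node has responded and that each change starts only after the previous one has finished. The function name `serialized_finish_times` and the delay values are hypothetical.

```python
def serialized_finish_times(per_change_node_delays):
    """Return the time at which each change finishes when changes
    run strictly one after another.

    Each inner list holds the assumed response delay of every node
    for one change; a change ends when its slowest node responds.
    """
    finish_times = []
    clock = 0
    for node_delays in per_change_node_delays:
        clock += max(node_delays)  # wait for the slowest node
        finish_times.append(clock)
    return finish_times


# First change is held up by one slow node; the second is quick.
changes = [[1, 2, 60], [1, 1, 1]]
finish = serialized_finish_times(changes)
```

Under these assumptions, the second change needs only one time unit of its own but cannot finish until time 61, because it cannot start until the first change has waited 60 units for its slowest node.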