1. Technical Field
The present invention relates generally to distributed computer networks and the management of such networks.
2. Description of the Related Art
Machines and collections of machines are defined by their state. A significant and complex portion of this state is the configuration that drives the behavior of the components running on those machines and interacting with those machines. A distributed computing environment typically implements techniques that attempt to provide a consistent mechanism for managing and changing both machine states and the configuration of machines and collections of machines.
A guiding principle is to drive the state and configuration of machines to a desired state. When possible, this should be done in a closed-loop fashion to avoid drift between machines and to maintain consistency across the network. For example, even if a machine misses a software install, the following install should bring it up to a consistent and correct state. Minimizing drift and maintaining consistency and repeatability are important for allowing a network with a large number of machines to be managed reliably and, preferably, with little or no human intervention. If machine states are not reasonably consistent and predictable, assuring the quality of changes becomes an impossible task, as there may be a massive number of permutations to defend against.
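By way of illustration only, the closed-loop principle described above may be sketched as follows. The sketch assumes, for simplicity, that a machine's state can be modeled as a mapping from component name to version; the names plan, apply_actions, and converge are hypothetical and do not come from any particular system. The point it shows is that the actions are computed from the difference between the current and desired states, so a machine that missed an earlier install still arrives at the same correct state.

```python
from typing import Dict, List, Tuple

# Assumption: a machine's state is modeled as component name -> version.
State = Dict[str, str]
Action = Tuple[str, str, str]  # (operation, component, version)


def plan(current: State, desired: State) -> List[Action]:
    """Compute the actions needed to move `current` to `desired`."""
    actions: List[Action] = []
    # Install or upgrade anything that is missing or at the wrong version.
    for name, version in sorted(desired.items()):
        if current.get(name) != version:
            actions.append(("install", name, version))
    # Remove anything present on the machine that is no longer desired.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("remove", name, current[name]))
    return actions


def apply_actions(current: State, actions: List[Action]) -> State:
    """Return a new state with the planned actions applied."""
    state = dict(current)
    for operation, name, version in actions:
        if operation == "install":
            state[name] = version
        else:
            state.pop(name, None)
    return state


def converge(current: State, desired: State) -> State:
    """One pass of the closed loop: observe, compare, correct."""
    return apply_actions(current, plan(current, desired))
```

Because the plan depends only on the observed and desired states, running converge on a machine that is already in the desired state produces no actions, and running it on a straggler produces exactly the actions that machine is missing.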
Moreover, even when the network is not intentionally being kept in a heterogeneous state, there will always be some heterogeneity. Changes to the network are never atomic. It takes time for installs and configuration updates to propagate across the network. In many cases, this is intentional (when installs and configuration changes are staged), and in other cases it is a result of the need to coordinate changes so that only a portion of the network is undergoing an install at any one time. There are also straggler machines, often due to connectivity problems during an install or due to other install failures. There are other cases where multiple versions of some software are intentionally running on different parts of the network at the same time, sometimes for extended periods. Whatever the reason, the software and configuration state of the network can be assumed to be heterogeneous at any point in time. Not only will different machines have different states, but they may also have different perceptions of the states of other machines. Having every machine constantly update its configuration and state to reflect the heterogeneity is a very hard problem. It is desirable to track heterogeneity where necessary, but just as importantly, it is desirable to make sure that everything is robust enough to tolerate heterogeneity in cases where fully tracking and responding to it is not possible.
Consider the need to perform configuration and software installs across a large distributed computer network. In such networks, it is known to use an application (e.g., the NetDeploy™ deployment utility) in which a human runs a script that makes secure (e.g., SSH) connections to a specified list of machines, copies out a configuration file and an archive of software (e.g., a tarball, which is an archive of files created with the Unix® tar utility), and then invokes a host setup process on each machine to configure and install the software. This means that machines only change state when a human actively runs the install process to change the state of an individual machine or group of machines. Changing the network configuration or deploying new software involves running an install against all of the machines in the network. This is a time-intensive process for a human, and stragglers (machines that miss an install) will continue to run with old software and an old view of the world.
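By way of illustration only, the operator-driven install described above may be sketched as follows. This is not the code of any actual deployment utility. The function name push_install, the /tmp/ destination, and the host-setup command are hypothetical, and the command runner is passed in as a parameter so that the sketch can be exercised without a network. The sketch shows why stragglers arise: a host that cannot be reached is simply skipped and stays on its old software until a human runs the install again.

```python
import os
from typing import Callable, List, Sequence

# A runner executes one command and returns its exit code (0 on success).
Runner = Callable[[Sequence[str]], int]


def push_install(hosts: Sequence[str], config_path: str,
                 tarball_path: str, run: Runner) -> List[str]:
    """Push a config file and software archive to each host in turn.

    Returns the list of stragglers: hosts on which a step failed.
    """
    config_name = os.path.basename(config_path)
    tarball_name = os.path.basename(tarball_path)
    stragglers: List[str] = []
    for host in hosts:
        steps = [
            # Copy the configuration file and the archive to the machine.
            ["scp", config_path, tarball_path, f"{host}:/tmp/"],
            # "host-setup" is a hypothetical name for the setup process
            # that configures and installs the software on the machine.
            ["ssh", host, f"host-setup /tmp/{config_name} /tmp/{tarball_name}"],
        ]
        # any() stops at the first failing step, so the setup process is
        # not invoked if the copy did not succeed.
        if any(run(step) != 0 for step in steps):
            stragglers.append(host)
    return stragglers
```

In a real run, subprocess.call from the Python standard library could be passed as the runner. Nothing in this flow retries a failed host or notices one that later falls behind, which is the limitation noted above.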
The present invention addresses these and other associated problems of the prior art.