Frequently, a plurality of distributed computer systems (or hosts) may be clustered or grouped together on any basis, such as where each of the systems is in a common location or is dedicated to performing a common function. For example, a number of servers may be clustered to support a particular web site (such as a web site hosted by an online marketplace), in order to ensure the reliability and dependability of any particular functions to be performed by the web site. Additionally, a number of scanning machines or devices may be used to confirm the arrival of large numbers of goods at a warehouse or shipping facility, while a collection of servers may be used to house and control the sending and receipt of electronic mail at an organization's various satellite offices.
Where computer systems are clustered or grouped together for any reason, it is usually very important to ensure that each of the computer systems in the cluster or group is operating in the same state or the same configuration, i.e., to ensure that each of the systems features the same core operating element (such as the same set of data, version of a software application, level of permissions or particular operational setting) as each of its peers. Various problems may ensue if one or more systems of a cluster or group operate in a different state or a different configuration from the other systems in the cluster or group. For example, if some of the hosts in a networked system operate using a different version of a software application (e.g., an older or less advanced version) than others, then the networked system itself is less than fully versatile, and certain instructions or messages may be handled by only a limited number of the hosts in the networked system. Moreover, where a state or configuration of one or more of the hosts in a networked system has changed for any reason (e.g., due to a fault or security breach at one or more of the respective hosts), such a change is difficult to identify or diagnose unless or until a discrepancy in the operations of the affected host is detected through normal operations.
Determining whether a plurality of computer systems is operating in the same state or configuration, and enforcing a common state or configuration among a plurality of computer systems, are usually challenging tasks. Although the configurations of a plurality of systems within a particular fleet or class may be determined and modified as necessary through tooling processes, such processes require defining the state or configuration based on a known, verified master, and comparing each of the respective systems against the master at a predetermined time, in series and in a centrally managed manner. Additionally, until the states or configurations of each of the respective systems within a fleet or class are reviewed in such a tooling process and modified or upgraded, as necessary, there is no way to know whether a state or a configuration of one of the respective systems within the fleet or class has changed. Unless tooling processes are persistently run on each of the respective systems, a change in a state or a configuration (which may have occurred due to a fault or security breach, or other potentially serious adverse conditions) may last for an extended period of time, unbeknownst to the owners or operators of the plurality of systems.
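By way of illustration only, the conventional tooling process described above (a central comparison of each system, in series, against a known, verified master) may be sketched as follows. The function names `fingerprint` and `audit_fleet`, the host names and the configuration keys are hypothetical and are chosen solely for this example, and representing a configuration as a flat mapping hashed with SHA-256 is an assumption rather than a feature of any particular tooling process.

```python
import hashlib
import json


def fingerprint(config):
    """Return a stable SHA-256 digest of a configuration mapping."""
    # Sorting keys makes the digest independent of insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_fleet(master, fleet):
    """Compare each host against the master, one host at a time.

    Returns a dict mapping each drifted host name to the sorted list
    of configuration keys whose values differ from the master.
    """
    expected = fingerprint(master)
    drifted = {}
    for host, config in fleet.items():  # serial, centrally driven loop
        if fingerprint(config) == expected:
            continue  # host matches the master
        keys = set(master) | set(config)
        drifted[host] = sorted(
            k for k in keys if master.get(k) != config.get(k)
        )
    return drifted


# Hypothetical master configuration and fleet snapshot.
master = {"app_version": "2.4.1", "permissions": "read-only", "tls": True}
fleet = {
    "host-a": {"app_version": "2.4.1", "permissions": "read-only", "tls": True},
    "host-b": {"app_version": "2.3.0", "permissions": "read-only", "tls": True},
    "host-c": {"app_version": "2.4.1", "permissions": "read-write", "tls": False},
}

report = audit_fleet(master, fleet)
print(report)
# → {'host-b': ['app_version'], 'host-c': ['permissions', 'tls']}
```

As the sketch suggests, a discrepancy at a given host is discovered only when the audit is actually run; any change occurring between runs remains undetected until the next run, which is the limitation noted above.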