As computing devices become more powerful, with increased storage capacity and processing capabilities, the average user consumes an increasingly smaller percentage of those resources in performing everyday tasks. Thus, many of today's personal computing devices are often not used to their full potential because their computing abilities greatly exceed the demands most users place upon them. An increasingly popular method of deriving use and value from the unused resources of powerful computing devices is a distributed computing system in which the computing devices act in coordination with one another to perform tasks and maintain data.
A distributed computing system can utilize a number of interconnected computing devices to achieve the performance and storage capabilities of a larger, more expensive computing device. Thus, while each computing device may have only a few gigabytes of usable storage space, a distributed computing system comprising a number of such devices can aggregate the available storage space on each individual device and present to a user a terabyte or more of usable storage space. Similarly, a distributed computing system can present to a user a large amount of usable processing power by dividing the user's tasks into smaller segments and transmitting the segments to the individual devices for processing in parallel.
Alternatively, a distributed computing system can practice complete redundancy, in which every device within the system performs identical tasks and stores identical information. Such a system can allow users to continue to perform useful operations even if all but one of the devices should fail. Such a system can also be used to allow multiple copies of the same information to be distributed throughout a geographic region. For example, a multi-national corporation can establish a world-wide distributed computing system. Such a corporation might use a number of high performance server computing devices, rather than less powerful personal computing devices, because each individual computing device would be required to service many users within its geographic region. The individual high performance devices can each perform identical tasks and store identical data, allowing users who merely seek to access the data to obtain such access from a high performance device located in a convenient location for that user.
However, distributed computing systems can be difficult to maintain due to the complexity of properly synchronizing the individual devices that comprise the system. Because time-keeping across individual processes can be difficult at best, a state machine approach is often used to coordinate activity among the individual devices. A state machine can be described by a set of states, a set of commands, a set of responses, and functions that map each command/state pair to a response/state pair. A state machine can execute a command by changing its state and producing a response. Thus, a state machine can be completely described by its current state and the actions it is about to perform, removing the need to use precise physical time-keeping.
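As an illustration only, the state machine described above can be sketched as a single transition function that takes a command/state pair and returns a response/state pair. The counter state, the command names, and the function name `transition` are hypothetical choices made for this sketch and are not drawn from any particular system.

```python
from typing import Tuple

# Hypothetical example: the state is an integer counter,
# commands and responses are plain strings.
State = int
Command = str
Response = str


def transition(state: State, command: Command) -> Tuple[State, Response]:
    """Execute one command: return the new state and a response."""
    if command == "increment":
        return state + 1, "ok"
    if command == "reset":
        return 0, "ok"
    # Unknown commands leave the state unchanged.
    return state, "unknown command"


# Executing a series of commands needs no clock, only the current
# state and the next command to perform.
state = 0
responses = []
for cmd in ["increment", "increment", "reset", "increment"]:
    state, response = transition(state, cmd)
    responses.append(response)
```

Nothing in the sketch refers to physical time: the outcome depends only on the starting state and the commands applied.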
The current state of a machine is, therefore, dependent upon its previous state, the commands performed since then, and the order in which those commands were performed. To maintain synchronization between two or more state machines, a common initial state can be established, and each state machine can, beginning with the initial state, execute identical commands in the identical order. Therefore, to synchronize one state machine to another, a determination of the order of commands performed by the other state machine needs to be made. The problem of synchronization, therefore, becomes a problem of determining the order of the commands performed, or, more specifically, determining the particular command performed for a given step. In this way, the synchronization problem reduces to one of ordering events and becomes abstracted from the notion of physical time.
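The dependence on command order can be shown with a small sketch. It assumes a hypothetical state machine whose state is an integer and whose commands are `("add", n)` and `("mul", n)`; these names are illustrative only. Two replicas that start from the same initial state and apply the same commands in the same order end in the same state, while a replica that applies the same commands in a different order may not.

```python
from functools import reduce


def apply_command(state, command):
    """Apply one hypothetical arithmetic command to an integer state."""
    op, arg = command
    if op == "add":
        return state + arg
    if op == "mul":
        return state * arg
    raise ValueError(f"unknown command: {op}")


def run(initial_state, commands):
    """Execute commands in the given order, starting from initial_state."""
    return reduce(apply_command, commands, initial_state)


commands = [("add", 3), ("mul", 2), ("add", 1)]

# Same initial state, same commands, same order: identical results.
replica_a = run(1, commands)   # (1 + 3) * 2 + 1 = 9
replica_b = run(1, commands)   # 9

# Same commands in a different order: a different final state.
reordered = run(1, [("mul", 2), ("add", 3), ("add", 1)])   # 1 * 2 + 3 + 1 = 6
```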
One way to ensure that the commands are executed in the same sequence is to appoint a lead process (“leader”) charged with the task of assigning a command sequence number to each command. New commands are passed to the leader. The leader generates a series of commands, assigns sequence numbers, and sends the sequenced commands to the other computers. All state machines are programmed to execute commands in their assigned order. Regardless of when commands arrive at any given machine, they will be executed in the same sequence. If a later-numbered command arrives first, a state machine will simply store it until the previous commands arrive, and then execute all commands in sequence.
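The leader-based scheme described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions: the class names `Leader` and `Replica`, the numbering starting at 1, and the in-memory message passing are all hypothetical, and failure handling and networking are omitted.

```python
import itertools


class Leader:
    """Assigns consecutive sequence numbers to incoming commands."""

    def __init__(self):
        self._numbers = itertools.count(1)  # assumed to start at 1

    def sequence(self, command):
        return next(self._numbers), command


class Replica:
    """Executes commands strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 1      # next sequence number to execute
        self.pending = {}      # commands that arrived early
        self.executed = []     # commands executed so far, in order

    def receive(self, seq, command):
        # Store the command, then execute every command that is now
        # contiguous with what has already been executed.
        self.pending[seq] = command
        while self.next_seq in self.pending:
            self.executed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


leader = Leader()
messages = [leader.sequence(c) for c in ["a", "b", "c"]]

# Deliver the messages out of order: 3, 1, 2.
replica = Replica()
for seq, command in [messages[2], messages[0], messages[1]]:
    replica.receive(seq, command)
```

Even though the third command arrives first, the replica holds it until the first two arrive, so `replica.executed` ends up in the order the leader assigned.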
One problem with the state-machine approach to synchronizing data is a potential informational bottleneck at the leader. Because command sequence numbers must be assigned before the commands are executed, all new information must flow into the leader to be divided into a set of sequenced commands. All sequenced commands then flow out of the leader to the state machines of the network. There is a need for a state-machine approach to data synchronization that can assign command sequence numbers while overcoming the informational bottleneck at the leader.