As personal computing devices become more powerful, containing increased storage space and processing capabilities, the average user consumes an increasingly smaller percentage of those resources in performing everyday tasks. Thus, many of today's personal computing devices are often not used to their full potential because their computing abilities greatly exceed the demands most users place upon them. An increasingly popular method of deriving use and value from the unused resources of powerful modern personal computing devices is a distributed computing system, in which the computing devices act in coordination with one another to provide more reliable access to data and computational resources. An advantage of distributed systems is the ability to continue to operate in the face of physical difficulties that would cripple a single, larger computing device. Such difficulties could include sustained power outages, inclement weather, flooding, and terrorist activity, for example.
To compensate for the increased risk that individual member computing devices may become disconnected from the network, turned off, suffer a system malfunction, or otherwise become unusable, redundancy can be used to allow the distributed computing system to remain operational. Thus, the information stored on any one personal computing device can be redundantly stored on at least one additional personal computing device, allowing the information to remain accessible, even if one of the personal computing devices fails.
A distributed computing system can practice complete redundancy, in which every device within the system performs identical tasks and stores identical information. Such a system can allow users to continue to perform useful operations even if all but one of the devices should fail. Alternatively, such a system can be used to allow multiple copies of the same information to be distributed throughout a geographic region. For example, a multi-national corporation can establish a world-wide distributed computing system.
However, distributed computing systems can be difficult to maintain due to the complexity of properly synchronizing the individual devices that comprise the system. Because time-keeping across individual processes can be difficult at best, a state machine approach can be used to coordinate activity among the individual devices. A state machine can execute a command by changing its state and producing a response. Thus, a state machine can be completely described by its current state and the action it is about to perform, removing the need to use precise time-keeping.
The current state of a state machine is, therefore, dependent upon its previous state, the commands performed since then, and the order in which those commands were performed. To maintain synchronization between two or more state machines, a common initial state can be established, and each state machine can, beginning with the initial state, execute identical commands in identical order. Therefore, to synchronize one state machine with another, one must determine the commands performed by the other state machine. The problem of synchronization, therefore, becomes a problem of determining the order of the commands performed, or, more specifically, determining the particular command performed for a given step.
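The replication principle above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the state is a dictionary of account balances and that each command is a deterministic function returning a new state and a response; the names `Replica`, `deposit`, and `withdraw` are hypothetical and do not come from any particular system. Two replicas that start from the same initial state and execute the same commands in the same order end in the same state.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a state maps account names to balances, and a command
# is a deterministic function from a state to (new_state, response).
State = Dict[str, int]
Command = Callable[[State], Tuple[State, str]]


def deposit(account: str, amount: int) -> Command:
    def run(state: State) -> Tuple[State, str]:
        new_state = dict(state)  # never mutate the caller's state
        new_state[account] = new_state.get(account, 0) + amount
        return new_state, f"{account}={new_state[account]}"
    return run


def withdraw(account: str, amount: int) -> Command:
    def run(state: State) -> Tuple[State, str]:
        new_state = dict(state)
        new_state[account] = new_state.get(account, 0) - amount
        return new_state, f"{account}={new_state[account]}"
    return run


class Replica:
    """One device modeled as a deterministic state machine."""

    def __init__(self, initial_state: State) -> None:
        self.state: State = dict(initial_state)
        self.responses: List[str] = []

    def execute(self, command: Command) -> None:
        # Executing a command changes the state and produces a response.
        self.state, response = command(self.state)
        self.responses.append(response)


# Both replicas start from the same initial state and run the same
# commands in the same order, so they end in the same state.
initial = {"A": 0, "B": 100}
log = [deposit("A", 100), withdraw("B", 50)]
r1, r2 = Replica(initial), Replica(initial)
for cmd in log:
    r1.execute(cmd)
    r2.execute(cmd)
print(r1.state, r2.state)  # {'A': 100, 'B': 50} {'A': 100, 'B': 50}
```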
One mechanism for determining which command is to be performed for a given step is known as the Paxos algorithm. In the Paxos algorithm, any of the individual devices can act as a leader and propose a given client command for execution by every device in the system. Each such proposal can be sent with a proposal number so that proposals can be tracked more easily. Such proposal numbers need not bear any relation to the particular step for which the devices are attempting to agree upon a command to perform. Initially, the leader can suggest a proposal number for a proposal the leader intends to submit. Each of the remaining devices can then respond to the leader's suggestion of a proposal number with an indication of the last proposal it voted for, or an indication that it has not voted for any proposal. If, through the various responses, the leader does not learn of any other proposals that were voted for by the devices, the leader can propose that a given client command be executed by the devices, using the proposal number suggested in the earlier message. Otherwise, the leader must instead propose the command from the highest-numbered proposal reported to it. Each device can, at that stage, determine whether to vote for the action or reject it. A device should reject an action only if it has already responded to another leader's suggestion of a higher proposal number. If a sufficient number of devices, known as a quorum, vote for the proposal, the proposed action is said to have been agreed upon, and each device performs the action and can transmit the results. In such a manner, each of the devices can perform actions in the same order, maintaining the same state among all of the devices.
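The message exchange described above can be sketched for a single step (one agreed command). This is a simplified, in-memory illustration of single-decree Paxos under stated assumptions: there is no networking, message loss, or persistence; each device's voting role is modeled by a hypothetical `Acceptor` class; and the hypothetical `propose` function plays the leader. It follows the standard Paxos rule that a leader which learns of an earlier vote must re-propose the command from the highest-numbered such vote.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Voting state of one device for a single step (hypothetical model)."""
    promised: int = -1               # highest proposal number responded to
    voted_num: int = -1              # number of the last proposal voted for
    voted_cmd: Optional[str] = None  # command in that proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        # Respond to a suggested proposal number with the last vote cast
        # (voted_num == -1 means "has not voted"), unless a higher or equal
        # number was already responded to.
        if n > self.promised:
            self.promised = n
            return (self.voted_num, self.voted_cmd)
        return None

    def accept(self, n: int, cmd: str) -> bool:
        # Reject only if the device has responded to a higher proposal number.
        if n < self.promised:
            return False
        self.promised = n
        self.voted_num, self.voted_cmd = n, cmd
        return True


def propose(devices: List[Acceptor], n: int, cmd: str) -> Optional[str]:
    """Leader side: returns the agreed command, or None if no quorum."""
    quorum = len(devices) // 2 + 1
    replies = [r for r in (d.prepare(n) for d in devices) if r is not None]
    if len(replies) < quorum:
        return None
    # If any device already voted, re-propose the highest-numbered vote's command.
    last_num, last_cmd = max(replies, key=lambda r: r[0])
    chosen = last_cmd if last_num >= 0 else cmd
    votes = sum(d.accept(n, chosen) for d in devices)
    return chosen if votes >= quorum else None


devices = [Acceptor() for _ in range(3)]
first = propose(devices, 1, "deposit A 100")
second = propose(devices, 2, "withdraw B 50")
stale = propose(devices, 1, "withdraw B 50")
print(first, second, stale)  # deposit A 100 deposit A 100 None
```

In the usage lines, the second leader learns of the earlier vote and re-proposes `"deposit A 100"` rather than its own command, and a stale leader reusing proposal number 1 gets no quorum because every device has already responded to number 2.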
However, if two or more actions or requests need not be ordered with respect to one another, then the Paxos algorithm can be made more efficient by allowing a more generalized agreement among the constituent devices. Two requests transmitted at approximately the same time often commute with one another; that is, the response to each request is not affected by whether the other request is performed before or after it. For example, in a banking system, customer A can issue a request to deposit $100 into her account at approximately the same time that customer B issues a request to withdraw $50 from his account. These two exemplary commands commute because they operate on different accounts: customer B's withdrawal does not change customer A's balance, and customer A's deposit does not change customer B's balance, irrespective of which request is performed first. Consequently, a device that executes customer B's request first will provide the same results to both customer A and customer B as a device that executes customer A's request first, and both devices will agree about the resulting system state, so that future commands also generate consistent responses.
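One way to make the deposit/withdraw example concrete is to execute two commands in both orders from a given state and compare the resulting states and responses. The sketch below assumes a hypothetical model in which the state is a dictionary of balances and each command reports the account's new balance as its response; `commute_at` is a hypothetical helper, and it only tests commutativity at the single state it is given.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: a state maps account names to balances, and a command
# returns (new_state, response) without mutating its input.
State = Dict[str, int]
Command = Callable[[State], Tuple[State, str]]


def deposit(account: str, amount: int) -> Command:
    def run(state: State) -> Tuple[State, str]:
        new_state = dict(state)
        new_state[account] = new_state.get(account, 0) + amount
        return new_state, f"{account}={new_state[account]}"
    return run


def withdraw(account: str, amount: int) -> Command:
    # No overdraft check in this sketch; the balance may go negative.
    def run(state: State) -> Tuple[State, str]:
        new_state = dict(state)
        new_state[account] = new_state.get(account, 0) - amount
        return new_state, f"{account}={new_state[account]}"
    return run


def commute_at(a: Command, b: Command, state: State) -> bool:
    """True if running a then b, or b then a, from `state` yields the same
    final state and the same response for each command."""
    mid_ab, resp_a1 = a(state)
    end_ab, resp_b1 = b(mid_ab)
    mid_ba, resp_b2 = b(state)
    end_ba, resp_a2 = a(mid_ba)
    return end_ab == end_ba and resp_a1 == resp_a2 and resp_b1 == resp_b2


start = {"A": 0, "B": 100}
different_accounts = commute_at(deposit("A", 100), withdraw("B", 50), start)
same_account = commute_at(deposit("A", 100), withdraw("A", 50), start)
print(different_accounts, same_account)  # True False
```

The second pair leaves the same final balance in either order, yet the responses differ (`A=100` then `A=50` versus `A=-50` then `A=50`), so those two commands do not commute in the sense described above.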
The generalized Paxos algorithm recognizes that devices selecting commuting commands in any order remain synchronized. For example, the generalized Paxos algorithm can recognize that a device selecting customer A's request prior to customer B's request is in agreement with a device selecting customer B's request prior to customer A's request. Consequently, a generalized Paxos algorithm can seek to achieve agreement on a series of functions, executed as a series of steps, whereas the Paxos algorithm described above requires agreement on a step-by-step basis.
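The idea that devices choosing commuting commands in different orders remain in agreement can be expressed as an equivalence check on two command sequences: they are equivalent if they contain the same commands and every pair of non-commuting commands appears in the same relative order in both. The sketch below assumes each command is a hypothetical (request id, account) pair, that every request id appears at most once, and that requests on different accounts commute; none of these encodings come from the text.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical encoding: (request id, account touched). Request ids are
# assumed unique, so each command appears at most once per sequence.
Cmd = Tuple[str, str]


def different_accounts(x: Cmd, y: Cmd) -> bool:
    # Assumed commutativity rule: requests on different accounts commute.
    return x[1] != y[1]


def equivalent(s1: List[Cmd], s2: List[Cmd],
               commute: Callable[[Cmd, Cmd], bool]) -> bool:
    """True if s2 can be obtained from s1 by repeatedly swapping adjacent
    commuting commands."""
    if sorted(s1) != sorted(s2):
        return False  # the sequences must contain the same commands
    pos1 = {c: i for i, c in enumerate(s1)}
    pos2 = {c: i for i, c in enumerate(s2)}
    for x, y in combinations(s1, 2):
        # Non-commuting commands must keep their relative order.
        if not commute(x, y) and (pos1[x] < pos1[y]) != (pos2[x] < pos2[y]):
            return False
    return True


a = ("r1", "A")  # customer A's deposit
b = ("r2", "B")  # customer B's withdrawal
c = ("r3", "A")  # a later request on customer A's account
print(equivalent([a, b, c], [b, a, c], different_accounts))  # True
print(equivalent([a, b, c], [c, b, a], different_accounts))  # False
```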
However, while the generalized Paxos algorithm is more efficient than the Paxos algorithm because it achieves agreement on a series of steps rather than on a step-by-step basis, the generalized Paxos algorithm still must be able to recognize which functions or steps commute before it can achieve agreement on a particular series. Absent the present invention, programmers implementing the generalized Paxos algorithm would have to explicitly declare beforehand the commutativity of each function or step. This greatly complicates the programming of distributed systems and can introduce errors that may be difficult to detect during testing. In addition, the programmers may not fully recognize all of the functions that commute, thus reducing the effectiveness of the generalized Paxos algorithm. Furthermore, some functions commute only when the state to which they are applied possesses a certain property; because the programmer cannot rely on that property always holding, the programmer must conservatively declare that such functions do not commute.
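The state-dependent case can be illustrated with two withdrawals from the same account, assuming, hypothetically, that a withdrawal which would overdraw the account is refused with an "insufficient funds" response. The hypothetical `commute_at` helper runs both orders from a given state and compares states and responses. The two withdrawals commute when the balance covers both, but not otherwise, so a programmer who cannot guarantee a sufficient balance would have to declare them non-commuting. This is only an illustration of the problem, not a method for detecting commutativity in general.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
Command = Callable[[State], Tuple[State, str]]


def withdraw(account: str, amount: int) -> Command:
    # Hypothetical rule: refuse any withdrawal that would overdraw the account.
    def run(state: State) -> Tuple[State, str]:
        balance = state.get(account, 0)
        if balance < amount:
            return dict(state), "insufficient funds"
        new_state = dict(state)
        new_state[account] = balance - amount
        return new_state, "ok"
    return run


def commute_at(a: Command, b: Command, state: State) -> bool:
    # Compare final states and per-command responses for both orders.
    mid_ab, resp_a1 = a(state)
    end_ab, resp_b1 = b(mid_ab)
    mid_ba, resp_b2 = b(state)
    end_ba, resp_a2 = a(mid_ba)
    return end_ab == end_ba and resp_a1 == resp_a2 and resp_b1 == resp_b2


w50, w80 = withdraw("A", 50), withdraw("A", 80)
print(commute_at(w50, w80, {"A": 200}))  # True: both succeed in either order
print(commute_at(w50, w80, {"A": 100}))  # False: whichever runs second is refused
```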
Therefore, what are needed are systems and methods for automatically detecting commutativity of functions for use in the generalized Paxos algorithm.