Collaborative projects, which are often facilitated in a concurrent manner between globally separated resources (i.e., multi-site collaborative projects), have become commonplace for any number of different types of projects. Examples of such projects include, but are not limited to, developing software, designing jetliners and designing automobiles. Relying upon distributed resources (e.g., resources at physically different locations, logically different locations, etc.) to accelerate project time lines through optimization of human resource utilization and leveraging of global resource skill sets has proven itself to offer advantageous results.
A distributed computing solution used in facilitating a multi-site collaborative project is referred to herein as a distributed multi-site collaborative computing solution. However, a distributed multi-site collaborative computing solution is only one example of a distributed computing solution. In one example, a distributed computing solution comprises a network of computers operating an automobile. In another example, a distributed computing solution comprises a network of computers in one geographic location (a data center). In still another example, a distributed computing solution is a plurality of computers connected to one router (i.e., a subnet).
While conventional distributed computing solutions do exist, they are not without limitations that adversely impact their effectiveness, reliability, availability, scalability, transparency and/or security. In particular, conventional distributed multi-site collaborative computing solutions are limited in their ability to synchronize work from globally distributed development sites in a real-time, fault-tolerant manner. This inability forces changes in software development and delivery procedures that often cause delays and increase risk. Accordingly, cost savings and productivity improvements that should be realized from implementing a collaborative project utilizing a conventional distributed computing solution are not fully achieved.
Conventional distributed multi-site collaborative computing solutions undesirably force users to change their development procedures. For example, conventional distributed multi-site collaborative computing solutions that lack advantageous functionalities associated with real-time information management capabilities have a fundamental problem in that they cannot guarantee that local and remote Concurrent Versions System (CVS) repositories will be in sync at any point in time. This means that there is a great likelihood that developers at different sites can inadvertently overwrite or corrupt each other's work. To prevent such potential for overwriting and corruption, these conventional distributed multi-site collaborative computing solutions require excessive and/or error-prone source code branching and manual file merging to become part of the development process. This effectively forces development work to be partitioned based on time zones and makes collaboration between distributed development teams extremely challenging, if not impossible.
A replicated state machine is a preferred enabler of distributed computing solutions. One of several possible examples of a distributed computing solution is a replicated information repository. Therefore, more particularly, a replicated state machine is a preferred enabler of replicated information repositories. One of several possible applications of replicated information repositories is distributed multi-site collaborative computing solutions. Therefore, more particularly, a replicated state machine is a preferred enabler of distributed multi-site collaborative computing solutions.
Accordingly, distributed computing solutions often rely upon replicated state machines, replicated information repositories or both. Replicated state machines and/or replicated information repositories provide for concurrent generation, manipulation and management of information and, thus, are important aspects of most distributed computing solutions. However, known approaches for facilitating replication of state machines and facilitating replication of information repositories are not without their shortcomings.
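The replicated state machine concept referred to above can be illustrated with a minimal sketch. It assumes that agreement on the order of commands has already been reached by some protocol that is not shown; the class name `KeyValueStateMachine`, the command tuple format and the file names are hypothetical and chosen only for illustration. The sketch shows that two replicas applying the same agreed sequence of commands to a deterministic state machine arrive at the same state.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStateMachine:
    """Hypothetical deterministic state machine: same commands in the
    same order always produce the same state."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # A command is an (operation, key, value) tuple (assumed format).
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(agreed_log):
    """Apply an already agreed-upon sequence of commands to a fresh replica."""
    machine = KeyValueStateMachine()
    for command in agreed_log:
        machine.apply(command)
    return machine.state


# The agreed order is taken as given here; reaching agreement is the
# job of the agreement protocol, which this sketch does not model.
agreed_log = [
    ("set", "main.c", "rev1"),
    ("set", "util.c", "rev1"),
    ("set", "main.c", "rev2"),
    ("delete", "util.c", None),
]

site_a = replay(agreed_log)
site_b = replay(agreed_log)
print(site_a == site_b)
```

Because each replica only needs the agreed command sequence, any replica can accept updates, which is what distinguishes this approach from the active-passive arrangements discussed elsewhere in this section.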
Conventional implementations of facilitating replication of state machines have one or more shortcomings that limit their effectiveness. One such shortcoming is being prone to repeated pre-emption of proposers in an agreement protocol, which adversely impacts scalability. Another such shortcoming is that the implementation of weak leader optimization requires the election of a leader, which contributes to complexity and adversely impacts speed and scalability, and requires one more message per agreement (e.g., 4 instead of 3), which further adversely impacts speed and scalability. Another such shortcoming is that agreements have to be reached sequentially, which adversely impacts speed and scalability. Another such shortcoming is that reclamation of persistent storage is limited, if not absent altogether, which imposes a considerable burden on deployment because storage needs of such a deployment will grow continuously and, potentially, without bound. Another such shortcoming is that efficient handling of large proposals and of large numbers of small proposals is limited, if not absent altogether, which adversely affects scalability. Another such shortcoming is that a relatively high number of messages must be communicated for facilitating state machine replication, which adversely affects scalability and wide area network compatibility. Another such shortcoming is that delays in communicating messages adversely impact scalability. Another such shortcoming is that addressing failure scenarios by dynamically changing (e.g., including and excluding as necessary) participants in the replicated state machine adversely impacts complexity and scalability.
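The repeated pre-emption of proposers mentioned above can be illustrated with a simplified sketch. It assumes a Paxos-style agreement protocol with numbered proposals, which the text does not specify, and reduces the protocol to a single acceptor; the names `Acceptor` and `duel` are hypothetical. The sketch shows two proposers alternately issuing higher-numbered prepare requests, so that each proposer's accept request is rejected before it can complete.

```python
class Acceptor:
    """Simplified acceptor for an assumed Paxos-style protocol."""

    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) once a proposal is accepted

    def prepare(self, number):
        # Promise to ignore proposals numbered lower than `number`.
        if number > self.promised:
            self.promised = number
            return True
        return False

    def accept(self, number, value):
        # Accept only if no higher-numbered promise has been made.
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return True
        return False


def duel(rounds):
    """Interleave two proposers so each prepare pre-empts the rival's accept."""
    acceptor = Acceptor()
    next_number = 0
    preemptions = 0
    pending = None  # (proposer, number) that has prepared but not yet accepted
    for i in range(rounds):
        proposer = "A" if i % 2 == 0 else "B"
        next_number += 1
        acceptor.prepare(next_number)  # always higher than any earlier promise
        if pending is not None:
            rival, rival_number = pending
            # The rival's accept arrives too late and is rejected.
            if not acceptor.accept(rival_number, rival):
                preemptions += 1
        pending = (proposer, next_number)
    return preemptions, acceptor.accepted


preemptions, accepted = duel(6)
print(preemptions, accepted)
```

In this interleaving no value is ever accepted, however many rounds are run, which is the kind of stalled progress that the scalability concern above refers to.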
Conventional implementations of facilitating replication of information repositories have one or more shortcomings that limit their effectiveness. One such shortcoming is that certain conventional multi-site collaborative computing solutions require a single central coordinator for facilitating replication of centrally coordinated information repositories. Undesirably, the central coordinator adversely affects scalability because all updates to the information repository must be routed through the single central coordinator. Furthermore, such an implementation is not highly available because failure of the single central coordinator will cause the implementation to cease to be able to update any replica of the information repository. Another such shortcoming is that, in an information repository replication implementation relying upon log replays, information repository replication is facilitated in an active-passive manner. Therefore, only one of the replicas can be updated at any given time. Because of this, resource utilization is poor because other replicas are either idle or limited to serving a read-only application such as, for example, a data-mining application. Another such shortcoming results when an implementation relies upon weakly consistent replication backed by conflict-resolution heuristics and/or application-intervention mechanisms. This type of information repository replication allows conflicting updates to the replicas of the information repository and requires an application using the information repository to resolve these conflicts. Thus, such an implementation adversely affects transparency with respect to the application.
Still referring to conventional implementations of facilitating replication of information repositories, implementations relying upon a disk mirroring solution are known to have one or more shortcomings. This type of implementation is an active-passive implementation. Therefore, one such shortcoming is that only one of the replicas can be used by the application at any given time. Because of this, resource utilization is poor because the other replicas (i.e., the passive mirrors) are neither readable nor writable while in their role as passive mirrors. Another such shortcoming of this particular implementation is that the replication method is not aware of the application's transaction boundaries. Because of this, at the point of a failure, the mirror may have a partial outcome of a transaction, and may therefore be unusable. Another such shortcoming is that the replication method propagates changes to the information from the node at which the change originated to all other nodes. Because the size of the changes to the information is often much larger than the size of the command that caused the change, such an implementation may require an undesirably large amount of bandwidth. Another such shortcoming is that, if the information in the master repository were to become corrupted for any reason, that corruption would be propagated to all other replicas of the repository. Because of this, the information repository may not be recoverable or may have to be recovered from an older backup copy, thus entailing further loss of information.
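The bandwidth point above, namely that the changes to the information are often much larger than the command that caused them, can be made concrete with a small sketch. The document contents, the `replace` command format and the function name `apply_command` are hypothetical, and the byte counts are a rough stand-in for what a real replication method would transmit.

```python
def apply_command(document, command):
    """Apply a hypothetical (operation, old, new) command to a text document."""
    op, old, new = command
    if op != "replace":
        raise ValueError(f"unknown operation: {op}")
    return document.replace(old, new)


# A document of 10,000 tab-indented lines (illustrative data only).
document = "\tline\n" * 10_000

# One short command that touches every line of the document.
command = ("replace", "\t", "    ")
updated = apply_command(document, command)

# Size of the command versus the size of the changed information.
command_bytes = len(repr(command).encode("utf-8"))
change_bytes = len(updated.encode("utf-8"))

# A replica that receives only the command can recompute the same result.
replica = apply_command("\tline\n" * 10_000, command)

print(command_bytes, change_bytes, replica == updated)
```

Here every line changes, so a method that propagates the changed information must send the whole updated document, while a method that propagates the command sends only a few dozen bytes.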
Therefore, a replicated state machine that overcomes drawbacks associated with conventional replicated state machines would be useful and advantageous. More specifically, a replicated information repository built using such a replicated state machine would be superior to a conventional replicated information repository. Even more specifically, a replicated CVS repository built using such a replicated state machine would be superior to a conventional replicated CVS repository.
Forming a membership, or a specific collection of entities from a set of known entities, is useful in distributed computing systems such as described above, so that information may be shared between specified groupings of trusted nodes.