The present invention relates to clustered systems, and more specifically, to recovering a clustered system that has failed.
Clustered systems (e.g., systems utilizing one or more computer clusters, etc.) are a popular way of addressing modern computing needs. To be fault tolerant, each node must maintain the same state as every other node within the clustered system, and the nodes may be updated when that state changes. However, current implementations of clustered systems lack robust fault tolerance, and recovery of a failed clustered system may not be possible if the number of failures exceeds the number that the system was designed to handle.