The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Ensuring that clients of storage systems always have access to their data is increasingly important when providing robust, real-time access to big data. A loss of data access lasting hours, minutes, or even seconds can cost a company thousands of dollars in lost productivity or profit. As such, many different systems with fault-tolerant infrastructures have been developed to minimize downtime.
All publications identified herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
U.S. Pat. No. 8,051,361 to Sim-Tang et al. teaches a lock-free clustered erasure coding solution whose processors negotiate with one another to decide the data sets for which each of them is responsible. So long as each data set is managed by only one erasure encoding processor, there is no need to lock the data, so the data remains accessible at all times. While performing consistency checks, Sim-Tang's recovery process fixes inconsistent data sets in a lock-less manner before starting regular cluster activities. Sim-Tang's system, however, fails to persist data across a plurality of systems in the event that an entire system fails.
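By way of illustration only, the single-owner property described above can be sketched as follows. This is not Sim-Tang's negotiation protocol, which is not reproduced here; the sketch substitutes rendezvous (highest-random-weight) hashing as a stand-in for the negotiation, and the names `owner_of`, `assign`, and the processor and data set identifiers are hypothetical. It shows only that, when every data set maps to exactly one processor, no two processors ever operate on the same data set and no lock is required.

```python
import hashlib


def owner_of(data_set_id, processors):
    """Return the single processor responsible for a data set.

    Assumption: rendezvous hashing stands in for the negotiation step.
    Every processor computes the same result independently.
    """
    def weight(proc):
        digest = hashlib.sha256(f"{proc}:{data_set_id}".encode()).hexdigest()
        return int(digest, 16)

    return max(processors, key=weight)


def assign(data_set_ids, processors):
    """Map each data set to exactly one owning processor."""
    return {ds: owner_of(ds, processors) for ds in data_set_ids}


# Hypothetical cluster of three erasure encoding processors.
processors = ["proc-a", "proc-b", "proc-c"]
data_sets = [f"set-{i}" for i in range(20)]
ownership = assign(data_sets, processors)
```

Because the assignment is deterministic, removing a processor from the list reassigns only the data sets that processor owned; all other data sets keep their existing owner.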
U.S. Pat. No. 8,112,423 to Bernhard discloses a system that replicates data from primary clusters to a replicated cluster. When a primary cluster fails, clients of the primary cluster are directed to the replicated cluster for service, and the replicated cluster is then used to restore data to the primary cluster. While Bernhard's system provides data persistence, data recovery is slow because Bernhard's infrastructure provides only a single replicated cluster for each primary cluster.
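The failover and restore sequence described above can be sketched, under simplifying assumptions, as follows. The classes `Cluster` and `ReplicatedPair` and their methods are hypothetical and do not represent Bernhard's implementation; each cluster is modeled as an in-memory dictionary, and replication is modeled as a synchronous copy on every write.

```python
class Cluster:
    """Hypothetical in-memory stand-in for a storage cluster."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class ReplicatedPair:
    """One primary cluster backed by a single replicated cluster."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def _active(self):
        # Clients are directed to the replica while the primary is down.
        return self.primary if self.primary.alive else self.replica

    def write(self, key, value):
        target = self._active()
        target.data[key] = value
        if target is self.primary:
            # Assumption: synchronous replication on every write.
            self.replica.data[key] = value

    def read(self, key):
        return self._active().data[key]

    def restore_primary(self):
        # The single replica is the only source for the restore.
        self.primary.data = dict(self.replica.data)
        self.primary.alive = True


pair = ReplicatedPair(Cluster("primary"), Cluster("replica"))
pair.write("k1", "v1")

# Simulate a failure of the primary cluster and the loss of its data.
pair.primary.alive = False
pair.primary.data = {}

value_during_outage = pair.read("k1")  # served by the replica
pair.write("k2", "v2")                 # accepted by the replica
pair.restore_primary()
```

In this sketch the restore copies every entry from one replica, which illustrates why recovery time in such an arrangement is bounded by what a single replicated cluster can deliver.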
Thus, there is still a need for a persistent system that allows for high availability and fast recovery of data.