The present invention relates to redundant data storage, and in particular, to a method for redundantly storing data in a geographically-diverse data-storing system that provides cross-site redundancy, utilizes erasure codes, and isolates the update and recovery processes of each data block from those of other data blocks, to ensure recoverability of data.
Redundant data storage is utilized in RAID systems to provide data protection in storage devices and in storage-area networks. There are many different schemes for allocating and storing redundant data, corresponding to different levels of RAID. For example, in RAID-5, parity information is distributed across all of the independent disks in the array, rather than being stored on a single dedicated parity disk. The data and parity information are arranged on the disk array so that the parity block for a given stripe is always on a different disk from the data blocks that it protects. This scheme tolerates the failure of any single disk, and is generally the most popular form of RAID used today. Other RAID systems use different “erasure codes” (i.e., error-correcting codes where the position of the error is known) to implement redundancy schemes.
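The parity scheme described above can be illustrated with a minimal Python sketch. It assumes single XOR parity per stripe and a simple rotating parity placement; the helper names `xor_blocks` and `parity_disk` are hypothetical, and real RAID-5 implementations use various rotation layouts that may differ from the one shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, num_disks):
    """Illustrative rotation (an assumption): the parity block moves to a
    different disk on each successive stripe."""
    return (num_disks - 1 - stripe) % num_disks


# One stripe on a hypothetical 4-disk array: three data blocks plus one parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the disk that held data block 1. Because the position of the
# loss is known (an erasure), XOR-ing the surviving blocks rebuilds it.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost] + [parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[lost]
```

Because XOR is its own inverse, the same routine serves both to compute parity and to reconstruct a single missing block, which is why single-parity schemes tolerate exactly one known failure per stripe.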
Another type of system that may utilize redundant data storage is a geographically-diverse network, such as a geoplex. A geoplex is a collection of geographically-distributed sites consisting of servers, applications, and data. The geoplex sites cooperate to improve reliability and/or availability of applications and data through the use of redundancy. Data redundancy in geoplexes typically takes the form of mirroring, where one or more full copies of the logical data are maintained at remote sites.
Mirroring has a number of desirable properties. It is conceptually simple, and it does not compromise overall system performance when operating in an asynchronous mode for remote updates. Also, the recovery procedure for mirroring is simple, and can utilize all sites to process some of the work (i.e., an active-active configuration), or can implement fast failover from the primary site to a secondary site (i.e., an active-passive configuration).
However, mirroring also has many drawbacks. In particular, mirroring is expensive. Because the amount of storage required for the logical data must be doubled or more, depending on the number of mirror copies, the total cost of mirroring can be substantial. In addition, for very high reliability, more than one mirror copy generally is required. While the high cost of remote mirroring may be acceptable to some entities with mission-critical applications, such as online transaction processing systems, a mirrored geoplex is too expensive to serve as a low-cost product for many other applications with large data sets, such as data mining and scientific computing. Additionally, mirroring does not provide much flexibility for system design and operation.
Despite these well-known and inherent drawbacks of mirroring, alternative methods have not generally been implemented in geoplexes. Thus, it would be desirable to provide a geographically-diverse data-storing system that utilizes erasure codes to reduce expense and provide greater flexibility, without sacrificing data-recovery capability.
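The cost argument can be made concrete with a short calculation. The following Python sketch compares the raw storage needed per unit of logical data under mirroring and under an erasure code. It assumes a maximum-distance-separable code with k data blocks and m redundant blocks, which survives the loss of any m blocks; the function names and the example parameters are hypothetical and are not taken from the system described here.

```python
def mirroring_overhead(mirror_copies):
    """Raw storage per unit of logical data: the original plus each full mirror copy."""
    return 1 + mirror_copies


def erasure_overhead(k, m):
    """Raw storage per unit of logical data for k data blocks protected by
    m redundant blocks (assumes an MDS code, so any m losses are tolerated)."""
    return (k + m) / k


# Both configurations below tolerate the loss of any two copies or blocks.
two_mirrors = mirroring_overhead(2)   # original + 2 mirrors = 3x raw storage
four_plus_two = erasure_overhead(4, 2)  # 6 blocks stored for 4 blocks of data = 1.5x

assert two_mirrors == 3
assert four_plus_two == 1.5
```

Under these assumptions, the erasure-coded layout provides the same loss tolerance as two mirror copies with half the raw storage.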