The present invention relates generally to network storage systems and methods, and more particularly to network storage systems that provide ultra-high data availability and geographic disaster tolerance.
In current storage networks, and in particular storage networks including geographically separated access nodes and storage resources interconnected by a network, it is desirable to provide systems and methods with what is often referred to as a “Zero Recovery Point Objective (RPO)”, meaning no data loss, and a “Zero Recovery Time Objective (RTO)”, meaning no loss in data availability, with minimal equipment investment.
Unfortunately, current technologies are typically limited either to data replication over purely synchronous distances or to replication in which a single site accepts writes and sites separated by longer distances have only standby access to the data. Neither of these solutions achieves both Zero RPO and Zero RTO. Examples of current commercial systems providing data replication over distance include Symmetrix Remote Data Facility (SRDF) from EMC Corporation and True Copy from Hitachi Corporation.
It is also desirable that data access be localized, in part to improve access speed to blocks of data requested by host devices. Caching blocks at access nodes provides localization; however, the cached data must be kept coherent with respect to modifications at other access nodes that may be caching the same data.
Further, such complex storage applications need to withstand the failure of their backing storage systems, of local storage networks, of the network interconnecting nodes, and of the access nodes. Should a failure occur, asynchronous data transmission implies the potential for the loss of data held at the failed site. Moreover, a consistent data image, from the perspective of the application, needs to be constructed from the surviving storage contents. An application must make some assumptions about which writes, or pieces of data to be written, to the storage system have survived the storage system failure. Specifically, the application assumes that, for all writes acknowledged by the storage system as having been completed, the ordering of writes is maintained, such that if a modification due to a write to a given block is lost, then all subsequent writes to blocks in the volume or related volumes of blocks are also lost.
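By way of illustration only, the write-ordering assumption described above may be sketched as a check that the writes surviving a failure form an unbroken prefix of the acknowledged write sequence. The function name `is_write_order_consistent` and the use of integer write identifiers are hypothetical and are introduced solely for this example; they do not describe any particular storage system.

```python
from typing import Sequence, Set


def is_write_order_consistent(acknowledged: Sequence[int], survived: Set[int]) -> bool:
    """Return True if the surviving writes form a prefix of the acknowledged order.

    `acknowledged` lists hypothetical write identifiers in the order the
    storage system acknowledged them. `survived` holds the identifiers whose
    modifications are still present after the failure.
    """
    seen_lost = False
    for write_id in acknowledged:
        if write_id in survived:
            # A surviving write that follows a lost write violates the
            # ordering assumption made by the application.
            if seen_lost:
                return False
        else:
            seen_lost = True
    return True
```

Under this sketch, a surviving image containing writes 1 and 2 of an acknowledged sequence 1, 2, 3, 4 is consistent, whereas an image containing writes 1 and 3 but not 2 is not, because a write later than a lost write has survived.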
Accordingly, it is desirable to provide systems and methods that provide high data availability and geographic fault tolerance.