This invention relates to information storage systems.
Information is the most crucial asset of many businesses, and any disruption to the access of this information may cause extensive damage. Some businesses, such as banks, airlines (with e-tickets), auction sites, and on-line merchants, may actually stop functioning without access to their information. No matter how reliable a data center is, there can still be site failures (floods, earthquakes, fires, etc.) that can destroy the data stored on a storage device and any co-located backup media. Geographic replication is the only way to avoid service disruptions.
Geographic replication has challenges: performance needs to be maintained; different sites might run at different speeds, and have different latencies. Having multiple remote copies may increase reliability, but for most purposes the replicas need to be kept in sync, in real time. If a site fails and comes back on-line, its data need to be recovered without an excessive impact on the rest of the system.
There are three prior art approaches for providing highly reliable data storage by replication. A first approach is host replication, in which the host computer runs software that replicates the data between local and remote servers. Thus, before writing to its local storage device, a host computer sends the data to a remote host. The SUN Microsystems SNDR (see, e.g., Sun StoreEdge Network Data Replicator Software, available on the SUN website) and Veritas Software's Volume Replicator (see, e.g., Veritas volume replicator: Successful replication and disaster recovery, available on the Veritas website) are examples of this approach. Disadvantageously, specific software must be loaded on each host computer and must be compatible with each host computer's operating system. Further, specific file systems may need to be used for compatibility.
A second approach is storage device replication, in which the storage device replicates the data between a local and a remote system. Rather than having a host send the data to another host for replication purposes, the storage device itself communicates with a remote storage device and ensures that the data is properly replicated. In this approach, the host treats the remote storage device as if it were a direct storage device. Thus, since the host is not required to run any special software, it can run any operating system and can use any file system. Disadvantageously, the storage devices themselves that are used in this approach are complex and expensive since they need to perform functions beyond storing data. Further, all the storage devices need to be similar since they must communicate with each other. The EMC Corporation's Symmetrix Remote Data Facility (SRDF) product (see, e.g., Symmetrix Remote Data Facility product data sheet, available on the EMC website) uses this second approach. The SRDF provides only two choices for reliability: "safe" access but not fast, in which the host only gets a response after the remote site has acknowledged the write, or "fast" access but not safe, in which the host can receive a positive acknowledgment for an update that eventually gets lost. Furthermore, while the data are safe, a site failure of the primary data site cannot be seamlessly hidden from the host.
A third approach to data replication is appliance replication, in which a storage appliance is connected to the host computer on one side and to multiple storage devices on the other side by means of a storage area network (SAN). The storage devices can be local to the appliance, remote from it, or any combination of the two. The appliance performs the replication and unburdens both the host and the storage devices. The appliance approach has the benefit of host and storage platform independence, which eases deployment of features and the mixing of different storage devices and hosts. Disadvantageously, a bottleneck can be created at the appliance when it needs to recover one copy of the data by moving it through the appliance between storage devices while simultaneously handling the replication of data from the host to the plural storage devices. The former will need to be performed whenever failure recovery takes place or whenever another copy of the data in one storage device is created in a newly connected storage device. Further, the storage devices are preferably connected by the above-noted SAN rather than by a general-purpose network. An example of this third approach is Hewlett-Packard's SANlink product (see, e.g., HP SANlink Overview and Features, available on the HP website). There are several variants of the third approach. Falconstor's IPStor servers, for example, perform appliance replication, but they use an IP network to communicate with the host instead of a SAN. Falconstor requires the host to run a special device driver, which allows the host to communicate with the Falconstor server over an IP network as if it were a disk (see, e.g., Falconstor's IP Store Product Brochures available on the Falconstor website).
The present invention eliminates the problems associated with the prior art approaches. In accordance with the present invention, data replication functionalities between a host computer, an interconnecting network, and a plurality of storage devices are separated into host elements and a plurality of storage elements. One or more host elements are associated with the host computer and a storage element is associated with and connected to each of the plurality of storage devices. The network can be any computer network such as the Internet or an intranet. The host element is connected to the host computer and behaves, from the standpoint of the host computer, like a direct-attached disk. The host element is responsible for replicating data between the storage devices, and for maintaining data consistency. Further, the host element functions to instruct a storage element that does not contain up-to-date data in its associated storage device to recover that data from an indicated other one of the plurality of storage elements and its associated storage device. The storage elements and their associated storage devices may be located in any combination of diverse or same geographic sites in a manner to ensure sufficient replication in the event of a site or equipment failure. The storage elements are responsible for executing requests received from the host element and for maintaining data consistency. When a storage element and its associated storage device are determined not to contain up-to-date data, recovery is effected by data transfer from one of the other storage elements and its associated storage device, the identity of that other storage element being indicated by and received from the host element. Such recovery is done directly between the indicated other storage element and its associated storage device and the out-of-date storage element and its associated storage device.
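The division of labor described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each storage element keeps its writes keyed by sequence number and that the host element names the peer with the most recent write as the recovery source; the class names, method names, and the source-selection rule are hypothetical and are not taken from the specification.

```python
from typing import Dict, List


class StorageElement:
    """Hypothetical storage element holding writes keyed by sequence number."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: Dict[int, bytes] = {}

    def last_seq(self) -> int:
        # Highest sequence number held, or 0 if nothing has been written.
        return max(self.log, default=0)

    def recover_from(self, peer: "StorageElement") -> int:
        """Copy missing writes directly from the peer; return how many were copied."""
        missing = [seq for seq in sorted(peer.log) if seq not in self.log]
        for seq in missing:
            self.log[seq] = peer.log[seq]
        return len(missing)


class HostElement:
    """Hypothetical host element: it only names the source, it never relays data."""

    def __init__(self, elements: List[StorageElement]) -> None:
        self.elements = elements

    def pick_source(self) -> StorageElement:
        # Assumed policy: the element with the most recent write is the source.
        return max(self.elements, key=lambda e: e.last_seq())

    def repair(self, stale: StorageElement) -> int:
        # The host indicates the source; the transfer runs element to element.
        return stale.recover_from(self.pick_source())
```

In this sketch the data path of `repair` never passes through `HostElement`, which only supplies the identity of the source element; that mirrors the direct element-to-element recovery described above.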
Advantageously, by separating the work functions between the geographically separated host elements and storage elements, recovery between storage elements can be effected directly between these storage elements and their associated storage devices without moving data through the host element. Thus, data intensive manipulations of the data stored in the storage devices can be performed without host element involvement. Typical operations such as maintaining snapshots can be performed efficiently since multiple data exchanges need only take place between the storage device and the storage element and not over the network as in some prior art approaches. A further advantage of the present invention is that, since the host element is accessed like a local disk, the host computer does not require any hardware or software modifications. Thus, there are no dependencies upon any particular operating system or application running on the host.
To determine whether a storage element and its associated storage device contain up-to-date data, the host element assigns consecutive sequence numbers to consecutive write requests from the host computer, and each sequence number is sent along with its request from the host element to the storage elements. A storage element and its associated storage device are determined not to have up-to-date data when the storage element fails to receive one or more recent write requests, or when a gap is detected in the sequence numbers of received write requests.
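The sequence-number check can be sketched as follows. This is a minimal illustration under stated assumptions: sequence numbers start at 1, a storage element applies a write only when it is the next expected one, and duplicates are ignored. The names `HostElement`, `StorageElement`, `stamp`, and `receive` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HostElement:
    """Hypothetical host element that numbers write requests consecutively."""
    next_seq: int = 1  # assumption: numbering starts at 1

    def stamp(self, payload: bytes) -> Tuple[int, bytes]:
        # Attach the next consecutive sequence number to the write request.
        seq = self.next_seq
        self.next_seq += 1
        return seq, payload


@dataclass
class StorageElement:
    """Hypothetical storage element that detects gaps in sequence numbers."""
    last_seq: int = 0        # highest sequence number applied in order
    up_to_date: bool = True  # cleared once a gap is observed

    def receive(self, seq: int, payload: bytes) -> bool:
        """Apply the write if it is the next expected one; flag a gap otherwise."""
        if seq == self.last_seq + 1:
            self.last_seq = seq
            return True
        if seq > self.last_seq + 1:
            # One or more earlier writes never arrived: data is not up to date.
            self.up_to_date = False
        # Duplicates (seq <= last_seq) are ignored without changing state.
        return False

    def is_behind(self, host: HostElement) -> bool:
        # Also stale if the most recent writes issued by the host never arrived.
        return self.last_seq < host.next_seq - 1
```

For example, if the host element issues writes 1, 2, and 3 and a storage element receives only 1 and 3, `receive` rejects write 3 and clears `up_to_date`, while a storage element that receives only write 1 is reported as stale by `is_behind`.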