1. Field of the Invention
The invention generally relates to the mirroring of data between two separate locations. More specifically, the invention relates to a method and apparatus for asynchronously mirroring data between two separate locations.
2. Description of the Related Art
Mirroring data to a geographically separate location has always been the most reliable option for ensuring data availability and has also provided a method for archiving data. Mirroring the data from a local disk drive to another local disk drive on the same computer protects the data from any one local disk failure. Most conventional mirroring methods and systems mirror data within a local system. Local data replication, however, will not protect the data from a system failure or site catastrophe. Moreover, few conventional mirroring methods and systems offer data replication between systems that are not physically co-located.
Some prior art mirroring methods and systems allow the mirroring of data between two geographically separate locations. Such systems attempt to mirror the data synchronously. These methods require very high-speed dedicated communication mediums between the two sites in order to achieve data coherency. As a result, such systems require the deployment of specialized hardware and customized communication infrastructure, which makes implementation extremely expensive. Their use is therefore limited to a few areas where the systems are closely managed so that only the most critical data is mirrored. In these prior art methods and systems, the intelligence to perform the mirroring function is built into the hardware of the system and is cognizant of the communication medium it is using to mirror the data. When a write to disk is transacted, the data is placed onto the local disk, then processed for transmission and transmitted over the dedicated communication medium. When the data is received at the remote system, it is processed again, then written to the disk of the remote system. Thus, a write to the disk from an application includes a write to the local disk and a write to the remote disk before the write operation is complete and another mirror write can be performed. This can cause significant delay when performing large numbers of consecutive write operations, because mirroring must complete before each write operation is complete.
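By way of illustration only, the synchronous write path described above can be modeled with a short sketch. The sketch is not taken from any prior art system; the class names (SimulatedDisk, SynchronousMirror), the fixed per-write costs, and the assumption that the remote system returns an acknowledgement over the same link are all hypothetical. Time is accumulated as a number rather than measured, so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedDisk:
    """Hypothetical disk model: stores blocks and charges a fixed cost per write."""
    write_ms: float
    blocks: dict = field(default_factory=dict)

    def write(self, block_no: int, data: bytes) -> float:
        self.blocks[block_no] = data
        return self.write_ms  # simulated time spent on this write


@dataclass
class SynchronousMirror:
    """Hypothetical model of a synchronous mirror over a dedicated link."""
    local: SimulatedDisk
    remote: SimulatedDisk
    one_way_link_ms: float  # assumed one-way latency of the dedicated medium

    def write(self, block_no: int, data: bytes) -> float:
        # 1. Place the data on the local disk.
        elapsed = self.local.write(block_no, data)
        # 2. Transmit the data to the remote system.
        elapsed += self.one_way_link_ms
        # 3. Write the data to the remote disk.
        elapsed += self.remote.write(block_no, data)
        # 4. Assumed: an acknowledgement travels back before the write completes.
        elapsed += self.one_way_link_ms
        return elapsed


def total_latency_ms(mirror: SynchronousMirror, writes) -> float:
    """Consecutive writes are serialized, so their latencies simply add up."""
    return sum(mirror.write(block_no, data) for block_no, data in writes)


if __name__ == "__main__":
    mirror = SynchronousMirror(SimulatedDisk(5.0), SimulatedDisk(5.0), 20.0)
    writes = [(n, b"payload") for n in range(100)]
    print(total_latency_ms(mirror, writes), "ms for", len(writes), "writes")
```

Under these assumed figures, each application write costs the local write, two link traversals, and the remote write, so the link latency dominates the total as the number of consecutive writes grows.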
FIG. 1 illustrates conventional prior art systems for mirroring data between a local storage system 101 and a remote storage system 165 over a long distance. As shown, the local storage system 101 includes an I/O controller 110 and a storage driver 120 for managing I/O to and from a local source device 125. In FIG. 1, the source device 125 is shown as being external to the local storage system 101 and may include any type of storage device such as a hard disk drive, a CD-ROM drive, a flash memory (such as an EEPROM) or the equivalent. It is understood that the source device 125 need not be external to the local storage system 101 but may, instead, be implemented directly within the local storage system 101, as in the case of a hard disk drive. The local storage system 101 also preferably includes a target device 150, so that the local storage system 101 can also operate as a remote storage system for another local storage system (not shown).
The local storage system 101 is coupled to the remote storage system 165 via a secure, dedicated, high-speed, high-bandwidth line 103. Both the local storage system 101 and the remote storage system 165 are of the same type and operate using the same I/O instruction set. As shown, the remote storage system 165 includes an I/O controller 170 and a storage driver 180 for managing I/O to and from a target device 185. In FIG. 1, the target device 185 is shown as being external to the remote storage system 165 and may be any type of storage device such as a hard disk drive, a CD-ROM drive, a flash memory (such as an EEPROM) or the equivalent. The remote storage system 165 also preferably includes a source device 190, such that the remote storage system 165 can also operate as a local storage system and have the data from its source device 190 mirrored on another remote storage system (not shown).
In the systems illustrated in FIG. 1, the components are highly specialized and are designed to perform a single function, namely, to mirror data synchronously between the two storage systems. The mechanism employed to achieve this one function is often proprietary and is dependent on the underlying hardware and prior knowledge of the medium of communication over which the systems are connected. There are several disadvantages to this type of synchronous mirroring.
Synchronous mirroring relies on one or more proprietary storage-oriented protocols. Each such protocol is designed specifically for the dedicated communication medium deployed. This lack of flexibility makes it difficult to take advantage of advances in long distance communications without redesigning the entire system.
Communication link failure or link congestion can result in a loss of mirror coherency. Depending on the kind of communication failure that occurs, the effects can vary from degradation of application performance to application failure. The former is a result of attempting to synchronize the mirrors concurrently with continued application I/O activity and the latter is a result of policies that cannot tolerate data mirroring incoherencies.
What is needed is a system and method for mirroring data that does not require the use of a proprietary protocol. Instead, the system and method should adopt widely accepted and commonly used protocols such as TCP/IP. In addition, the components that form the storage systems for remote mirroring should also be based on commonly available hardware.
What is further needed is a system that uses the networking infrastructure of the day for mirroring data. This allows the underlying network infrastructure to be switched out or replaced without having to completely change the remote mirroring storage system in order to accommodate the underlying network change. Communication mediums ranging from existing telephone lines to high-end, state-of-the-art communication mediums must be accommodated. The choice of medium must be selectable based on the amount of data to be replicated, the degree and level of availability desired, and the economics of deploying remote mirroring. When deploying such systems, the most expensive component is always the recurring cost of the communication medium. This recurring cost is always the most significant factor in the equation that balances the cost of data, the cost of data availability, the cost of the system, and the recurring cost of the medium. The Internet is a good example of a system that provides a function that is used universally and is available at all points of the cost curve based on the performance required. A similar approach is required for remote data mirroring.