The present invention relates to computer hardware and software systems and, more particularly, to recovery or restoration of data for such a system in the event of a crash of the system or a disaster which causes the system to become inoperative for a period of time. When such a system crashes or becomes inoperative, data may be lost unless measures have been provided to recover or restore data. Specifically, the present invention provides methods and apparatus which implement substantially real-time networked disk, or data, mirroring over local area networks (LANs) and wide area networks (WANs) in a computer system, such as a SPARC Solaris 2.X environment, for disaster recovery and other applications.
Various techniques are known for recovery or restoration of data in the event of a crash of a computer system or a disaster which causes the computer system to become inoperative for an indefinite period of time or even permanently. One known technique is to replicate data as the data is generated by an application program being executed by the computer system. This technique is typically referred to as disk, or data, mirroring.
Heretofore, data mirroring has been achieved by one of several approaches. One approach is to provide local data mirroring utilizing redundant arrays of independent disks (RAID). Using the RAID approach, data generated by execution of an application program is written to multiple storage devices, such as conventional disk drive devices, contemporaneously with storage of the data on a local input/output (I/O) data storage device. Another approach is to provide volume management software and a redundant storage device on which data is replicated. The volume management software replicates data on the redundant storage device contemporaneously with storage of the data on the local I/O data storage device. Both of these approaches typically provide synchronous data mirroring and are characterized by minuscule delay in the replication of data for system recovery.
Considered in more detail, both RAID and volume management approaches typically provide synchronous, rather than asynchronous, disk mirroring. In a synchronous disk mirroring architecture, such as provided by a RAID or volume management approach, disk updates are committed to each of the disk devices in the mirror before control is returned to the application program. In the event that one of the disks goes out of service, the data is still available on one of the other disk devices in the mirror.
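The commit-before-return rule of synchronous mirroring can be illustrated with a minimal sketch. The classes `Disk` and `SyncMirror` are hypothetical in-memory stand-ins, not any RAID controller or volume manager interface; the sketch assumes a write is simply skipped on disks already out of service.

```python
class Disk:
    """Hypothetical in-memory stand-in for one disk device in a mirror."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.in_service = True

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        return self.blocks[block]


class SyncMirror:
    """Toy synchronous mirror: a write returns only after every in-service disk has it."""

    def __init__(self, disks):
        self.disks = disks

    def write(self, block, data):
        targets = [d for d in self.disks if d.in_service]
        if not targets:
            raise IOError("no disk in the mirror is in service")
        # Commit the update to each disk in the mirror, one after another.
        for disk in targets:
            disk.write(block, data)
        # Only after the loop completes does control return to the caller
        # (the application program).

    def read(self, block):
        # Any surviving disk holds a complete copy, so read from the first one.
        for disk in self.disks:
            if disk.in_service:
                return disk.read(block)
        raise IOError("no disk in the mirror is in service")


mirror = SyncMirror([Disk("d0"), Disk("d1")])
mirror.write(7, b"payroll record")
mirror.disks[0].in_service = False  # d0 goes out of service
recovered = mirror.read(7)          # still served from d1
```

Because `write` does not return until every copy is in place, losing `d0` afterwards costs no data; the price is that the application waits for the slowest disk on every update.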
The RAID and volume management approaches can be implemented to protect data locally. While these approaches are satisfactory for local disk mirroring for data recovery in the event of a local I/O disk failure or temporary system crash, they do not address the problem of catastrophic system failure or disaster which renders the computer system inoperative for an extended period of time or even permanently.
Another approach is to provide remote data mirroring in addition to local data mirroring. Using this approach, a remote data mirroring system is implemented both locally and remotely so that data generated locally by execution of an application program is additionally communicated over a network to a remote location for replication. Typically, remote data mirroring enables recovery of the local computer system in the event of a temporary outage or, alternatively, transfer of data processing operations to a remote computer system if the local computer system is not able to recover, the outage is for a prolonged period of time, or a disaster permanently disables the local computer system. Remote data mirroring systems have in the past been commercialized by companies such as International Business Machines, Digital Equipment Corporation, and Data General Corporation. Such remote data mirroring systems are operable in one of several modes, including a synchronous mode, an asynchronous mode, and a near synchronous mode.
Unfortunately, implementing synchronous data mirroring over a network raises serious performance problems. Rather than working with local data channels that can accept data at 5, 20, or 40 megabytes (MB) per second or higher, the data must travel over a much lower bandwidth channel, stretching out data transfer times. Network latencies add to the delay caused by the lower bandwidth, further slowing I/O turnaround times. Anyone who has compared network file system (NFS) update performance with local disk performance for an I/O-intensive application program will recognize this point. If networked disk mirroring is implemented using synchronous I/O techniques, application performance is severely degraded.
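A back-of-the-envelope calculation shows the scale of the problem. The sketch below models the time a synchronous write blocks the application as transfer time plus the round trip needed to receive the acknowledgment. The 20 MB/s local figure comes from the text; the 8 KB write size, the T1 line (1.544 Mbit/s), and the 50 ms round-trip latency are illustrative assumptions only, and disk seek time, which affects both cases, is ignored.

```python
def sync_write_seconds(size_bytes, bandwidth_bytes_per_s, round_trip_s):
    """Time one synchronous write blocks the application: transfer time
    plus the round-trip latency of waiting for the acknowledgment."""
    return size_bytes / bandwidth_bytes_per_s + round_trip_s


WRITE_SIZE = 8 * 1024   # assumed 8 KB update
LOCAL_BW = 20e6         # 20 MB/s local channel (figure from the text)
WAN_BW = 1.544e6 / 8    # assumed T1 line: 1.544 Mbit/s = 193,000 bytes/s
WAN_RTT = 0.050         # assumed 50 ms network round trip

local = sync_write_seconds(WRITE_SIZE, LOCAL_BW, 0.0)  # latency ignored locally
remote = sync_write_seconds(WRITE_SIZE, WAN_BW, WAN_RTT)

print(f"local:  {local * 1000:.2f} ms/write, {1 / local:,.0f} writes/s")
print(f"remote: {remote * 1000:.2f} ms/write, {1 / remote:,.1f} writes/s")
print(f"slowdown: {remote / local:.0f}x")
```

Under these assumptions a local write takes about 0.41 ms while the mirrored write takes about 92 ms, so an application that waits on every update can issue only around 11 writes per second instead of roughly 2,400.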
On the other hand, implementing asynchronous disk mirroring over a network raises data integrity problems. Because updates are acknowledged to the application program before they reach the remote, or secondary, computer system, in the event of a disaster the data on the secondary computer system may be up to several seconds older than what would be found on the local, or primary, computer system.
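The integrity gap can be sketched by modelling the updates that have been acknowledged but not yet shipped as a queue. `AsyncMirror` and its methods are hypothetical names for illustration; the "several seconds" of lag corresponds to whatever is still in that queue when the primary is lost.

```python
from collections import deque


class AsyncMirror:
    """Toy asynchronous mirror (hypothetical): writes are acknowledged as soon as
    the primary has them; replication to the secondary happens later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # acknowledged updates not yet on the secondary

    def write(self, block, data):
        self.primary[block] = data
        self.pending.append((block, data))
        # Control returns to the application here, before any replication.

    def replicate(self, max_updates):
        # Background shipping over the network, limited by available bandwidth.
        for _ in range(min(max_updates, len(self.pending))):
            block, data = self.pending.popleft()
            self.secondary[block] = data

    def disaster(self):
        # The primary is destroyed; whatever is still queued never arrives.
        lost = list(self.pending)
        self.primary = None
        self.pending.clear()
        return lost


m = AsyncMirror()
for i in range(3):
    m.write(i, f"v{i}".encode())
m.replicate(max_updates=1)  # the network has caught up with only one update
lost = m.disaster()
```

The application saw all three writes succeed, yet the secondary holds only the first; the other two were acknowledged and then lost.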
The near synchronous mode is a forced compromise between the synchronous and asynchronous modes. Near synchronous data mirroring provides asynchronous remote data mirroring at a preset interval, but requires the local computer system to periodically halt execution of the application program at the preset interval until data replication by the remote computer system is acknowledged.
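The near synchronous compromise can be sketched as an asynchronous queue with a periodic barrier. In this minimal sketch, `NearSyncMirror` is a hypothetical class, time is a simulated clock passed in by the caller, and the remote acknowledgment is modelled simply as the queue being drained; a real system would wait on a network reply.

```python
from collections import deque


class NearSyncMirror:
    """Toy near synchronous mirror (hypothetical): asynchronous between
    checkpoints, but the application is halted at each preset interval
    until the secondary has caught up."""

    def __init__(self, interval):
        self.interval = interval  # preset interval, in seconds of simulated time
        self.last_sync = 0.0
        self.primary = {}
        self.secondary = {}
        self.pending = deque()
        self.halts = 0  # number of times the application was held

    def _flush_and_wait(self, now):
        # The application is halted here until the secondary has applied
        # (and, in a real system, acknowledged) every queued update.
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data
        self.last_sync = now
        self.halts += 1

    def write(self, now, block, data):
        self.primary[block] = data
        self.pending.append((block, data))
        if now - self.last_sync >= self.interval:
            self._flush_and_wait(now)


m = NearSyncMirror(interval=1.0)
for block, t in enumerate([0.2, 0.5, 1.1, 1.4]):
    m.write(t, block, f"v{block}".encode())
```

After these writes the application has been halted once (at t = 1.1), the secondary holds blocks 0 through 2, and only block 3 remains exposed. The potential loss is therefore bounded by one interval's worth of updates, at the cost of a periodic stall whose length grows with the amount of data queued and the limited network bandwidth.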
Therefore, a remote data mirroring system which comprises an architecture configured for optimal data mirroring is needed. Furthermore, such a system is needed which addresses the limited bandwidth available for communicating data over the network.