The necessity and importance of computer data backup have been well recognized in recent years, as the continuous operation of most, if not all, organizations depends heavily on the reliability and availability of digital data. Conventional local backup (i.e., backing up the data locally) is considered inadequate as it cannot guard against disasters (such as earthquake and flood) or theft, and the concept of remote backup, which preserves a backup copy of the data at a remote domain site, has therefore been proposed.
Conducting remote backup usually involves duplicating a large amount of data to a remote storage device over a network. This task requires significant network bandwidth, which was not practical in the early days. However, as new last-mile technologies such as fiber-to-the-building (FTTB) and fiber-to-the-home (FTTH) are developed and deployed, the availability of bandwidth is no longer an issue and remote data backup has become a popular topic in the computer industry.
Remote data backup can be conducted in various ways. One popular approach is the so-called remote mirroring, in which data output by an application is written simultaneously to a local disk storage and a remote disk storage.
FIG. 1 is a schematic diagram showing conventional disk volume management. As illustrated, a conventional disk storage contains one or more physical volumes such as /dev/hda and /dev/hdb, which are combined into a volume group such as /dev/vg. Then, by means of a local mapping table, the volume group is partitioned into a number of logical volumes such as /dev/vg/lv1, /dev/vg/lv2, and /dev/vg/lv3. The minimum storage unit of a physical volume is called a physical extent (PE), which usually ranges from 4 MB to 256 MB in size. Similarly, the minimum storage unit of a logical volume is called a logical extent (LE).
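The relationship between physical volumes, the volume group, and logical volumes described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the class name VolumeGroup, its methods, the 4 MB extent size, and the 16 MB volume sizes are hypothetical and chosen for illustration; they are not taken from FIG. 1 or from any particular volume manager.

```python
from dataclasses import dataclass, field

PE_SIZE_MB = 4  # assumed extent size; the text gives a usual range of 4 MB to 256 MB


@dataclass
class VolumeGroup:
    """Hypothetical model: pools the physical extents (PEs) of several physical volumes."""
    # Free PEs, each identified by (physical volume name, PE index).
    free_extents: list = field(default_factory=list)
    # Mapping table: logical volume name -> list of PEs, indexed by LE number.
    mapping: dict = field(default_factory=dict)

    def add_physical_volume(self, name, size_mb):
        """Split a physical volume into whole PEs and add them to the pool."""
        count = size_mb // PE_SIZE_MB
        self.free_extents.extend((name, i) for i in range(count))

    def create_logical_volume(self, name, size_mb):
        """Allocate enough free PEs to back a logical volume of the given size."""
        needed = -(-size_mb // PE_SIZE_MB)  # round up to whole extents
        if needed > len(self.free_extents):
            raise ValueError("not enough free extents in the volume group")
        self.mapping[name] = self.free_extents[:needed]
        del self.free_extents[:needed]

    def resolve(self, lv_name, offset_mb):
        """Translate an offset in a logical volume to (PV, PE index, offset within PE)."""
        le_index, within = divmod(offset_mb, PE_SIZE_MB)
        pv, pe_index = self.mapping[lv_name][le_index]
        return pv, pe_index, within


vg = VolumeGroup()
vg.add_physical_volume("/dev/hda", 16)        # contributes 4 PEs
vg.add_physical_volume("/dev/hdb", 16)        # contributes 4 PEs
vg.create_logical_volume("/dev/vg/lv1", 12)   # 3 LEs, all backed by /dev/hda
vg.create_logical_volume("/dev/vg/lv2", 12)   # 3 LEs, spanning both physical volumes
```

In this sketch each LE maps to exactly one PE, so a logical volume such as /dev/vg/lv2 can span more than one physical volume without the application being aware of it.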
By mapping a remote physical volume as a local logical volume, a local mirroring mechanism can be employed to achieve remote mirroring. However, local mirroring is usually conducted over a high-speed, low-delay network such as a local area network (LAN) or a storage area network (SAN). In remote mirroring, in contrast, the mirroring mechanism is extended over a wide area network (WAN), which involves significant delay due to the geographic dispersion. This extension inevitably degrades the storage access and recovery performance, even though the remote backup copy of the data is safer from large-scale catastrophes.
U.S. Patent Publication No. 2004/0236983 discloses a data storage system capable of maintaining data consistency and cache coherency between a remote mirror pair when the network communication therebetween is disrupted. As shown in FIG. 2, the data storage system 200 contains a first storage subsystem 212 and a second storage subsystem 214 at a remote domain site. The first and second storage subsystems 212 and 214 are coupled to one or more hosts 206 through controllers 222 and 224, respectively. The first and second storage subsystems 212 and 214 are also coupled to each other via a link 107. The controllers 222 and 224 allow the storage subsystems 212 and 214 to increase capacity and reliability by integrating a large number of smaller storage modules. The link 107, on the other hand, allows an updated copy of the data of the first storage subsystem 212 to be backed up to the second storage subsystem 214.
U.S. Pat. No. 6,237,008 provides a pair-pair remote copy (PPRC) technique. As shown in FIG. 3, a local virtual storage volume B is “paired” with (i.e., mirrored to) a virtual storage volume C at a remote domain site. A local domain site processor 310 issues a snapshot command 332 to store a “snapshot” copy of a directory of a virtual storage volume A into a directory of the virtual storage volume B, which is automatically mirrored to the paired virtual storage volume C by the remote mirroring processes. The technique can reduce the processing overhead in replicating data to the remote storage volume. However, recovering from the remotely mirrored data over a WAN would take too long because of the significant delay over the WAN.
U.S. Pat. No. 5,615,329 discloses a remote synchronous replication technique shown in FIG. 4A and a remote asynchronous replication technique shown in FIG. 4B. Synchronous replication provides a higher level of consistency between the local and remote domain sites. However, its performance is significantly impaired if the replication is conducted over a WAN. In contrast, asynchronous replication achieves better performance at the cost of data consistency. U.S. Pat. No. 5,615,329 also does not suggest how the recovery time can be further improved.
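The general trade-off between synchronous and asynchronous replication can be illustrated with a minimal sketch. It is a generic toy model and not the technique of U.S. Pat. No. 5,615,329: the class name MirroredVolume, its methods, and the use of in-memory dictionaries in place of disk storage and a WAN link are all assumptions made for illustration.

```python
from collections import deque


class MirroredVolume:
    """Hypothetical toy model of a local volume mirrored to a remote volume."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = {}          # block number -> data at the local domain site
        self.remote = {}         # block number -> data at the remote domain site
        self.pending = deque()   # writes not yet applied remotely (asynchronous only)

    def write(self, block, data):
        self.local[block] = data
        if self.synchronous:
            # The write completes only after the remote copy is updated,
            # so in a real system every write would pay the WAN round trip.
            self.remote[block] = data
        else:
            # The write completes at once; the remote copy lags behind.
            self.pending.append((block, data))

    def drain(self):
        """Apply queued writes to the remote copy in their original order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def in_sync(self):
        """Report whether the local and remote copies hold the same data."""
        return self.local == self.remote


sync_vol = MirroredVolume(synchronous=True)
async_vol = MirroredVolume(synchronous=False)
for vol in (sync_vol, async_vol):
    vol.write(0, b"alpha")
    vol.write(1, b"beta")
```

In this sketch the synchronous volume is consistent after every write, whereas the asynchronous volume is consistent only after drain() has been called; any writes still queued when the local domain site fails would be missing from the remote copy.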
Another conventional technique is the CoStore architecture proposed by the University of Michigan, which achieves remote mirroring in a one-cluster-site-to-another-cluster-site manner. The technique is able to achieve better performance in remote mirroring by employing multiple local domain sites with concurrent transmission processes. However, the CoStore architecture is still based on file-level services, and tremendous effort is required of the system operator when new storage space is added to a cluster or when existing storage space in the cluster is removed or adjusted, e.g., by changing the column space of a redundant array of independent disks (RAID).
Additionally, Yotta proposed a NetStorage system in the paper “Data Localization and High Performance of Clustered File Systems Spanning Long Distances” published in 2003. The system relies on the aid of cache memory to achieve an efficient clustered file system. However, establishing such a system requires a dedicated storage network, and the cost of ownership is too high for small and medium businesses.
Therefore, the major motivation behind the present invention is to strike a balance between increasing the performance of remote mirroring over a WAN and achieving storage resource sharing and transmission optimization.