Many businesses, such as insurance companies, banks, and brokerage firms, rely heavily on data processing systems for storing and processing business-critical data. Businesses seek reliable mechanisms to protect their data from natural disasters, acts of terrorism, computer hardware failure, or computer software failure.
Replication is one mechanism used by many businesses to ensure reliable access to data. Data replication is well known in the art. Essentially, replication is a process of creating one or more real-time or near real-time copies (replicas) of a data volume at remotely located sites. FIG. 1 shows (in block diagram form) a data processing system 10 in which data replication is employed to create and maintain three replicas of a data volume. More particularly, FIG. 1 shows a primary node P coupled to secondary nodes S1-S3 via a communication network and data links 12-18. The communication network may include a LAN or a WAN. For purposes of explanation, the communication network is the Internet, it being understood that the term “communication network” should not be limited thereto.
Primary node P consists of a primary computer system 22 coupled to a data storage system 24 via data link 26. Data storage system 24 includes a memory 28 which includes several hard disks for storing data of a primary data volume V. Secondary node S1 includes a secondary computer system 32 coupled to data storage system 34 via data link 36. Data storage system 34 includes memory 38 consisting of several hard disks for storing a first replica R1 of primary volume V. Secondary node S2 includes secondary computer system 42 coupled to data storage system 44 via data link 46. Data storage system 44 includes memory 48 consisting of several hard disks for storing a second replica R2 of the primary volume V. Lastly, secondary node S3 includes a secondary computer system 52 coupled to data storage system 54 via data link 56. Data storage system 54 includes a memory 58 consisting of several hard disks that store data of a third replica R3 of primary volume V.
FIG. 2 illustrates in block diagram form primary volume V and replicas R1-R3. Each of volumes V and R1-R3 consists of nmax data blocks. While it is said that each of these blocks contains data, it is to be understood that the data is physically stored within hard disk memory blocks allocated thereto. Thus, data of blocks 1-nmax of primary volume V are stored in distributed fashion within hard disks of memory 28. Further, data within blocks 1-nmax of replicas R1-R3 are stored in distributed fashion across hard disks in memories 38, 48, and 58, respectively, allocated thereto. Each of replicas R1-R3 is maintained as a real-time or near real-time copy of primary volume V. Thus, data of block n in primary volume V should be identical to data of block n in each of replicas R1-R3.
Primary computer system 22 is configured to receive requests from client computer systems (not shown) to read data from or write data to primary data volume V. In response to these requests, primary computer system 22 generates input/output (IO) transactions to read data from or write data to hard disks of memory 28. In the event of failure of primary node P, requests from client computer systems can be redirected to and serviced by one of the secondary nodes S1-S3. For example, suppose a client computer system generates a request to read data block 4 in the primary volume V after primary computer system 22 is rendered inoperable as a result of hardware failure. The read request can be redirected to secondary node S1 using mechanisms well known in the art. In response to receiving the read request, secondary computer system 32 generates an IO transaction that accesses and reads data from a hard disk in memory 38 allocated to store block 4 data of replica R1. Data returned from memory 38 is subsequently forwarded by secondary computer system 32 to the client computer system that originally requested block 4 data. Since data of block n of the primary data volume V and data of block n of replica R1 are or should be identical, valid data should be returned to the client computer system even though primary computer system 22 has failed.
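By way of illustration only, the redirection of a read request described above can be sketched as follows. The sketch models each volume as an in-memory mapping from block number to data. The names `Node`, `read_block`, and `read_with_failover`, and the policy of trying the secondary nodes in order, are assumptions made for this example and are not taken from the system of FIG. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical node holding one copy of the volume (block number -> data)."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    operable: bool = True

    def read_block(self, n: int) -> bytes:
        # An inoperable node cannot service the request.
        if not self.operable:
            raise ConnectionError(f"{self.name} is inoperable")
        return self.blocks[n]


def read_with_failover(n: int, primary: Node,
                       secondaries: List[Node]) -> Tuple[str, bytes]:
    """Try the primary first, then redirect to each secondary in turn."""
    for node in [primary, *secondaries]:
        try:
            return node.name, node.read_block(n)
        except ConnectionError:
            continue
    raise RuntimeError(f"no node could serve block {n}")


# Primary volume V and replicas R1-R3 hold identical block data.
volume = {n: f"data-{n}".encode() for n in range(1, 9)}
primary = Node("P", dict(volume))
secondaries = [Node(f"S{i}", dict(volume)) for i in (1, 2, 3)]

# Simulate a hardware failure of the primary, then request block 4.
primary.operable = False
served_by, data = read_with_failover(4, primary, secondaries)
```

In this sketch the request for block 4 is serviced by the first operable secondary node, and the data returned is valid only because the replica is kept identical to the primary volume.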
Replicas R1-R3 can be maintained as a real-time or near real-time copy of primary volume V using one of several replication techniques including synchronous, asynchronous, and periodic replication. In each of these techniques, when a data block n of the primary data volume V is modified according to an IO transaction, the primary node P transmits a copy of the modified data block to each of the secondary nodes that store a replica. Each of the secondary nodes, in turn, overwrites its existing data block n with the copy received from the primary node P. In synchronous replication, the IO transaction that modified block n data of the primary data volume V is not considered complete until one or all of the secondary nodes acknowledge receipt of the copy of the modified data block n. In asynchronous replication, primary node P logs a copy of each data block of the primary data volume V that is modified by an IO transaction. Eventually, copies of the logged, modified data blocks are transmitted asynchronously to each of the secondary nodes S1-S3. The IO transaction that modifies data block n of the primary data volume V is considered complete when a copy of the modified block n is logged for subsequent transmission to secondary nodes S1-S3. Asynchronous replication requires ordering of dependent data modifications to ensure consistency between replicas R1-R3 and primary volume V. Synchronous replication does not require ordering. Periodic replication is yet another technique for maintaining replicas R1-R3. U.S. patent application Ser. No. 10/436,354 entitled, “Method and System of Providing Periodic Replication” (filed on May 12, 2003, incorporated herein by reference in its entirety) describes relevant aspects of periodic replication. Like synchronous and asynchronous replication, periodic replication requires that primary node P transmit a copy of a modified data block n of the primary data volume V to each of the secondary nodes S1-S3.
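For purposes of explanation, the difference between synchronous and asynchronous replication described above can be sketched as follows. The class and method names (`Primary`, `Secondary`, `write_sync`, `write_async`, `drain_log`) are hypothetical. The sketch assumes that a synchronous write waits for acknowledgement from all secondary nodes (the text permits one or all) and that a first-in, first-out log is used to preserve the order of modifications in the asynchronous case. Network transmission is modeled as a direct method call.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Secondary:
    """Hypothetical secondary node storing one replica."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.replica: Dict[int, bytes] = {}

    def apply(self, n: int, data: bytes) -> bool:
        self.replica[n] = data   # overwrite existing block n
        return True              # acknowledge receipt


class Primary:
    """Hypothetical primary node holding volume V and a replication log."""

    def __init__(self, secondaries: List[Secondary]) -> None:
        self.volume: Dict[int, bytes] = {}
        self.secondaries = secondaries
        self.log: Deque[Tuple[int, bytes]] = deque()

    def write_sync(self, n: int, data: bytes) -> bool:
        # Complete only after every secondary acknowledges the copy.
        self.volume[n] = data
        return all(s.apply(n, data) for s in self.secondaries)

    def write_async(self, n: int, data: bytes) -> bool:
        # Complete as soon as the modified block is logged.
        self.volume[n] = data
        self.log.append((n, data))
        return True

    def drain_log(self) -> int:
        # Transmit logged blocks later, in the order they were written.
        sent = 0
        while self.log:
            n, data = self.log.popleft()
            for s in self.secondaries:
                s.apply(n, data)
            sent += 1
        return sent


secondaries = [Secondary(f"S{i}") for i in (1, 2, 3)]
p = Primary(secondaries)
p.write_sync(1, b"a")        # replicas already hold block 1 on return
p.write_async(2, b"b")
p.write_async(2, b"c")       # dependent write to the same block
lagging = [2 in s.replica for s in secondaries]
drained = p.drain_log()      # replicas catch up, in write order
```

Because the log is drained in write order, each replica ends with the later of the two dependent writes to block 2, which is the consistency property that ordering is meant to ensure.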
Modified data blocks of the primary data volume V can be transmitted from primary node P to each of the secondary nodes S1-S3 in separate transactions via the data link 12 and communication network. Each of the transactions transmitted to the secondary nodes S1-S3 may include a single modified data block or multiple modified data blocks of the primary data volume V. Either way, each of the secondary nodes S1-S3 receives, directly from primary node P, a copy of each data block n modified in primary data volume V.
The time needed for secondary nodes S1-S3 to update replicas R1-R3 with modified data blocks depends on the bandwidth of data link 12. The higher the bandwidth of data link 12, the faster transactions containing modified data blocks of the primary data volume V can be transmitted to secondary nodes S1-S3 for subsequent storage in replicas R1-R3, respectively. The cost of data link 12, however, is dependent on the bandwidth thereof. Table 1 below shows an example of how the cost of data link 12 can increase with bandwidth.
TABLE 1
Cost of Data Link Bandwidth

| Type of Link | Bandwidth (Mbps) | Approximate Cost (per month) |
|---|---|---|
| T1 | 1.544 | $900-$1200 |
| E1 | 2.048 | $2000 |
| T3 | 44.736 | $10,000 + local loops + setup ($4000 each) |
| OC-3 | 155.5 | >$40,000 + local loops + setup (>$10,000 each) |
| OC-12 | 622.08 | >$400,000 + local loops + setup (>$100,000 each) |
| OC-48 | 2488 | >$2,000,000 + local loops + setup (>$200,000 each) |
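To illustrate how the bandwidth of data link 12 bounds the time needed to update the replicas, the following sketch estimates an idealized transmission time for each link type in Table 1. It assumes that the primary node sends a separate copy of every modified block to each secondary node over the same link, that 1 Mbps equals 1,000,000 bits per second, and that protocol overhead and latency are ignored. The block size and the number of modified blocks are hypothetical values chosen only for the example.

```python
# Bandwidths in Mbps, as listed in Table 1.
LINK_MBPS = {
    "T1": 1.544,
    "E1": 2.048,
    "T3": 44.736,
    "OC-3": 155.5,
    "OC-12": 622.08,
    "OC-48": 2488.0,
}


def transfer_seconds(modified_blocks: int, block_bytes: int,
                     replicas: int, mbps: float) -> float:
    """Idealized time to send one copy of every modified block to each replica."""
    total_bits = modified_blocks * block_bytes * 8 * replicas
    return total_bits / (mbps * 1_000_000)


# Hypothetical workload: 10,000 modified blocks of 64 KiB, three replicas.
estimates = {
    link: transfer_seconds(10_000, 64 * 1024, 3, mbps)
    for link, mbps in LINK_MBPS.items()
}
```

Under these assumptions the data sent over data link 12 grows in proportion to the number of replicas, so adding a replica lengthens the update time unless a higher-bandwidth, and therefore more costly, link is used.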