Many businesses such as insurance companies, banks, brokerage firms, etc., rely heavily on data processing systems for storing and processing business data. Often the viability of a business depends on reliable access to valid data contained within its data processing system. As such, businesses seek reliable ways to consistently protect their data processing systems and the data contained therein from natural disasters, acts of terrorism, or computer hardware and/or software failures.
Businesses must have plans in place to recover quickly after, for example, a data volume is corrupted by hardware failure, software failure, or site failure. These plans often involve creating a tape backup copy of the data volume that is sent offsite. A corrupted data volume can be restored from the tape backup, or offsite tapes can be used to reconstruct the data volume at a second location so that business can resume at the second location. Even with the most well-executed backup strategy, restoring data from tape is typically a time-consuming process that results in data that is too old. For many businesses, the delay in restoring a data volume from a tape backup copy and the age of the restored data are unacceptable. The need to recover quickly to a recent state often requires businesses to employ real-time replication of data. Real-time replication enables rapid recovery when compared to conventional bulk data transfer from sequential media, and the replicated data is much more current than periodic backups sent offsite.
FIG. 1 illustrates an exemplary data processing system 10 that employs data replication as a means for protecting critical data and enabling rapid recovery. Data processing system 10 includes a host node 12 coupled to a backup node 14 via a data communication link 16. Additionally, the data processing system 10 shown in FIG. 1 includes primary and backup data storage systems 20 and 22, respectively, coupled to host and backup nodes 12 and 14, respectively. Primary data storage system 20 stores the contents of a primary data volume, while backup data storage system 22 stores a copy or replica of the primary data volume. Primary and backup data storage systems 20 and 22, respectively, may include several memory devices such as magnetic or optical disc arrays. Data of the primary volume is stored in a distributed fashion across the several memory devices of primary data storage system 20. Similarly, data of the replica is stored in a distributed fashion across the several memory devices of backup data storage system 22.
Host node 12 may take form in a server computer system that receives requests from client computer systems (not shown). More particularly, the requests are received and processed by an application executing on host node 12. In response to processing the requests, host node 12 may generate read-data or write-data transactions that read data from or write data to the primary data volume of storage system 20. Primary data storage system 20 may return data to host node 12 for subsequent application processing in response to primary data storage system 20 receiving a read-data transaction. Alternatively, primary data storage system 20 returns an acknowledgement to host node 12 that data has been successfully stored in primary data storage system 20 in response to receiving a write-data transaction.
Backup data storage system 22, as noted above, stores a replica of the primary data volume stored in primary data storage system 20. Backup node 14 may also be a server computer system that executes the same application executing on host node 12. The application running on backup node 14 can receive and process requests from client computer systems (not shown). Backup node 14 and backup data storage system 22, as their names imply, provide a backup source to service client computer system requests should host node 12 and/or primary data storage system 20 become unusable or inaccessible due to some event such as hardware failure, software failure, or site failure. When host node 12 and/or primary data storage system 20 suddenly become unusable or inaccessible, client computer system requests can be diverted to backup node 14 for subsequent processing, after relatively quick application or file system recovery.
Host node 12 includes a data storage management system 24 that takes form in instructions executing on one or more processors within host node 12. Data storage management system 24 may include a file system and a system for managing the distribution of the primary volume data across the several memory devices of primary data storage system 20. Volume Manager™ provided by VERITAS Software Corporation of Mountain View, Calif., is an exemplary system for managing the distribution of volume data across memory devices. Data storage management system 24 generates the read and write-data transactions described above in response to host node 12 receiving and processing requests from client computer systems.
Host node 12 also includes a data volume replicator 30 that takes form in instructions executing on one or more processors of host node 12. Replicator 30 functions to create a real-time replica of the primary data volume in primary data storage system 20. The replica is created and maintained in backup data storage system 22.
Replicator 30 receives a copy of each write-data transaction generated by data storage management system 24. Replicator 30 can store the received write-data transactions in write transaction log 32. Write transaction log 32 may or may not be a component of the primary data storage system 20. For purposes of explanation, write transaction log 32 will be considered a component within primary data storage system 20.
Replicator 30 processes and eventually transmits logged write-data transactions to backup node 14 via communication link 16. Communication link 16 can be any kind of messaging network, such as an IP link. For purposes of explanation, communication link 16 will take form in a wide-area network (WAN) link. Backup node 14 receives the write transactions, and the replicated volume in backup data storage system 22 is subsequently updated (i.e., data is written to the replicated data volume) in response to backup node 14 receiving the write-data transactions. In this fashion, a near real-time replica of the primary data volume within primary data storage system 20 is maintained within backup data storage system 22.
Replicator 30, in addition to performing other functions, can order dependent write-data transactions before the dependent write-data transactions are transmitted to backup node 14. First and second write-data transactions are dependent if the first write-data transaction must be completed at data storage system 22 before the second write-data transaction can begin at data storage system 22. As an aside, the first write-data transaction must likewise be completed at data storage system 20 before the second write-data transaction can begin at data storage system 20. If the second write-data transaction completes before the first write-data transaction, the primary data volume of data storage system 20 and/or the primary data volume replica of data storage system 22 may become corrupted or otherwise internally inconsistent.
Replicator 30 can order dependent write-data transactions in a variety of ways. In one embodiment, replicator 30 orders dependent write-data transactions by assigning sequence numbers thereto. For example, a write-data transaction dependent on an earlier generated write-data transaction is assigned a sequence number that is greater than the sequence number assigned to the earlier generated write-data transaction. The sequence numbers define the order in which write-data transactions are to be completed; those write-data transactions assigned lower sequence numbers are to be completed before write-data transactions that are assigned higher sequence numbers. The write-data transactions are transmitted to backup node 14 with their assigned sequence numbers to ensure that the primary data volume replica in backup data storage system 22 is updated in proper order during asynchronous replication, as will be more fully described below. Again, if the primary data volume replica is not updated in order, the replica may store data that is inconsistent with the primary data volume in primary data storage system 20.
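The sequence-number embodiment described above can be sketched as follows. This is a minimal illustration, not the replicator's actual implementation: the names `WriteTransaction` and `SequenceAssigner`, the `offset` field, and the choice to number every write (rather than only dependent ones) are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WriteTransaction:
    seq: int      # sequence number assigned by the replicator
    offset: int   # hypothetical block offset within the volume
    data: bytes   # payload to be written


class SequenceAssigner:
    """Hands out strictly increasing sequence numbers (assumed to start at 1)."""

    def __init__(self) -> None:
        self._counter = count(1)

    def assign(self, offset: int, data: bytes) -> WriteTransaction:
        # A later-generated write always receives a larger number than any
        # earlier write, so a dependent write sorts after the write it
        # depends on.
        return WriteTransaction(next(self._counter), offset, data)


assigner = SequenceAssigner()
first = assigner.assign(0, b"create record")
second = assigner.assign(0, b"update record")  # depends on `first`
```

Numbering every write in generation order is a simplification; it satisfies the stated requirement that a dependent write carries a greater number than the write it depends on.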
Synchronous replication achieves full data currentness in the replicated volume, but performance is affected by the delay in transmission of dependent write-data transactions noted above. In other words, throughput of write-data transactions between replicator 30 and backup node 14 is slowed due to the transmission delay between dependent write-data transactions. If throughput of write-data transactions is slowed too much, replicator 30 may not be able to maintain a real-time replica of the primary data volume without rendering the data volume unusable.
One solution to increase the throughput of write-data transactions between replicator 30 and backup node 14 is to use asynchronous replication. Replicator 30, as noted above, can assign sequence numbers to dependent write-data transactions stored in log 32. Replicator 30 can transmit dependent write-data transactions to backup node 14 along with their respectively assigned sequence numbers. In asynchronous mode, as opposed to synchronous mode, transmission of a dependent write-data transaction is not delayed until a previously transmitted dependent write-data transaction has completed at backup node 14. Backup node 14 stores received dependent write-data transactions along with their assigned sequence numbers, and these dependent write-data transactions are stored at backup node 14 until backup node 14 updates the primary data volume replica in accordance with these write-data transactions. Because dependent write-data transactions are transmitted with their respectively assigned sequence numbers, dependent write-data transactions can be transmitted to or received by backup node 14 out of order. Asynchronous mode is not defined by the order of transmission. Rather, asynchronous mode is defined by the fact that the write-data transaction is considered complete before it is known to have been transmitted and received. Because the write-data transaction is considered complete before it is transmitted and received, the source of the write-data transactions is free to issue subsequent write-data transactions, including dependent write-data transactions.
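The defining property of asynchronous mode described above, namely that a write is considered complete before it is known to have been transmitted and received, can be illustrated with a short sketch. The class `AsyncReplicator`, its `write` and `drain` methods, and the `send` callable standing in for the communication link are all hypothetical names chosen for the example. A real replicator would typically retain a logged entry until the backup node acknowledges it; that step is omitted here for brevity.

```python
from collections import deque
from typing import Callable, Deque, Tuple

LoggedWrite = Tuple[int, int, bytes]  # (sequence number, offset, data)


class AsyncReplicator:
    def __init__(self, send: Callable[[LoggedWrite], None]) -> None:
        self._send = send                     # hypothetical link to the backup node
        self._log: Deque[LoggedWrite] = deque()
        self._next_seq = 1

    def write(self, offset: int, data: bytes) -> int:
        """Log the write and return at once; nothing is transmitted yet."""
        seq = self._next_seq
        self._next_seq += 1
        self._log.append((seq, offset, data))
        return seq  # the caller may now issue further (dependent) writes

    def drain(self) -> int:
        """Transmit every logged write; returns how many were sent."""
        sent = 0
        while self._log:
            self._send(self._log.popleft())
            sent += 1
        return sent


received = []                       # stands in for the backup node's inbox
replicator = AsyncReplicator(received.append)
replicator.write(0, b"first")
replicator.write(0, b"second")      # issued before "first" has been sent
replicator.drain()
```

In this sketch the source never waits on the link: both calls to `write` return before `drain` transmits anything, which is the behavior that distinguishes asynchronous from synchronous mode.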
Asynchronous operation increases the transmission throughput of write-data transactions between replicator 30 and backup node 14, since transmission of a dependent write-data transaction to backup node 14 is not delayed until a previously transmitted dependent write-data transaction completes at backup node 14. Due to the asynchronous nature of transmission over link 16, write-data transactions may be received out of order by backup node 14. However, backup node 14 processes received write-data transactions according to their assigned sequence numbers to ensure proper updating of the replica volume. In particular, the replica in storage system 22 is updated by write-data transactions with lower sequence numbers assigned thereto before being updated by write-data transactions with higher sequence numbers assigned thereto.
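One way the backup node could hold out-of-order arrivals and apply them in sequence order is sketched below. The class `ReplicaApplier` and its fields are hypothetical, and the sketch assumes sequence numbers are contiguous and start at 1, which the description above does not state.

```python
from typing import Dict, List, Tuple


class ReplicaApplier:
    def __init__(self) -> None:
        self.volume: Dict[int, bytes] = {}                  # offset -> data (the replica)
        self._pending: Dict[int, Tuple[int, bytes]] = {}    # seq -> (offset, data)
        self._next_seq = 1                                  # assumed first sequence number

    def receive(self, seq: int, offset: int, data: bytes) -> List[int]:
        """Buffer one transaction, then apply every transaction that is now
        next in sequence. Returns the sequence numbers applied by this call."""
        self._pending[seq] = (offset, data)
        applied = []
        while self._next_seq in self._pending:
            off, payload = self._pending.pop(self._next_seq)
            self.volume[off] = payload
            applied.append(self._next_seq)
            self._next_seq += 1
        return applied


replica = ReplicaApplier()
replica.receive(2, 0, b"new")   # arrives early; held until 1 has been applied
replica.receive(1, 0, b"old")   # applies 1, then the buffered 2
```

After both calls the replica holds `b"new"` at offset 0, the same result as if the two transactions had arrived in order.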
Businesses design their data processing systems to respond to data read and write requests from client computer systems within the shortest amount of time possible. The time it takes host node 12 to respond to data read and write requests increases with the number of such requests received in a given period of time. In other words, if host node 12 is burdened with an increasingly large number of data read and write requests, the response time to the requests may increase substantially.
To reduce the response time, businesses often add a second node to their data processing systems to service data read and write requests from client computer systems. FIG. 2 illustrates the data processing system 10 of FIG. 1 expanded to include a second node 40 coupled to primary data storage system 20. The data volume stored within primary data storage system 20 is shared by host node 12 and second node 40. Second node 40 receives data read and write requests from client computer systems coupled thereto.
In FIG. 2, host node 12 and second node 40 execute data storage management systems 42 and 44, respectively. Data storage management systems 42 and 44 are components of a distributed data storage management system. Data storage management system 44 takes form in software instructions executed on one or more processors of second node 40, while data storage management system 42 takes form in software instructions executed on one or more processors of host node 12. Data storage management systems 42 and 44 generate write-data transactions in response to receiving and processing data write requests from client computer systems. It should be understood that the data processing system shown in FIG. 2 and its description should not be considered prior art to the invention described or claimed herein.
The replicated volume in data storage system 22 should be updated with data of write-data transactions generated by both data storage management systems 42 and 44. Second node 40 lacks a replicator. As such, write-data transactions generated by data storage management system 44 of second node 40 are provided to replicator 46 of host node 12. Replicator 46, like replicator 30 of FIG. 1, takes form in software instructions executing on the one or more processors of host node 12. Replicator 46 stores all write-data transactions, including those generated by data storage management systems 42 and 44, in write transaction log 32. Replicator 46 processes the write-data transactions stored in log 32 in much the same manner as replicator 30 described above. Replicator 46 can order dependent write-data transactions. In one embodiment, replicator 46 orders dependent write-data transactions by assigning sequence numbers thereto. In another embodiment, replicator 46 orders dependent write-data transactions by storing them in order in log 32 on a FIFO basis. Replicator 46, like replicator 30, is capable of transmitting write-data transactions either synchronously or asynchronously to backup node 14 via link 16. In asynchronous mode, replicator 46 transmits dependent write-data transactions without the delay associated with synchronous transmission of dependent write-data transactions to backup node 14.
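The FIFO embodiment, in which position in the log rather than an explicit sequence number carries the ordering, might look like the following sketch. The class `FifoWriteLog`, its methods, and the string labels identifying which node generated a write are illustrative assumptions, not details taken from the description.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Entry = Tuple[str, int, bytes]  # (source label, offset, data)


class FifoWriteLog:
    """Single log on the host node that receives writes from both nodes."""

    def __init__(self) -> None:
        self._entries: Deque[Entry] = deque()

    def append(self, source: str, offset: int, data: bytes) -> None:
        # Arrival order at the log is the order of transmission.
        self._entries.append((source, offset, data))

    def next_to_transmit(self) -> Optional[Entry]:
        # The oldest entry leaves first; None means the log is empty.
        return self._entries.popleft() if self._entries else None


log = FifoWriteLog()
log.append("host node", 0, b"a")     # generated locally
log.append("second node", 0, b"b")   # forwarded from the node without a replicator
oldest = log.next_to_transmit()
```

Because every write from both nodes passes through this one log on the host node, the sketch also makes visible why the host node carries the additional load.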
There is an increased data processing demand on host node 12 in FIG. 2 when compared to the data processing demand of host node 12 in FIG. 1 since host node 12 in FIG. 2 is required to process write-data transactions generated by data storage management system 44 of second node 40. This increased demand on host node 12 of FIG. 2 may cause host node 12, with its limited processing power, to act as a bottleneck to the data processing system 10 shown in FIG. 2. The bottleneck may negate the purpose of adding the second node 40.