1. Technical Field of the Invention
The present invention generally relates to remote database synchronization. More particularly, the present invention is directed to a system and method for providing asynchronous incremental database updates from a primary site to a remote recovery site, which completely decouples database updates at the primary site from the transmission of the database updates to the remote recovery site, thereby facilitating efficient backup of business-critical data and disaster recovery thereof.
2. Description of the Prior Art
In the contemporary business environment, which is so heavily dependent upon relatively uninterrupted access to various kinds of information (i.e., business-critical data), disaster recovery is often of critical importance. Explosive growth in e-commerce and data warehousing has resulted in an exponential growth of data storage, which has heightened the need for disaster recovery. Disaster recovery schemes guard the business-critical data in the event that an entire system or even a primary site storing the business-critical data is destroyed, for example, by earthquakes, fires, explosions, hurricanes, and the like. System outages affecting availability of data may be financially devastating to businesses of many types. For example, brokerage firms and other financial institutions may lose millions of dollars per hour when their systems are down or destroyed. Ensuring uninterrupted access to the information and guaranteeing that business data are securely and remotely updated to avoid data loss in the event of an above-described disaster are critical for safeguarding the business-critical data and business operations.
Efficient disaster recovery requires that updates to business-critical data at a primary site be synchronized at a location that is remote to the primary site (i.e., remote recovery site) in order to ensure safety of and uninterrupted access to the business-critical data. However, if business-critical data at the remote recovery site is not kept current with the business-critical data at the primary site, any updates since the last periodic backup may be lost, thus significantly impacting business operations. Thus, a key feature of efficient disaster recovery is the frequency of resynchronization of the business-critical data from the primary site to the remote recovery site.
Generally, resynchronization of data (i.e., database updates) at a remote site principally involves two techniques: synchronous and asynchronous. Variants of the two techniques are also possible. In the synchronous technique, writes by an application host are forwarded to the remote site as part of the input/output (i.e., “I/O”) command processing. Typically, the application host writes await remote confirmation before I/O completion is signaled to the application host. There is a write latency associated with the synchronous technique because the application host awaits completion confirmation, and this latency is further exacerbated by the physical separation of the primary site from the remote recovery site. Thus, the synchronous technique is generally limited to relatively short distances because of the detrimental effect of the round-trip propagation delay on the I/O completion signaling. Furthermore, until the I/O completion signal is received at the primary site, the application host is unable to access the data at the primary site. In contrast to the synchronous technique, the asynchronous technique delivers application host writes over high-speed communication links to the remote recovery site while allowing the application host at the primary site to access the data. That is, the asynchronous technique signals I/O completion to the application host at the primary site before updating the remote recovery site. The asynchronous technique is often utilized when the distance between the primary and remote recovery sites (as well as possibly a relatively low-bandwidth telecommunication link) would introduce prohibitive latencies if the updates were performed synchronously. However, a long-distance communication link may become a bottleneck that forces local I/O writes to be queued for transmission to the remote site. The queuing of I/O writes at the primary site negatively affects efficient disaster recovery since the queued I/O writes may be destroyed in an above-described disaster before they are transmitted to the remote recovery site.
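By way of illustration only, the trade-off between the two techniques described above may be sketched as a toy model. The class name `ReplicatedVolume`, its methods, and the latency figures are hypothetical and chosen solely for this sketch; they do not describe any particular product. The sketch assumes a fixed round-trip delay and treats both sites as in-memory dictionaries.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume mirrored to a remote site (hypothetical)."""

    def __init__(self, round_trip_ms, synchronous):
        self.round_trip_ms = round_trip_ms  # assumed fixed link round-trip delay
        self.synchronous = synchronous
        self.local = {}          # data at the primary site
        self.remote = {}         # data at the remote recovery site
        self.pending = deque()   # writes queued for transmission (asynchronous only)

    def write(self, key, value, local_ms=1.0):
        """Apply a write and return the latency observed by the application host."""
        self.local[key] = value
        if self.synchronous:
            # I/O completion is signaled only after the remote site confirms.
            self.remote[key] = value
            return local_ms + self.round_trip_ms
        # I/O completion is signaled immediately; the write is sent later.
        self.pending.append((key, value))
        return local_ms

    def drain(self, max_writes):
        """Transmit up to max_writes queued writes to the remote site."""
        sent = 0
        while self.pending and sent < max_writes:
            key, value = self.pending.popleft()
            self.remote[key] = value
            sent += 1
        return sent

    def writes_at_risk(self):
        """Queued writes that would be lost if the primary site were destroyed now."""
        return len(self.pending)
```

In this model a synchronous write over a 40 ms link costs the host 41 ms, whereas an asynchronous write costs 1 ms but leaves the write counted by `writes_at_risk()` until `drain()` transmits it.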
The frequency of resynchronization of the business-critical data from the primary site to the remote recovery site takes into account a space dimension and a time dimension. The space dimension accounts for the amount of data, while the time dimension accounts for the time period when resynchronization occurs. A resynchronization that involves copying all of the data represents a full database backup, while an incremental database backup copies only the portion of the data that has changed since the last full or incremental database backup. Whether full or incremental, either backup method represents a time-consistent view of the data at the primary site. While individual application host I/O writes may be synchronously or asynchronously transmitted to the remote recovery site as they are made at the primary site, doing so is cost-inefficient in that the communication link between the primary site and the remote recovery site must be maintained (i.e., reserved or leased) to transfer the application host writes on a continuous basis.
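The distinction between a full and an incremental database backup may be illustrated by the following minimal sketch, which assumes the database can be represented as a dictionary of keys to values and that the view captured at the last backup is still available for comparison. The function names are hypothetical and are not drawn from any existing system.

```python
_MISSING = object()  # sentinel so that a stored value of None is still detected


def full_backup(primary):
    """Copy all of the data: a time-consistent snapshot of the primary site."""
    return dict(primary)


def incremental_backup(primary, last_backup_view):
    """Return only what changed since the last full or incremental backup."""
    changed = {
        key: value
        for key, value in primary.items()
        if last_backup_view.get(key, _MISSING) != value
    }
    deleted = [key for key in last_backup_view if key not in primary]
    return changed, deleted


def apply_incremental(remote, changed, deleted):
    """Bring the remote copy up to date using an incremental backup."""
    remote.update(changed)
    for key in deleted:
        remote.pop(key, None)
```

Under these assumptions, the amount of data sent by `incremental_backup` grows with the number of changed entries rather than with the size of the database, which corresponds to the space dimension discussed above.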
A particularly useful resynchronization system is a Peer to Peer Remote Copy (i.e., “PPRC”) system offered by International Business Machines Corporation (i.e., “IBM”), the assignee of the subject patent application. The PPRC system provides synchronous copying of database updates from a primary Direct Access Storage Device (i.e., “DASD”) controller at a primary site to a remote DASD controller at the remote recovery site. That is, the PPRC system includes a primary controller and an associated primary DASD at the primary site and a remote controller and an associated DASD at the remote recovery site. Generally, each of the controllers includes a non-volatile storage (i.e., “NVS”) for maintaining data in the event of power or system failure. During resynchronization, the data is first written (or buffered) to the NVS of the primary controller at the primary site, and the data is then transferred to the NVS in the remote controller at the remote recovery site. At later points in time, the data at the primary and remote NVS is destaged to the attached DASD storage devices (i.e., disks); that is, the data is written from the NVS to the associated DASD storage device. It should be noted that a single DASD storage device may include more than one volume, or a single volume may span more than one DASD storage device. It should further be noted that with the PPRC system, the remote recovery site's DASD volume(s) are synchronously updated with data updates to the primary DASD volume(s).
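The sequence described above (buffering to the primary NVS, transfer to the remote NVS, and later destaging to DASD) may be sketched as follows. This is a simplified illustration only and is not the PPRC interface; the `Controller` class, the `synchronous_copy_write` function, and the use of track numbers as keys are all assumptions made for the sketch.

```python
class Controller:
    """Toy storage controller with non-volatile storage and an attached DASD."""

    def __init__(self, name):
        self.name = name
        self.nvs = {}   # non-volatile storage: buffered, not yet on disk
        self.dasd = {}  # attached DASD storage device

    def buffer(self, track, data):
        """Write (buffer) an update into the NVS."""
        self.nvs[track] = data

    def destage(self):
        """Write all buffered data from the NVS to the DASD; return the count."""
        count = len(self.nvs)
        self.dasd.update(self.nvs)
        self.nvs.clear()
        return count


def synchronous_copy_write(primary, remote, track, data):
    """Buffer at the primary NVS, then transfer to the remote NVS.

    Completion is reported only after both controllers hold the update.
    """
    primary.buffer(track, data)
    remote.buffer(track, data)
    return True
```

In this sketch each controller destages independently at a later point in time, so the update is protected in both NVS buffers before either DASD has been written.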
One persistent problem with the PPRC system is that the volumes, which are synchronized between the primary and remote DASD storage devices, are unavailable for use while the PPRC data updates are serviced. The PPRC system does not consider the transfer of data to the remote recovery site complete until all the data updated at the DASD of the primary site has been updated at the DASD of the remote recovery site. Thus, data updates to the DASD of the primary site invariably delay response times to user requests to the volumes involved in the data updates because synchronous updates must be made to the DASD of the remote recovery site before the volumes involved in the updates are available to service the user requests. Response time delays may occur with respect to user requests to the DASD of the primary and remote recovery sites. Therefore, the user requests to volumes of either the primary or remote recovery site's DASD are subject to the data updates between the primary and the remote recovery sites and must therefore wait until the completion of the data updates before the requests can access the updated data.
Therefore, there is a need in the art for a system and method that efficiently performs asynchronous incremental database updates from a primary site to a remote recovery site, thereby completely decoupling data updates at the primary site from the transmission of the data updates to the remote recovery site.