The present invention relates generally to database management systems having a primary database facility and a duplicate or backup database facility, and particularly to a system and method for synchronizing a backup database with a primary database while applications continue to actively modify the primary database.
The present invention is an improvement on the Tandem “remote data facility” (RDF) technology disclosed in U.S. Pat. Nos. 5,799,322 and 5,799,323, both issued Aug. 25, 1998, which are hereby incorporated by reference as background information.
The prior art Tandem RDF technology underwent a number of changes over time to increase the bandwidth of the system, where the bandwidth is indicated by the peak number of transactions per second that can be performed on the primary system and replicated on the backup system. The present invention represents a set of new techniques so as to achieve another large increase in bandwidth. Some of the techniques used by the present invention to increase bandwidth violate basic assumptions of the prior art systems, requiring both the redesign of prior art mechanisms and some completely new mechanisms, to ensure that the backup system maintains “soft synchronization” with the primary system during normal operation, and to also ensure that the backup system can be brought to an entirely consistent internal state whenever the backup system needs to perform a takeover operation and be used as the primary system.
In summary, the present invention is a distributed computer database system having a local computer system and a remote computer system. The local computer system has a local database stored on local memory media, application programs that modify the local database, and a transaction manager that stores audit records in a local audit trail reflecting those application program modifications to the local database as well as commit/abort records indicating which of the transactions making those database modifications committed and which aborted. Each audit record has an associated audit trail position in the local audit trail, otherwise referred to as a MAT (master audit trail) position.
The remote computer system, remotely located from the local computer system, has a backup database stored on remote memory media associated with the remote computer system.
A remote duplicate data facility (RDF) is partially located in the local computer system and partially in the remote computer system for maintaining virtual synchronization of the backup database with the local database. The RDF includes an Extractor process executed on the local computer system, and a Receiver process and one or more Updater processes executed on the remote computer system.
The Extractor process, executed on the local computer system, extracts audit records from the local audit trail. It has a plurality of message buffers (four in the preferred embodiment) for buffering groups of the extracted audit records together, and it transmits each message buffer to the remote computer system when the buffer is full or a timeout occurs.
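The full-or-timeout transmission rule described above can be illustrated with a minimal sketch. It is not the Extractor's actual implementation: the class name `Extractor`, the `send` callback, the `capacity` and `timeout_s` parameters, and the injectable `clock` are all hypothetical, and the sketch models a single filling buffer rather than the pool of four buffers of the preferred embodiment.

```python
import time
from typing import Callable, List


class Extractor:
    """Groups audit records into a message buffer; sends it when full or stale.

    Hypothetical sketch: models one filling buffer, not a pool of four.
    """

    def __init__(self, send: Callable[[List[dict]], None], capacity: int,
                 timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.send = send            # assumed transport to the remote system
        self.capacity = capacity    # records per message buffer (assumption)
        self.timeout_s = timeout_s  # maximum age of a partly filled buffer
        self.clock = clock
        self.buffer: List[dict] = []
        self.opened_at = 0.0

    def add(self, record: dict) -> None:
        """Buffer one extracted audit record; transmit if the buffer is full."""
        if not self.buffer:
            self.opened_at = self.clock()  # timeout runs from the first record
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def tick(self) -> None:
        """Called periodically; transmit a partly filled buffer on timeout."""
        if self.buffer and self.clock() - self.opened_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        """Transmit the current buffer, if non-empty, and start a new one."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
```

Passing the clock in as a parameter is only a convenience for exercising the timeout path deterministically; a real process would presumably rely on a system timer.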
The Receiver process, executed on the remote computer system, receives message buffers transmitted by the Extractor process and distributes the audit records in each received message buffer to one or more image trails in the remote computer system. The audit records include audit update and audit backout records indicating database updates and database backouts generated by transactions executing on the primary system. The Receiver process stores the audit update and audit backout records in one or more image trails, and stores each image trail in a sequence of image trail files.
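The distribution step can be sketched as follows, under stated assumptions. The routing rule (a hypothetical `volume_to_trail` mapping from database volume to image trail), the record field names, and the fixed `records_per_file` rollover threshold are illustrative choices that do not come from the text above. The sketch handles only audit update and audit backout records and simply skips any other record type.

```python
from collections import defaultdict
from typing import Dict, List


class Receiver:
    """Distributes received audit records to image trails (hypothetical sketch)."""

    def __init__(self, volume_to_trail: Dict[str, str], records_per_file: int):
        self.volume_to_trail = volume_to_trail    # assumed routing table
        self.records_per_file = records_per_file  # assumed file rollover size
        # Each image trail is a sequence of image trail files;
        # each file is modeled here as a list of records.
        self.trails: Dict[str, List[List[dict]]] = defaultdict(lambda: [[]])

    def receive(self, message_buffer: List[dict]) -> None:
        """Append each update/backout record to the image trail for its volume."""
        for rec in message_buffer:
            if rec["type"] not in ("update", "backout"):
                continue  # other record types are outside this sketch
            files = self.trails[self.volume_to_trail[rec["volume"]]]
            if len(files[-1]) >= self.records_per_file:
                files.append([])  # start the next file in the sequence
            files[-1].append(rec)
```

Records are appended in arrival order, so each image trail preserves the order in which the Extractor sent them.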
For each image trail there is an Updater process that applies to a backup database volume the database updates and backouts indicated by the audit update and audit backout records in the image trail. The audit update and audit backout records are applied to the backup database volume in the same order that they are stored in the image trail, without regard to whether corresponding transactions in the primary system committed or aborted.
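A minimal sketch of this apply loop follows. The record layout (`type`, `key`, `after`, `before`) and the modeling of a backup volume as a dictionary are assumptions made for illustration. The point it shows is that the loop never consults a transaction's commit/abort outcome.

```python
from typing import Dict, Iterable


def apply_image_trail(volume: Dict[str, int], records: Iterable[dict]) -> Dict[str, int]:
    """Apply update and backout records strictly in image trail order.

    Hypothetical sketch: an update writes its after-image; a backout restores
    the before-image (None meaning the key did not exist before).
    """
    for rec in records:
        if rec["type"] == "update":
            volume[rec["key"]] = rec["after"]
        elif rec["type"] == "backout":
            if rec["before"] is None:
                volume.pop(rec["key"], None)  # undo an insert
            else:
                volume[rec["key"]] = rec["before"]
    return volume
```

In this model, an aborted primary transaction needs no special handling: its updates are applied, and its later backout records reverse them when they are reached in the trail.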
Upon the occurrence of a predefined event, such as failure of the primary system, the backup system determines a set of primary system transactions for which a commit/abort outcome is unknown. For each image trail, the corresponding Updater completes applying database updates and backouts to the backup database volume. Then, the Updater backs out database updates for the transactions for which the commit/abort outcome has been determined to be unknown.
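The takeover sequence above can be sketched in two steps: determine the transactions with no commit or abort record, then undo their already-applied updates. The function names, the record fields, and the choice to undo by restoring before-images in reverse order are assumptions for illustration, not the mechanism the RDF actually uses.

```python
from typing import Dict, List, Set


def unknown_outcome_transactions(records: List[dict]) -> Set[str]:
    """Return the transactions that have audit records but no commit/abort record."""
    seen: Set[str] = set()
    resolved: Set[str] = set()
    for rec in records:
        if rec["type"] in ("commit", "abort"):
            resolved.add(rec["tx"])
        else:
            seen.add(rec["tx"])
    return seen - resolved


def back_out_unknown(volume: Dict[str, int], applied: List[dict],
                     unknown: Set[str]) -> Dict[str, int]:
    """Undo applied updates of unknown-outcome transactions, newest first.

    Assumes each update record carries a before-image (None = key was absent).
    """
    for rec in reversed(applied):
        if rec["type"] == "update" and rec["tx"] in unknown:
            if rec["before"] is None:
                volume.pop(rec["key"], None)
            else:
                volume[rec["key"]] = rec["before"]
    return volume
```

Walking the applied records in reverse means that when a transaction modified the same key more than once, the oldest before-image is the one left in place.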
The remote computer system periodically executes a file purge procedure, which purges image trail files no longer needed by the remote computer system.