The present invention relates generally to database management systems having a primary database facility and a duplicate or backup database facility. More particularly, the present invention relates to a system and method for keeping a backup database synchronized with a primary database while applications continue to actively modify the primary database.
The present invention is an improvement on the "remote data facility" (RDF) technology disclosed in U.S. Pat. Nos. 5,740,433, 5,745,753, 5,794,252, 5,799,322, 5,799,323, 5,835,915, and 5,884,328, all of which are hereby incorporated by reference as background information.
The prior art Tandem RDF technology underwent a number of changes over time to increase the peak number of transactions per second that can be performed on the primary system and replicated on the backup system. The present invention provides a set of new techniques that achieve a large increase in the rate at which transactions performed on the primary system can be replicated on the backup system. Some of these techniques violate basic assumptions of the prior art systems. They therefore require both the redesign of prior art mechanisms and some entirely new mechanisms, to ensure that the backup system maintains "soft synchronization" with the primary system during normal operation, and to ensure that the backup system can be brought to an entirely consistent internal state whenever it needs to perform a takeover operation and be used as the primary system.
In summary, the present invention is a distributed computer database system having a local computer system and a remote computer system. The local computer system has a local database stored on local memory media, application programs that modify the local database, and a transaction manager that stores audit records reflecting those modifications in multiple local audit trails. In a particular one of the local audit trails, the transaction manager also stores transaction state records indicating the states of the transactions making those database modifications. A transaction's state can be committed, aborted, active, aborting, or prepared. This particular local audit trail is referred to as the MAT (master audit trail); the other local audit trails are referred to as AuxATs (auxiliary audit trails). The transaction manager also stores in the MAT a type of record known as an Auxiliary Pointer Record, which indicates the range of audit records in the AuxATs that were flushed to disk since the previous Auxiliary Pointer Record.
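To make the MAT's two special record types concrete, the sketch below models transaction state records and Auxiliary Pointer Records as Python dataclasses and shows what a reader of the MAT can derive from them: the latest state of each transaction and, per AuxAT, the highest audit position reported as flushed to disk. The class names, field names, and the use of plain integers as audit trail positions are hypothetical; the actual record layouts are not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class TxState(Enum):
    """The five transaction states listed in the text."""
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTING = "aborting"
    ABORTED = "aborted"


@dataclass(frozen=True)
class TransactionStateRecord:
    """Hypothetical MAT record: the state of one transaction."""
    tx_id: int
    state: TxState


@dataclass(frozen=True)
class AuxPointerRecord:
    """Hypothetical MAT record: positions in one AuxAT flushed to disk since
    the previous Auxiliary Pointer Record (inclusive bounds)."""
    aux_trail: str
    first_posn: int
    last_posn: int


MatRecord = Union[TransactionStateRecord, AuxPointerRecord]


def flushed_high_water(mat: List[MatRecord]) -> Dict[str, int]:
    """Per AuxAT, the highest position the MAT reports as flushed to disk."""
    high: Dict[str, int] = {}
    for rec in mat:
        if isinstance(rec, AuxPointerRecord):
            high[rec.aux_trail] = max(high.get(rec.aux_trail, -1), rec.last_posn)
    return high


def latest_states(mat: List[MatRecord]) -> Dict[int, TxState]:
    """The most recent state recorded in the MAT for each transaction."""
    states: Dict[int, TxState] = {}
    for rec in mat:
        if isinstance(rec, TransactionStateRecord):
            states[rec.tx_id] = rec.state
    return states
```

Because both helpers scan the MAT in order, a later record for the same transaction or AuxAT simply supersedes an earlier one.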
The remote computer system, remotely located from the local computer system, has a backup database stored on remote memory media associated with the remote computer system.
A remote duplicate data facility (RDF), located partially in the local computer system and partially in the remote computer system, maintains virtual synchronization of the backup database with the local database. The RDF includes multiple Extractor processes that execute on the local computer system, and multiple Receiver processes and multiple Updater processes that execute on the remote computer system. When an RDF system is set up, each audit trail is configured to be associated with one Extractor process, and each Extractor process is configured to be associated with one Receiver process.
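The one-to-one configuration rule (one Extractor per audit trail, one Receiver per Extractor) can be illustrated with a small configuration helper. This is a hypothetical sketch: the process names, and the convention that the trail named "MAT" receives the Master pair while every other trail is treated as an AuxAT, are assumptions for illustration only. Updaters are not modeled, since they are associated with image trails rather than with audit trails.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RdfPipeline:
    audit_trail: str  # "MAT" or the name of an AuxAT
    extractor: str    # hypothetical Extractor process name (local system)
    receiver: str     # hypothetical Receiver process name (remote system)


def configure_rdf(audit_trails: List[str]) -> Dict[str, RdfPipeline]:
    """Build exactly one Extractor/Receiver pair per audit trail."""
    if "MAT" not in audit_trails:
        raise ValueError("configuration must include the MAT")
    if len(set(audit_trails)) != len(audit_trails):
        raise ValueError("each audit trail may be configured only once")
    pipelines: Dict[str, RdfPipeline] = {}
    aux_index = 0
    for trail in audit_trails:
        if trail == "MAT":
            name = "Master"
        else:
            aux_index += 1
            name = f"Aux{aux_index}"
        pipelines[trail] = RdfPipeline(trail, f"{name}Extractor", f"{name}Receiver")
    return pipelines
```

For example, configuring ["MAT", "AUX_A", "AUX_B"] yields three disjoint pairs, with the Master pair bound to the MAT.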
A Master Extractor process extracts audit records from the MAT, and each of the Auxiliary Extractor processes extracts auxiliary audit records from one of the AuxATs. The Extractor processes, when extracting audit records from the MAT and the AuxATs, insert an Audit Trail Position (ATPosn) value in each audit record. The Extractor processes then transmit the extracted audit records to the remote computer system.
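The sketch below isolates the Extractor's stamping step: each record read from an audit trail receives its ATPosn before it is sent. It assumes, hypothetically, that an ATPosn is simply the record's zero-based index within its trail, and that an Extractor can resume from a given position (for example, the last ATPosn a Receiver has stored, which is an assumed use of that value). Real ATPosn formats and the transmission step are not modeled.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class AuditRecord:
    tx_id: int
    kind: str                      # e.g. "update" or "backout"
    payload: str
    at_posn: Optional[int] = None  # stamped by the Extractor


def extract(trail: Iterable[AuditRecord], start_posn: int = 0) -> Iterator[AuditRecord]:
    """Yield records from one audit trail, starting at start_posn, each
    stamped with its (hypothetical) Audit Trail Position."""
    for posn, rec in enumerate(trail):
        if posn < start_posn:
            continue  # assumed already sent before a restart
        # Return a stamped copy; the original audit record is left untouched.
        yield replace(rec, at_posn=posn)
```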
The Receiver processes receive the extracted audit records from the Extractor processes and distribute them to one or more image trails in the remote computer system. The Master Receiver process receives audit records from the Master Extractor, and each Auxiliary Receiver process receives audit records from its associated Auxiliary Extractor process. The audit records include audit update and audit backout records indicating database updates and database backouts generated by transactions executing on the local computer system. Control-type audit records, which appear only in the MAT, are distributed to a Master Image Trail (MIT). Data-type audit records of the MAT are distributed to MAT-based Secondary Image Trails (SITs), and audit records of the AuxATs are distributed to AuxAT-based SITs. Note that a data-type audit record of the MAT or an AuxAT may be distributed to more than one SIT. Each Receiver process is also responsible for storing the ATPosn of the last audit record it received.
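A Receiver's distribution rule can be sketched as follows: control-type records go to the MIT (and may arrive only at the Master Receiver), data-type records are routed to one or more SITs, and the ATPosn of the last record received is retained. The volume-to-SIT routing table is a hypothetical stand-in for however the real system selects destination image trails.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuditRecord:
    at_posn: int
    tx_id: int
    is_control: bool              # control-type records occur only in the MAT
    volume: Optional[str] = None  # target volume of a data-type record


@dataclass
class Receiver:
    is_master: bool
    routes: Dict[str, List[str]]  # hypothetical: volume -> SIT names
    mit: List[AuditRecord] = field(default_factory=list)
    sits: Dict[str, List[AuditRecord]] = field(default_factory=dict)
    last_at_posn: int = -1

    def receive(self, rec: AuditRecord) -> None:
        if rec.is_control:
            if not self.is_master:
                raise ValueError("control-type records only come from the MAT")
            self.mit.append(rec)
        else:
            # A data-type record may be copied to several SITs.
            for sit in self.routes[rec.volume]:
                self.sits.setdefault(sit, []).append(rec)
        # Remember how far this Receiver has progressed in its audit trail.
        self.last_at_posn = rec.at_posn
```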
For each SIT, an Updater process applies the database updates and backouts indicated by the SIT's audit update and audit backout records to a backup database volume. These records are applied to the backup database volume in the same order in which they are stored in the image trail, without regard to whether the corresponding transactions on the local computer system committed or aborted.
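The sketch below illustrates why an Updater can ignore transaction outcomes: a transaction that aborted on the local system also produced backout records, so applying updates and backouts strictly in image-trail order already cancels the aborted transaction's effects. Records are modeled, hypothetically, as key/after-image pairs applied to a dictionary standing in for a backup database volume.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ImageRecord:
    tx_id: int
    kind: str             # "update" or "backout"; both are applied the same way
    key: str              # hypothetical row key on the backup volume
    after: Optional[str]  # resulting image; None means the row is deleted


def apply_image_trail(sit: List[ImageRecord], volume: Dict[str, str]) -> None:
    """Apply every record in image-trail order without consulting
    commit/abort outcomes."""
    for rec in sit:
        if rec.after is None:
            volume.pop(rec.key, None)
        else:
            volume[rec.key] = rec.after
```

For example, if transaction 1 changes "k" from "a" to "b" and then aborts locally, its backout record restores "a", so the volume ends up as it would have been had transaction 1 never run.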
Upon the occurrence of a predefined event, such as failure of the local computer system, the Receiver processes complete all processing of previously received audit records. The remote computer system then determines which transactions have unknown final commit/abort outcomes, and which transactions may have an incomplete set of audit records; these are referred to as questionable transactions. Thereafter, the Updater processes back out the effects of the audit updates and backouts associated with the questionable transactions.
The remote computer system identifies the questionable transactions by examining the MIT and the audit records in the SITs. Specifically, the remote computer system first examines the Auxiliary Pointer Records and the transaction state records in the MIT. Based on the information in the Auxiliary Pointer Records, the transaction state records, and the audit records in the SITs, the remote computer system identifies transactions whose final state (committed or aborted) is unknown, as well as transactions whose final state is known but which may lack a complete set of audit records. The Updaters then back out the database updates associated with the identified transactions.
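One way the takeover analysis could work is sketched below. It rests on an assumption not stated in the text: that a transaction's AuxAT audit is flushed, and covered by an Auxiliary Pointer Record in the MAT, before the transaction's commit record is written. Under that assumption, the first Auxiliary Pointer Record describing AuxAT audit that the backup never received (compared against each Auxiliary Receiver's stored ATPosn) marks a cutoff in the MIT, and state records after it cannot vouch for complete audit. Any transaction without a trusted committed or aborted state is treated as questionable, and its updates are undone in reverse image-trail order using before-images. All names, and the before-image representation, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set, Union


class TxState(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTING = "aborting"
    ABORTED = "aborted"


@dataclass(frozen=True)
class StateRecord:
    tx_id: int
    state: TxState


@dataclass(frozen=True)
class AuxPointer:
    aux_trail: str
    last_posn: int  # last AuxAT position flushed when this record was written


@dataclass(frozen=True)
class ImageRecord:
    tx_id: int
    key: str
    before: Optional[str]  # image restored on backout; None = row absent
    after: Optional[str]


def find_questionable(mit: List[Union[StateRecord, AuxPointer]],
                      received_upto: Dict[str, int],
                      sit_txs: Set[int]) -> Set[int]:
    # 1. Cutoff: first Auxiliary Pointer Record covering AuxAT audit that
    #    the corresponding Receiver never received.
    cutoff = len(mit)
    for i, rec in enumerate(mit):
        if isinstance(rec, AuxPointer) and rec.last_posn > received_upto.get(rec.aux_trail, -1):
            cutoff = i
            break
    # 2. Latest transaction state recorded before the cutoff.
    trusted: Dict[int, TxState] = {}
    for rec in mit[:cutoff]:
        if isinstance(rec, StateRecord):
            trusted[rec.tx_id] = rec.state
    # 3. Any transaction without a trusted final outcome is questionable.
    all_txs = sit_txs | {r.tx_id for r in mit if isinstance(r, StateRecord)}
    final = {TxState.COMMITTED, TxState.ABORTED}
    return {tx for tx in all_txs if trusted.get(tx) not in final}


def back_out(sit: List[ImageRecord], questionable: Set[int],
             volume: Dict[str, str]) -> None:
    # Undo in reverse image-trail order so the oldest before-image wins.
    for rec in reversed(sit):
        if rec.tx_id in questionable:
            if rec.before is None:
                volume.pop(rec.key, None)
            else:
                volume[rec.key] = rec.before
```

In this model, a transaction whose commit record follows an unreceived stretch of AuxAT audit is backed out even though it committed on the local system, because the backup cannot be sure it holds all of that transaction's audit records.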