1. Technical Field
Present invention embodiments relate to change data capture, and more specifically, to reducing re-reading of database logs by persisting long-running transaction data.
2. Discussion of the Related Art
Change data capture (CDC) products read the log of a database (the source database or source) to determine what changes have been made to the database. Action can then be taken on the basis of these changes. In particular, the changes can be replicated in a copy of the database (the target database or target). Only committed transactions of the source database are replicated in the target. As log records are read, they are stored according to the transaction of which they are a part. Each transaction is stored in its own in-memory queue or list. When the commit log record for a transaction is seen, the transaction is applied to the target database. Thus, transactions are applied to the target in commit order, and the log position of the last commit applied indicates how far the replication has progressed. This log position is called the commit position.
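The queueing behavior described above can be illustrated with a minimal sketch. It is not taken from any particular CDC product: the names LogRecord, Capture, and replay are hypothetical, log positions are assumed to be increasing integers, and the handling of a rollback record (discarding the queue) is an assumption added for completeness.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class LogRecord:
    position: int                  # assumed: increasing integer log position
    txn_id: str                    # transaction the record belongs to
    kind: str                      # "change", "commit", or "rollback"
    payload: Optional[str] = None  # the change itself, for "change" records


class Capture:
    """Queues log records per transaction and applies them at commit."""

    def __init__(self) -> None:
        self.open_txns = defaultdict(list)   # txn_id -> queued change records
        self.applied: List[str] = []         # stands in for the target database
        self.commit_position: Optional[int] = None

    def read(self, rec: LogRecord) -> None:
        if rec.kind == "change":
            # Store the record in the queue of its own transaction.
            self.open_txns[rec.txn_id].append(rec)
        elif rec.kind == "commit":
            # Apply the whole transaction, then advance the commit position.
            for change in self.open_txns.pop(rec.txn_id, []):
                self.applied.append(change.payload)
            self.commit_position = rec.position
        elif rec.kind == "rollback":
            # Assumption: a rolled-back transaction is simply discarded.
            self.open_txns.pop(rec.txn_id, None)


def replay(log: Iterable[LogRecord]) -> Capture:
    """Feed a sequence of log records through a fresh Capture."""
    capture = Capture()
    for rec in log:
        capture.read(rec)
    return capture
```

In this sketch, a transaction that starts first but commits last is applied last, and any transaction without a commit record remains in open_txns and never reaches the target.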
From time to time, a CDC process will shut down. Later the process must resume in a manner that preserves data integrity. If the process is able to shut down gracefully, it can save the in-memory transaction queue data to disk before terminating. This information is restored at restart, and log reading can begin from the last log record previously read. However, if the process stops in a non-graceful manner, the transaction queue data will be lost, and some log data will have to be re-read in order to guarantee that all desired changes to the source are captured. If a non-graceful shutdown occurs while a long-running transaction is in progress, relevant entries may be far back in the log. As a result, a large amount of log data has to be reprocessed.
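The cost of a non-graceful shutdown can be illustrated with a small sketch. It assumes that re-reading must begin at the earliest record of any transaction not yet applied at the commit position. The function reread_start and the tuple layout are hypothetical and serve only to show why a long-running transaction pushes the restart point far back in the log.

```python
from typing import List, Tuple

# (log position, transaction id, kind), where kind is "change" or "commit".
Record = Tuple[int, str, str]


def reread_start(log: List[Record], commit_position: int) -> int:
    """Earliest log position to re-read after a non-graceful stop.

    Assumption: every record of a transaction that was not applied at the
    commit position must be read again, because its in-memory queue is lost.
    """
    # Transactions whose commit has already been applied to the target.
    applied = {txn for pos, txn, kind in log
               if kind == "commit" and pos <= commit_position}
    # Positions of records belonging to transactions not yet applied.
    pending = [pos for pos, txn, kind in log if txn not in applied]
    # With nothing pending, reading resumes just after the commit position.
    return min(pending + [commit_position + 1])


# T1 starts at position 1 and is still open when T2 commits at position 900.
log = [(1, "T1", "change"), (899, "T2", "change"), (900, "T2", "commit")]
start = reread_start(log, commit_position=900)
```

Here start is 1 even though the commit position is 900, so nearly the entire log is read again. Had T1 committed before the stop, reading would resume just after the commit position.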